WorldWideScience

Sample records for eadgene chicken data-set

  1. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered a way of improving the biological interpretation of observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set analyses to the chicken expression data led to different rankings of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...
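
    As a rough illustration of the per-term testing such gene set analyses perform, the sketch below runs a Fisher exact (hypergeometric) over-representation test for one Gene Ontology term. The gene lists are hypothetical placeholders, not the EADGENE data, and the paper compares several other statistical approaches as well.

```python
# Hedged sketch: Fisher exact over-representation test for one GO term.
# All gene identifiers passed in are hypothetical.
from scipy.stats import fisher_exact

def go_term_enrichment(de_genes, term_genes, all_genes):
    """Test whether a GO term is over-represented among DE genes (2x2 table)."""
    de, term, universe = set(de_genes), set(term_genes), set(all_genes)
    a = len(de & term)                  # DE and annotated to the term
    b = len(de - term)                  # DE, not annotated
    c = len((universe - de) & term)     # not DE, annotated
    d = len(universe - de - term)       # neither
    odds, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return odds, p
```

    Ranking terms by these p-values is exactly the kind of step whose outcome, as the abstract notes, varies with the statistical method chosen.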

  2. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Jaffrézic, Florence; de Koning, Dirk-Jan; Boettcher, Paul J

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes, in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two ... quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised...

  3. Analysis of the real EADGENE data set: Multivariate approaches and post analysis

    NARCIS (Netherlands)

    Sorensen, P.; Bonnet, A.; Buitenhuis, B.; Closset, R.; Dejean, S.; Delmas, C.; Duval, M.; Glass, L.; Hedegaard, J.; Hornshoj, H.; Hulsegge, B.; Jaffrezic, F.; Jensen, K.; Jiang, L.; Koning, de D.J.; Lê Cao, K.A.; Nie, H.; Petzl, W.; Pool, M.H.; Robert-Granie, C.; San Cristobal, M.; Lund, M.S.; Schothorst, van E.M.; Schuberth, H.J.; Seyfert, H.M.; Tosser-klopp, G.; Waddington, D.; Watson, D.; Yang, W.; Zerbe, H.

    2007-01-01

    The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or

  4. Analysis of the real EADGENE data set: Multivariate approaches and post analysis

    OpenAIRE

    Schuberth Hans-Joachim; van Schothorst Evert M; Lund Mogens; San Cristobal Magali; Robert-Granié Christèle; Pool Marco H; Petzl Wolfram; Nie Haisheng; Cao Kim-Anh; de Koning Dirk-Jan; Jiang Li; Jensen Kirsty; Hulsegge Ina; Jaffrézic Florence; Hornshøj Henrik

    2007-01-01

    Abstract The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The m...

  5. Analysis of the real EADGENE data set: Multivariate approaches and post analysis (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Schuberth Hans-Joachim

    2007-11-01

    Full Text Available Abstract The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed gene sets. The main result from these analyses was that gene sets involved in immune defence responses were differentially expressed.
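
    To make the class-discovery step concrete, here is a minimal sketch of hierarchical clustering and PCA on a genes-by-samples expression matrix. The random matrix and the choices of correlation distance and average linkage are assumptions for illustration, not the workshop teams' exact settings.

```python
# Hedged sketch: HC on gene expression profiles plus PCA on samples.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 24))       # 500 genes x 24 samples (stand-in data)

# Cluster genes by correlation distance, a common choice for time-course profiles.
dist = 1.0 - np.corrcoef(expr)          # gene-by-gene correlation distance
condensed = dist[np.triu_indices(expr.shape[0], k=1)]
gene_clusters = fcluster(linkage(condensed, method="average"),
                         t=4, criterion="maxclust")

# Project the samples onto the first two principal components; in the study,
# this kind of plot separated the 24 h E. coli samples from the rest.
sample_pcs = PCA(n_components=2).fit_transform(expr.T)
```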

  6. Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication)

    Directory of Open Access Journals (Sweden)

    Sørensen Peter

    2007-11-01

    Full Text Available Abstract A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two different mastitis-causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised a biological problem of cross-talk between infected and uninfected quarters which will have to be dealt with in further microarray studies.
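
    One plausible reading of such a pipeline, sketched below, is quantile normalisation followed by per-gene Welch t-tests with Benjamini-Hochberg FDR control. This is a generic choice for illustration, not the specific method of any workshop team.

```python
# Hedged sketch: quantile normalisation + per-gene tests with FDR control.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def quantile_normalise(x):
    """Force every array (column) of a genes-x-arrays matrix to share one distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    means = np.sort(x, axis=0).mean(axis=1)   # mean value at each rank
    return means[ranks]

def de_genes(ctrl, infected, fdr=0.05):
    """Boolean mask of genes called differentially expressed at the given FDR."""
    _, p = ttest_ind(ctrl, infected, axis=1, equal_var=False)
    reject, *_ = multipletests(p, alpha=fdr, method="fdr_bh")
    return reject
```

    The abstract's observation that gene lists depended strongly on the quality-control step would show up here as sensitivity of `de_genes` to which arrays and spots survive filtering.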

  7. Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes

    NARCIS (Netherlands)

    Jaffrezic, F.; Koning, de D.J.; Boettcher, P.; Bonnet, A.; Buitenhuis, B.; Closset, R.; Dejean, S.; Delmas, C.; Detilleux, J.C.; Dovc, P.; Duval, M.; Foulley, J.L.; Hedegaard, J.; Hornshoj, H.; Hulsegge, B.; Janss, L.; Jensen, K.; Jiang, L.; Lavric, M.; Lê Cao, K.A.; Lund, M.S.; Malinverni, R.; Marot, G.; Nie, H.; Petzl, W.; Pool, M.H.; Robert-Granie, C.; San Cristobal, M.; Schothorst, van E.M.; Schuberth, H.J.; Sorensen, P.; Stella, A.; Tosser-Klopp, G.; Waddington, D.; Watson, M.; Yang, M.; Zerbe, H.; Seyfert, H.M.

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods

  8. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Sørensen, Peter; Bonnet, Agnès; Buitenhuis, Bart

    2007-01-01

    ...hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern over time points and infective agent (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes, by identifying clusters of genes highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed...

  9. Data set for the proteomic inventory and quantitative analysis of chicken uterine fluid during eggshell biomineralization

    Directory of Open Access Journals (Sweden)

    Pauline Marie

    2014-12-01

    Full Text Available Chicken eggshell is the protective barrier of the egg. It is a biomineral composed of 95% calcium carbonate in calcitic form and 3.5% organic matrix proteins. The mineralization process occurs in the uterus, within the uterine fluid. This acellular fluid contains ions and organic matrix protein precursors which interact with the mineral phase and control crystal growth, eggshell structure and mechanical properties. We performed a proteomic approach and identified 308 uterine fluid proteins. Gene Ontology term enrichments were determined to investigate their potential functions. Mass spectrometry analyses were also combined with label-free quantitative analysis to determine the relative abundance of 96 proteins at initiation, rapid growth phase and termination of shell calcification. Sixty-four showed differential abundance according to the mineralization stage. Their potential functions have been annotated. The complete proteomic, bioinformatic and functional analyses are reported in Marie et al., J. Proteomics (2015) [1].
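
    The snippet does not name the label-free quantification measure used, so as a hedged illustration the sketch below computes the normalised spectral abundance factor (NSAF), one standard spectral-count-based measure of relative protein abundance. The counts and lengths are hypothetical.

```python
# Hedged sketch: NSAF relative abundance from spectral counts.
import numpy as np

def nsaf(spectral_counts, protein_lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j); one value per protein."""
    saf = np.asarray(spectral_counts, float) / np.asarray(protein_lengths, float)
    return saf / saf.sum()

abundance = nsaf([120, 35, 8], [450, 300, 120])   # three hypothetical proteins
```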

  10. Data set for the proteomic inventory and quantitative analysis of chicken eggshell matrix proteins during the primary events of eggshell mineralization and the active growth phase of calcification

    Directory of Open Access Journals (Sweden)

    Pauline Marie

    2015-09-01

    Full Text Available Chicken eggshell is a biomineral composed of 95% calcitic calcium carbonate mineral and 3.5% organic matrix proteins. The assembly of the mineral and its structural organization are controlled by its organic matrix. In a recent study [1], we used quantitative proteomic, bioinformatic and functional analyses to explore the distribution of 216 eggshell matrix proteins at four key stages of shell mineralization, defined as: (1) widespread deposition of amorphous calcium carbonate (ACC), (2) ACC transformation into crystalline calcite aggregates, (3) formation of larger calcite crystal units and (4) rapid growth of calcite as a columnar structure with preferential crystal orientation. The current article details the quantitative analysis performed at the four stages of shell mineralization to determine which proteins are the most abundant. Additionally, we report the enriched GO terms and describe the presence of 35 antimicrobial proteins, equally distributed across all stages to keep the egg free of bacteria, and of 81 proteins whose function could not be ascribed.

  11. IGBT accelerated aging data set.

    Data.gov (United States)

    National Aeronautics and Space Administration — Preliminary data from thermal overstress accelerated aging using the aging and characterization system. The data set contains aging data from 6 devices, one device...

  12. Ontology-Based Geographic Data Set Integration

    NARCIS (Netherlands)

    Uitermark, Henricus Theodorus Johannes Antonius

    2001-01-01

    Geographic data set integration is particularly important for update propagation, i.e. the reuse of updates from one data set in another data set. In this thesis geographic data set integration (also known as map integration) between two topographic data sets, GBKN and TOP10vector, is described. GBK

  13. Data Sets from Major NCI Initiatives

    Science.gov (United States)

    The NCI Data Catalog includes links to data collections produced by major NCI initiatives and other widely used data sets, including animal models, human tumor cell lines, epidemiology data sets, and genomics data sets from TCGA, TARGET, COSMIC, GSK, and NCI60.

  14. International Comprehensive Ocean Atmosphere Data Set (ICOADS)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — International Comprehensive Ocean Atmosphere Data Set (ICOADS) consists of digital data set DSI-1173, archived at the National Climatic Data Center (NCDC). ICOADS is...

  15. Accuracy in Robot Generated Image Data Sets

    DEFF Research Database (Denmark)

    Aanæs, Henrik; Dahl, Anders Bjorholm

    2015-01-01

    In this paper we present a practical innovation concerning how to achieve high accuracy of camera positioning when using a 6-axis industrial robot to generate high-quality data sets for computer vision. This innovation is based on the realization that, to a very large extent, the robot's positioning error is deterministic and can as such be calibrated away. We have successfully used this innovation in our efforts for creating data sets for computer vision. Since the use of this innovation has a significant effect on the data set quality, we here present it in some detail, to better aid others in using robots for image data set generation.
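
    The core idea, that a deterministic positioning error can be measured once and subtracted thereafter, might be sketched as a per-pose offset table. Names and shapes below are illustrative, not the authors' implementation.

```python
# Hedged sketch: calibrating away a deterministic robot positioning error.
import numpy as np

class PoseCalibration:
    def __init__(self):
        self.offsets = {}   # commanded pose id -> measured systematic error

    def calibrate(self, pose_id, commanded_xyz, measured_xyz):
        """Record the repeatable error observed at one commanded pose."""
        self.offsets[pose_id] = np.subtract(measured_xyz, commanded_xyz)

    def corrected_target(self, pose_id, desired_xyz):
        """Command 'desired minus known error' so the robot lands on target."""
        return np.subtract(desired_xyz, self.offsets.get(pose_id, 0.0))
```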

  16. Ontology-Based Geographic Data Set Integration

    NARCIS (Netherlands)

    Uitermark, Harry T.; Oosterom, Peter J.M.; Mars, Nicolaas J.I.; Molenaar, Martien

    1999-01-01

    In order to develop a system to propagate updates we investigate the semantic and spatial relationships between independently produced geographic data sets of the same region (data set integration). The goal of this system is to reduce operator intervention in update operations between corresponding

  17. Uniform Facility Data Set US (UFDS-1997)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Uniform Facility Data Set (UFDS), formerly the National Drug and Alcohol Treatment Unit Survey or NDATUS, was designed to measure the scope and use of drug abuse...

  18. Long Term Care Minimum Data Set (MDS)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Long-Term Care Minimum Data Set (MDS) is a standardized, primary screening and assessment tool of health status that forms the foundation of the comprehensive...

  19. Benchmark data set for wheat growth models

    DEFF Research Database (Denmark)

    Asseng, S.; Ewert, F.; Martre, P.

    2015-01-01

    The data set includes a current representative management treatment from detailed, quality-tested sentinel field experiments with wheat from four contrasting environments including Australia, The Netherlands, India and Argentina. Measurements include local daily climate data (solar radiation, max...

  20. Teaching Nursing Research Using Large Data Sets.

    Science.gov (United States)

    Brosnan, Christine A.; Eriksen, Lillian R.; Lin, Yu-Feng

    2002-01-01

    Describes a process for teaching nursing research via secondary analysis of data sets from the National Center for Health Statistics. Addresses advantages, potential problems and limitations, guidelines for students, and evaluation methods. (Contains 32 references.) (SK)

  1. Long Term Care Minimum Data Set (MDS)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Long-Term Care Minimum Data Set (MDS) is a standardized, primary screening and assessment tool of health status that forms the foundation of the comprehensive...

  2. 2010 Federal STEM Education Inventory Data Set

    Data.gov (United States)

    Office of Science and Technology Policy, Executive Office of the President — This data set provides information for STEM education (pre-kindergarten through graduate) investments funded by Federal agencies at the level of $300,000 or above.

  3. Uniform Facility Data Set US (UFDS-1998)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Uniform Facility Data Set (UFDS) was designed to measure the scope and use of drug abuse treatment services in the United States. The survey collects information...

  4. Health Outcomes Survey - Limited Data Set

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Health Outcomes Survey (HOS) limited data sets (LDS) are comprised of the entire national sample for a given 2-year cohort (including both respondents...

  5. SIS - Species and Stock Administrative Data Set

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Species and Stock Administrative data set within the Species Information System (SIS) defines entities within the database that serve as the basis for recording...

  6. Intelligent Classification in Huge Heterogeneous Data Sets

    Science.gov (United States)

    2015-06-01

    Final technical report (June 2015) covering July 2013 - April 2015; approved for public release, distribution unlimited; in-house contract. The objective was to develop and tailor algorithms for the extraction of intelligence from several huge heterogeneous data sets, working with signals and through data dimension reduction.

  7. GOMOS bright limb ozone data set

    Directory of Open Access Journals (Sweden)

    S. Tukiainen

    2015-01-01

    Full Text Available We have created a daytime ozone profile data set from the measurements of the Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument on board the Envisat satellite. This so-called GOMOS bright limb (GBL) data set contains ~358 000 stratospheric daytime ozone profiles measured by GOMOS in 2002-2012. The GBL data set complements the widely used GOMOS night-time data based on stellar occultation measurements. The GBL data set is based on the GOMOS daytime occultations, but instead of the transmitted star light we use limb-scattered solar light. The ozone profiles retrieved from these radiance spectra cover the 18-60 km tangent height range and have approximately 2-3 km vertical resolution. We show that these profiles generally agree to better than 10% with the NDACC (Network for the Detection of Atmospheric Composition Change) ozone sounding profiles and with the GOMOS night-time, MLS (Microwave Limb Sounder), and OSIRIS (Optical Spectrograph and InfraRed Imaging System) satellite measurements. However, there is a 10-13% negative bias at 40 km tangent height and a 10-50% positive bias at 50 km when the solar zenith angle > 75°. These biases are most likely caused by stray light, which is difficult to characterize and remove entirely from the measured spectra. Nevertheless, the GBL data set approximately doubles the amount of useful GOMOS ozone profiles and improves coverage of the summer pole.

  8. ARM Cloud Retrieval Ensemble Data Set (ACRED)

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, C; Xie, S; Klein, SA; McCoy, R; Comstock, JM; Delanoë, J; Deng, M; Dunn, M; Hogan, RJ; Jensen, MP; Mace, GG; McFarlane, SA; O’Connor, EJ; Protat, A; Shupe, MD; Turner, D; Wang, Z

    2011-09-12

    This document describes a new Atmospheric Radiation Measurement (ARM) data set, the ARM Cloud Retrieval Ensemble Data Set (ACRED), which is created by assembling nine existing ground-based cloud retrievals of ARM measurements from different cloud retrieval algorithms. The current version of ACRED includes an hourly average of nine ground-based retrievals with vertical resolution of 45 m for 512 layers. The techniques used for the nine cloud retrievals are briefly described in this document. This document also outlines the ACRED data availability, variables, and the nine retrieval products. Technical details about the generation of ACRED, such as the methods used for time average and vertical re-grid, are also provided.
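
    A minimal sketch of the two post-processing steps the document describes, hourly averaging and re-gridding onto 512 fixed 45 m layers, follows. The linear interpolation is an assumption; ACRED's exact re-grid method is detailed in the document itself.

```python
# Hedged sketch: hourly averaging and fixed-grid vertical interpolation.
import numpy as np

def hourly_average(times_s, values):
    """Average samples into 1 h bins; times are seconds since midnight."""
    times_s, values = np.asarray(times_s), np.asarray(values)
    hours = (times_s // 3600).astype(int)
    return np.array([values[hours == h].mean() for h in np.unique(hours)])

def regrid_profile(z_in, profile_in, n_layers=512, dz=45.0):
    """Interpolate a retrieved profile onto 512 fixed 45 m layers."""
    z_out = (np.arange(n_layers) + 0.5) * dz   # layer mid-heights in metres
    return np.interp(z_out, z_in, profile_in)  # assumes z_in is increasing
```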

  9. Tropical cyclones in reanalysis data sets

    Science.gov (United States)

    Murakami, Hiroyuki

    2014-03-01

    This study evaluates and compares tropical cyclones (TCs) in state-of-the-art reanalysis data sets, including the Japanese 55-year Reanalysis (JRA-55), the Japanese 25-year Reanalysis, the European Centre for Medium-Range Weather Forecasts Reanalysis-40 and Interim Reanalysis, the National Centers for Environmental Prediction Climate Forecast System Reanalysis, and NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA). Most of the reanalyses reproduce a reasonable global spatial distribution of observed TCs and the temporal interannual variation of total TC frequency. Of the six reanalysis data sets, JRA-55 appears to be the best in terms of the following: the highest skill for spatial and temporal distribution of TC frequency of occurrence, the highest TC hit rate, a lower false alarm rate, a reasonable TC structure in terms of the relationship between maximum surface wind speed and sea level pressure, and higher correlation coefficients for interannual variations of TC frequency. These results also suggest that the finest-resolution reanalysis data sets, like MERRA, are not always the best in terms of TC climatology.
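
    The detection scores used to rank the reanalyses reduce to simple counts once reanalysis TCs are matched against best-track observations. The sketch below shows that arithmetic, with the matching criterion itself left as an assumption.

```python
# Hedged sketch: hit rate and false alarm rate for detected tropical cyclones.
def hit_rate(n_matched, n_observed):
    """Fraction of observed TCs that the reanalysis also detects."""
    return n_matched / n_observed

def false_alarm_rate(n_detected, n_matched):
    """Fraction of reanalysis detections with no observed counterpart."""
    return (n_detected - n_matched) / n_detected
```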

  10. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter & Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer-generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.
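
    Because ray tracing yields exact scene depth, ground-truth binocular disparity follows directly from camera geometry. As a hedged sketch for parallel (non-verging) cameras, with focal length and baseline values that are purely illustrative:

```python
# Hedged sketch: disparity from ground-truth depth for parallel cameras.
# Vergence, which the abstract emphasises, would add a per-eye angular offset.
import numpy as np

def depth_to_disparity(depth_m, focal_px=800.0, baseline_m=0.065):
    """disparity (pixels) = focal length (pixels) * baseline (m) / depth (m)."""
    return focal_px * baseline_m / np.asarray(depth_m)
```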

  11. The Data Set on the Multiple Abilities

    DEFF Research Database (Denmark)

    Klynge, Alice Heegaard

    2008-01-01

    This paper presents a data set on multiple abilities. The abilities cover the Literacy and Math Ability, the Creative and Innovative Ability, the Learning Ability, the Communication Ability, the Social Competency, the Self-Management Ability, the Environmental Awareness, the Civic Competency, the Intercultural Awareness, and the Health Awareness. The data stems from a unique cross-sectional survey carried out for the adult population in Denmark. Several dimensions and many questions pinpoint and measure every ability. The dimensions cover areas such as the individual behavior at work, the individual...

  12. Data set for Tifinagh handwriting character recognition

    Directory of Open Access Journals (Sweden)

    Omar Bencharef

    2015-09-01

    Full Text Available The Tifinagh alphabet-IRCAM is the official alphabet of the Amazigh language, widely used in North Africa [1]. It includes thirty-one basic letters and two letters each composed of a base letter followed by the sign of labialization. Normalized only in 2003 (Unicode) [2], IRCAM-Tifinagh is a young character repertoire which needs more work on all levels. In this context we propose a data set for handwritten Tifinagh characters composed of 1,376 images (43 images for each character). The dataset can be used to train a Tifinagh character recognition system, or to extract the meaningful characteristics of each character.

  13. Data set for Tifinagh handwriting character recognition.

    Science.gov (United States)

    Bencharef, Omar; Chihab, Younes; Mousaid, Nouredine; Oujaoura, Mustapha

    2015-09-01

    The Tifinagh alphabet-IRCAM is the official alphabet of the Amazigh language, widely used in North Africa [1]. It includes thirty-one basic letters and two letters each composed of a base letter followed by the sign of labialization. Normalized only in 2003 (Unicode) [2], IRCAM-Tifinagh is a young character repertoire which needs more work on all levels. In this context we propose a data set for handwritten Tifinagh characters composed of 1,376 images (43 images for each character). The dataset can be used to train a Tifinagh character recognition system, or to extract the meaningful characteristics of each character.

  14. The Data Set on the Multiple Abilities

    DEFF Research Database (Denmark)

    Klynge, Alice Heegaard

    2008-01-01

    This paper presents a data set on multiple abilities. The abilities cover the Literacy and Math Ability, the Creative and Innovative Ability, the Learning Ability, the Communication Ability, the Social Competency, the Self-Management Ability, the Environmental Awareness, the Civic Competency, the Intercultural Awareness, and the Health Awareness. The data stems from a unique cross-sectional survey carried out for the adult population in Denmark. Several dimensions and many questions pinpoint and measure every ability. The dimensions cover areas such as the individual behavior at work, the individual...

  15. Detecting Spammers via Aggregated Historical Data Set

    CERN Document Server

    Menahem, Eitan

    2012-01-01

    The battle between email service providers and senders of mass unsolicited emails (Spam) continues to gain traction. Vast numbers of Spam emails are sent mainly from automatic botnets distributed over the world. One method for mitigating Spam in a computationally efficient manner is fast and accurate blacklisting of the senders. In this work we propose a new sender reputation mechanism that is based on an aggregated historical data set which encodes the behavior of mail transfer agents over time. A historical data set is created from labeled logs of received emails. We use machine learning algorithms to build a model that predicts the "spammingness" of mail transfer agents in the near future. The proposed mechanism is targeted mainly at large enterprises and email service providers and can be used for updating both the black and the white lists. We evaluate the proposed mechanism using 9.5M anonymized log entries obtained from the biggest Internet service provider in Europe. Experiments show that propose...
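
    As the abstract describes the pipeline, per-sender history is aggregated into features and a model is trained to predict near-future "spammingness". The sketch below is a generic stand-in, with hypothetical features and a random forest chosen only for illustration.

```python
# Hedged sketch: sender-reputation model from aggregated historical features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical per-sender features: [messages/day, spam ratio so far, burstiness]
X = rng.random((1000, 3))
y = (X[:, 1] > 0.6).astype(int)         # stand-in "future spammer" label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
spammingness = model.predict_proba(X)[:, 1]   # scores for black/white listing
```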

  16. Modified Homogeneous Data Set of Coronal Intensities

    Science.gov (United States)

    Dorotovič, I.; Minarovjech, M.; Lorenc, M.; Rybanský, M.

    2014-07-01

    The Astronomical Institute of the Slovak Academy of Sciences has published the intensities, recalibrated with respect to a common intensity scale, of the 530.3 nm (Fe XIV) green coronal line observed at ground-based stations up to the year 2008. The name of this publication is the Homogeneous Data Set (HDS). We have developed a method that allows one to successfully substitute satellite observations for the ground-based observations and, thus, continue with the publication of the HDS. For this purpose, the observations of the Extreme-ultraviolet Imaging Telescope (EIT), onboard the Solar and Heliospheric Observatory (SOHO) satellite, were exploited. Among other data, the EIT instrument provides almost daily 28.4 nm (Fe XV) emission-line snapshots of the corona. The Fe XIV and Fe XV data (4051 observation days) taken in the period 1996 - 2008 have been compared and good agreement was found. The method to obtain the individual data for the HDS follows from the correlation analysis described in this article. The resulting data, now under the name of Modified Homogeneous Data Set (MHDS), are identical up to 1996 to those in the HDS. The MHDS can be used further for studies of coronal solar activity and its cycle. These data are available at http://www.suh.sk.
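
    The substitution rests on the correlation between the daily Fe XIV and Fe XV records over their 1996-2008 overlap. A minimal sketch of such a cross-calibration, assuming a simple linear fit (the article's actual analysis may differ), is:

```python
# Hedged sketch: cross-calibrating EIT Fe XV intensities to the Fe XIV HDS scale.
import numpy as np

def fit_calibration(fe15_overlap, fe14_overlap):
    """Fit Fe XIV ~ a * Fe XV + b on co-temporal data, return the mapping."""
    slope, intercept = np.polyfit(fe15_overlap, fe14_overlap, deg=1)
    return lambda fe15: slope * np.asarray(fe15) + intercept
```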

  17. Characterization of X-ray data sets

    Energy Technology Data Exchange (ETDEWEB)

    Zwart, Peter H.; Grosse-Kunsteleve, Ralf W.; Adams, Paul D.

    2005-07-21

    With the emergence of structural genomics, more effort is being invested into developing methods that incorporate basic crystallographic knowledge to enhance decision making procedures (e.g. Panjikar, 2005). A key area where some crystallographic knowledge is often vital for the smooth progress of structure solution is that of judging the quality or characteristics of an X-ray dataset. For instance, detecting the presence of anisotropic diffraction or twinning while a crystal is on the beam line, may allow the user to change the data collection strategy in order to obtain a better or a more complete data set. In post-collection analyses, the presence of (for instance) non-crystallographic translational symmetry might help the user (or program!) to solve the structure more easily. Of course, the identification of problems is by no means a guarantee that the problems can be overcome, but knowledge of the idiosyncrasies of a given X-ray data set permits the user or software pipeline to tailor the structure solution and refinement procedures to increase the chances of success. In this report, a number of routines are presented that assist the user in detecting specific problems or features within a given dataset. The routines are made available via the open source CCTBX libraries (http://cctbx.sourceforge.net) and will also be included in the next available PHENIX (Adams, et al., 2004) release.

  18. U.S. MOPEX DATA SET

    Energy Technology Data Exchange (ETDEWEB)

    Schaake, J; Cong, S; Duan, Q

    2006-05-08

    A key step in applying land surface parameterization schemes is to estimate model parameters that vary spatially and are unique to each computational element. Improved methods for parameter estimation (especially for parameters important to runoff response) are needed and require data from a wide range of climate regimes throughout the world. Accordingly, the GEWEX Hydrometeorology Panel (GHP) endorsed the concept of an international Model Parameter Estimation Project (MOPEX) at its Toronto meeting, August 1996. Phase I of MOPEX was funded by NOAA in FY 1997, Phase II in FY 2000 and Phase III in FY 2003. MOPEX was adopted as a project of the IAHS/WMO Committee on GEWEX and of the WMO Commission on Hydrology (CHy) and now is a contributor to the Coordinated Enhanced Observing Period (CEOP) of the World Climate Research Program (WCRP). In 2004 MOPEX became a Working Group of the IAHS Predictions in Ungauged Basins (PUB) initiative. MOPEX is also expected to contribute to the work of the Hydrologic Ensemble Prediction Experiment (HEPEX) (Franz et al., 2005). The primary goal of MOPEX is to develop techniques for the a priori estimation of the parameters used in land surface parameterization schemes of atmospheric models and in hydrologic models. A major early effort of MOPEX has been to assemble a large number of high-quality historical hydrometeorological and river basin characteristics data sets for a wide range of river basins (500-10,000 km²) throughout the world. MOPEX data sets are available via the Internet (ftp://hydrology.nws.noaa.gov). This paper documents the development of data sets for U.S. river basins. Several highly successful parameter estimation workshops have been organized by MOPEX. The first was held as part of the IAHS meeting in Birmingham, England in July 1999. The second workshop was hosted in April 2002 in Tucson, AZ by SAHRA/University of Arizona. The third MOPEX workshop was held as part of the IAHS meeting in Sapporo, July 2003. The fourth...

  19. Entropy estimates of small data sets

    Energy Technology Data Exchange (ETDEWEB)

    Bonachela, Juan A; Munoz, Miguel A [Departamento de Electromagnetismo y Fisica de la Materia and Instituto de Fisica Teorica y Computacional Carlos I, Facultad de Ciencias, Universidad de Granada, 18071 Granada (Spain); Hinrichsen, Haye [Fakultaet fuer Physik und Astronomie, Universitaet Wuerzburg, Am Hubland, 97074 Wuerzburg (Germany)

    2008-05-23

    Estimating entropies from limited data series is known to be a non-trivial task. Naive estimations are plagued with both systematic (bias) and statistical errors. Here, we present a new "balanced estimator" for entropy functionals (Shannon, Rényi and Tsallis) specially devised to provide a compromise between low bias and small statistical errors for short data series. This new estimator outperforms other currently available ones when the data sets are small and the probabilities of the possible outputs of the random variable are not close to zero. Otherwise, other well-known estimators remain a better choice. The potential range of applicability of this estimator is quite broad, especially for biological and digital data series. (fast track communication)
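
    To see the bias problem the balanced estimator addresses, one can compare the naive plug-in Shannon estimator with the classical Miller-Madow correction on a short series; the balanced estimator itself is not reproduced here.

```python
# Hedged sketch: naive plug-in entropy vs. Miller-Madow bias correction.
import numpy as np
from collections import Counter

def shannon_naive(samples):
    """Plug-in estimate; systematically biased low for short series."""
    n = len(samples)
    p = np.array([c / n for c in Counter(samples).values()])
    return -np.sum(p * np.log(p))

def shannon_miller_madow(samples):
    """Add the first-order (k - 1) / (2n) bias correction."""
    k = len(set(samples))               # observed support size
    return shannon_naive(samples) + (k - 1) / (2 * len(samples))
```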

  20. Realisation of 3-dimensional data sets.

    Science.gov (United States)

    Brown, D.; Galsgaard, K.; Ireland, J.; Verwichte, E.; Walsh, R.

    The visualisation of three-dimensional objects in two dimensions is a very common problem, but a tricky one to solve, and every discipline has its own way of solving it. The artist uses light-shade interaction, perspective and special colour coding. The architect produces projections of the object. The cartographer uses both colour coding and shading to represent height elevations. There have been many attempts in the last century by the entertainment industry to produce a three-dimensional illusion: in the fifties it was fashionable to show 3D movies which utilised the anaglyph method, and nowadays one can buy "Magic Eye" postcards which show a hidden three-dimensional picture if you stare at them half cross-eyed. This poster attempts to demonstrate how some of these techniques can be applied to three-dimensional data sets that occur in solar physics.

  1. Spatial occupancy models for large data sets

    Science.gov (United States)

    Johnson, Devin S.; Conn, Paul B.; Hooten, Mevin B.; Ray, Justina C.; Pond, Bruce A.

    2013-01-01

    Since its development, occupancy modeling has become a popular and useful tool for ecologists wishing to learn about the dynamics of species occurrence over time and space. Such models require presence-absence data to be collected at spatially indexed survey units. However, only recently have researchers recognized the need to correct for spatially induced overdispersion by explicitly accounting for spatial autocorrelation in occupancy probability. Previous efforts to incorporate such autocorrelation have largely focused on logit-normal formulations for occupancy, with spatial autocorrelation induced by a random effect within a hierarchical modeling framework. Although useful, computational time generally limits such an approach to relatively small data sets, and there are often problems with algorithm instability, yielding unsatisfactory results. Further, recent research has revealed a hidden form of multicollinearity in such applications, which may lead to parameter bias if not explicitly addressed. Combining several techniques, we present a unifying hierarchical spatial occupancy model specification that is particularly effective over large spatial extents. This approach employs a probit mixture framework for occupancy and can easily accommodate a reduced-dimensional spatial process to resolve issues with multicollinearity and spatial confounding while improving algorithm convergence. Using open-source software, we demonstrate this new model specification using a case study involving occupancy of caribou (Rangifer tarandus) over a set of 1080 survey units spanning a large contiguous region (108,000 km²) in northern Ontario, Canada. Overall, the combination of a more efficient specification and open-source software allows for a facile and stable implementation of spatial occupancy models for large data sets.
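
    For orientation, the non-spatial single-season occupancy likelihood that the paper's probit mixture extends can be written and maximised in a few lines. The sketch below is that basic model only, with stand-in inputs, not the reduced-dimensional spatial specification.

```python
# Hedged sketch: basic single-season occupancy model (no spatial term).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_lik(theta, y):
    """y is a sites x visits 0/1 detection-history matrix."""
    psi, p = expit(theta)               # occupancy and detection probabilities
    det = y.sum(axis=1)                 # detections per site
    n_visits = y.shape[1]
    lik = psi * p**det * (1 - p)**(n_visits - det) + (det == 0) * (1 - psi)
    return -np.log(lik).sum()

def fit_occupancy(y):
    res = minimize(neg_log_lik, x0=np.zeros(2), args=(y,))
    return expit(res.x)                 # (psi_hat, p_hat)
```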

  2. International Spinal Cord Injury Male Sexual Function Basic Data Set

    DEFF Research Database (Denmark)

    Alexander, M S; Biering-Sørensen, F; Elliott, S

    2011-01-01

    To create the International Spinal Cord Injury (SCI) Male Sexual Function Basic Data Set within the International SCI Data Sets.

  3. International spinal cord injury cardiovascular function basic data set

    DEFF Research Database (Denmark)

    Krassioukov, A; Alexander, M S; Karlsson, Anders Hans;

    2010-01-01

    To create an International Spinal Cord Injury (SCI) Cardiovascular Function Basic Data Set within the framework of the International SCI Data Sets.

  4. International Spinal Cord Injury Male Sexual Function Basic Data Set

    DEFF Research Database (Denmark)

    Alexander, M S; Biering-Sørensen, F; Elliott, S;

    2011-01-01

    To create the International Spinal Cord Injury (SCI) Male Sexual Function Basic Data Set within the International SCI Data Sets.

  5. International spinal cord injury cardiovascular function basic data set

    DEFF Research Database (Denmark)

    Krassioukov, A; Alexander, M S; Karlsson, Anders Hans

    2010-01-01

    To create an International Spinal Cord Injury (SCI) Cardiovascular Function Basic Data Set within the framework of the International SCI Data Sets.

  6. Users Manual for TMY3 Data Sets (Revised)

    Energy Technology Data Exchange (ETDEWEB)

    Wilcox, S.; Marion, W.

    2008-05-01

    This users manual describes how to obtain and interpret the data in the Typical Meteorological Year version 3 (TMY3) data sets. These data sets are an update to the TMY2 data released by NREL in 1994.

  7. International spinal cord injury pulmonary function basic data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, Fin; Krassioukov, A; Alexander, M S

    2012-01-01

    To develop the International Spinal Cord Injury (SCI) Pulmonary Function Basic Data Set within the framework of the International SCI Data Sets in order to facilitate consistent collection and reporting of basic bronchopulmonary findings in the SCI population.

  8. International spinal cord injury musculoskeletal basic data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, Fin; Burns, A S; Curt, A

    2012-01-01

    To develop an International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set as part of the International SCI Data Sets to facilitate consistent collection and reporting of basic musculoskeletal findings in the SCI population. Setting: International.

  9. Ontology-based integration of topographic data sets

    NARCIS (Netherlands)

    Uitermark, HT; van Oosterom, PJM; Mars, NJI; Molenaar, M

    2005-01-01

    The integration of topographic data sets is defined as the process of establishing relationships between corresponding object instances in different, autonomously produced, topographic data sets of the same geographic space. The problem of integrating topographic data sets is in finding these relationships...

  10. Recent Data Sets on Object Manipulation: A Survey.

    Science.gov (United States)

    Huang, Yongqiang; Bianchi, Matteo; Liarokapis, Minas; Sun, Yu

    2016-12-01

    Data sets are crucial not only for model learning and evaluation but also for advancing knowledge on human behavior, thus fostering mutual inspiration between neuroscience and robotics. However, choosing the right data set to use, or creating a new data set, is not an easy task because of the variety of data that can be found in the related literature. The first step in tackling this issue is to collect and organize those that are available. In this work, we take a significant step forward by reviewing data sets that were published in the past 10 years and that are directly related to object manipulation and grasping. We report on modalities, activities, and annotations for each individual data set, and we discuss our view on its use for object manipulation. We also compare the data sets and summarize them. Finally, we conclude the survey by providing suggestions and discussing the best practices for the creation of new data sets.

  11. International urinary tract imaging basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, F; Craggs, M; Kennelly, M

    2008-01-01

    OBJECTIVE: To create an International Urinary Tract Imaging Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. SETTING: An international working group. METHODS: The draft of the Data Set was developed by a working group comprising members appointed ... of the Data Set was developed after review and comments by members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, the ASIA Board, relevant and interested international organizations and societies (around 40), and individual persons with specific expertise ... of comparable minimal data. RESULTS: The variables included in the International Urinary Tract Imaging Basic SCI Data Set are the results obtained using the following investigations: intravenous pyelography or computed tomography urogram or ultrasound, X-ray, renography, clearance, cystogram, voiding cystogram...

  12. International urodynamic basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Craggs, M.; Kennelly, M.; Schick, E.;

    2008-01-01

    OBJECTIVE: To create the International Urodynamic Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. SETTING: International working group. METHODS: The draft of the data set was developed by a working group consisting of members appointed by the Neurourology Committee of the International Continence Society, the European Association of Urology, the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version...

  13. Abnormal Returns, Risk, and Options in Large Data Sets

    NARCIS (Netherlands)

    S. Caserta; J. Daníelsson (Jón); C.G. de Vries (Casper)

    1998-01-01

    Large data sets in finance with millions of observations have become widely available. Such data sets enable the construction of reliable semi-parametric estimates of the risk associated with extreme price movements. Our approach is based on semi-parametric statistical extreme value...
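
    The abstract does not name its estimator, but the standard semi-parametric tool for tail risk of this kind is the Hill estimator of the tail index. The sketch below is that generic estimator, with the number of order statistics k as a free choice.

```python
# Hedged sketch: Hill estimator of the tail index from the k largest losses.
import numpy as np

def hill_tail_index(losses, k=100):
    x = np.sort(np.asarray(losses, float))
    threshold = x[-(k + 1)]             # the (k+1)-th largest loss
    hill = np.mean(np.log(x[-k:] / threshold))
    return 1.0 / hill                   # tail index alpha
```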

  14. Greenhouse Effect Detection Experiment (GEDEX). Selected data sets

    Science.gov (United States)

    Olsen, Lola M.; Warnock, Archibald, III

    1992-01-01

    This CD-ROM contains selected data sets compiled by the participants of the Greenhouse Effect Detection Experiment (GEDEX) workshop on atmospheric temperature. The data sets include surface, upper air, and/or satellite-derived measurements of temperature, solar irradiance, clouds, greenhouse gases, fluxes, albedo, aerosols, ozone, and water vapor, along with Southern Oscillation Indices and Quasi-Biennial Oscillation statistics.

  15. Experience with automatic orientation from different data sets

    DEFF Research Database (Denmark)

    Potucková, Marketa

    2003-01-01

    ...by means of spatial resection. This paper describes in detail the mentioned procedure as it was used and implemented during tests with two data sets from Denmark. Moreover, the results from a test made with a data set from the Czech Republic are added. It brought a different view to this complex...

  16. Digital data set describing surficial geology in the conterminous US

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This digital data set describes surficial geology of the conterminous United States. The data set was generated from a U.S. Geological Survey 1:7,500,000-scale map...

  17. International spinal cord injury upper extremity basic data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, F; Bryden, A; Curt, A;

    2014-01-01

    OBJECTIVE: To develop an International Spinal Cord Injury (SCI) Upper Extremity Basic Data Set as part of the International SCI Data Sets, which facilitates consistent collection and reporting of basic upper extremity findings in the SCI population. SETTING: International. METHODS: A first draft of a SCI Upper Extremity Data Set was developed by an international working group. This was reviewed by many different organisations, societies and individuals over several months. A final version was created. VARIABLES: The final version of the International SCI Upper Extremity Data Set contains variables ... (www.iscos.org.uk). CONCLUSION: The International SCI Upper Extremity Basic Data Set will facilitate consistent collection and reporting of basic upper extremity findings in the SCI population.

  18. A review of selected energy-related data sets

    Energy Technology Data Exchange (ETDEWEB)

    Nicholls, A.K.; Elliott, D.B.; Jones, M.L. [Pacific Northwest Lab., Richland, WA (United States); Hannifan, J.M.; Degroat, K.J.; Eichner, M.J.; King, J.E. [Meridian Corp., Alexandria, VA (United States)

    1992-09-01

    DOE's Office of Planning and Assessment (OPA) performs crosscutting technical, policy, and environmental assessments of energy technologies and markets. To support these efforts, OPA is in the process of creating a data base management system (DBMS) that will include relevant data compiled from other sources. One of the first steps is a review of selected data sets that may be considered for inclusion in the DBMS. The review covered data sets in five categories: buildings-specific data, industry-specific data, transportation-specific data, utilities-specific data, and crosscutting/general data. Reviewed data sets covered a broad array of energy efficiency, renewable, and/or benchmark technologies. Most data sets reviewed in this report are sponsored by Federal government entities and major industry organizations. Additional data sets reviewed are sponsored by the states of California and New York and regional entities in the Pacific Northwest. Prior to full review, candidate data sets were screened for their utility to OPA. Screening criteria included requirements that a data set be particularly applicable to OPA's data needs, documented, current, and obtainable. To fully implement its DBMS, OPA will need to expand the review to other data sources, and must carefully consider the implications of differing assumptions and methodologies when comparing data.

  19. A review of selected energy-related data sets

    Energy Technology Data Exchange (ETDEWEB)

    Nicholls, A.K.; Elliott, D.B.; Jones, M.L. (Pacific Northwest Lab., Richland, WA (United States)); Hannifan, J.M.; Degroat, K.J.; Eichner, M.J.; King, J.E. (Meridian Corp., Alexandria, VA (United States))

    1992-09-01

    DOE's Office of Planning and Assessment (OPA) performs crosscutting technical, policy, and environmental assessments of energy technologies and markets. To support these efforts, OPA is in the process of creating a data base management system (DBMS) that will include relevant data compiled from other sources. One of the first steps is a review of selected data sets that may be considered for inclusion in the DBMS. The review covered data sets in five categories: buildings-specific data, industry-specific data, transportation-specific data, utilities-specific data, and crosscutting/general data. Reviewed data sets covered a broad array of energy efficiency, renewable, and/or benchmark technologies. Most data sets reviewed in this report are sponsored by Federal government entities and major industry organizations. Additional data sets reviewed are sponsored by the states of California and New York and regional entities in the Pacific Northwest. Prior to full review, candidate data sets were screened for their utility to OPA. Screening criteria included requirements that a data set be particularly applicable to OPA's data needs, documented, current, and obtainable. To fully implement its DBMS, OPA will need to expand the review to other data sources, and must carefully consider the implications of differing assumptions and methodologies when comparing data.

  20. High Interactivity Visualization Software for Large Computational Data Sets Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Existing scientific visualization tools have specific limitations for large scale scientific data sets. Of these four limitations can be seen as paramount: (i)...

  1. Resident Assessment Instrument/Minimum Data Set (RAI/MDS)

    Data.gov (United States)

    Department of Veterans Affairs — The Resident Assessment Instrument/Minimum Data Set (RAI/MDS) is a comprehensive assessment and care planning process used by the nursing home industry since 1990 as...

  2. Treatment Episode Data Set: Admissions (TEDS-A-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  3. Treatment Episode Data Set: Admissions (TEDS-A-2013)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  4. Treatment Episode Data Set: Admissions (TEDS-A-1994)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  5. Treatment Episode Data Set: Discharges (TEDS-D-2009)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  6. Treatment Episode Data Set: Admissions (TEDS-A-2002)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  7. Treatment Episode Data Set: Discharges (TEDS-D-2010)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  8. Treatment Episode Data Set: Admissions (TEDS-A-2010)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  9. Treatment Episode Data Set: Admissions (TEDS-A-1997)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  10. Treatment Episode Data Set: Admissions (TEDS-A-2001)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  11. Treatment Episode Data Set: Admissions (TEDS-A-1995)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  12. Treatment Episode Data Set: Admissions (TEDS-A-2003)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  13. Treatment Episode Data Set: Admissions (TEDS-A-2004)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  14. Treatment Episode Data Set: Admissions (TEDS-A-2005)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  15. Treatment Episode Data Set: Admissions (TEDS-A-2009)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  16. Treatment Episode Data Set: Discharges (TEDS-D-2006)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  17. Treatment Episode Data Set: Admissions (TEDS-A-2006)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  18. Treatment Episode Data Set: Admissions (TEDS-A-1996)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  19. Treatment Episode Data Set: Admissions (TEDS-A-2012)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  20. Treatment Episode Data Set: Admissions (TEDS-A-1999)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  1. Treatment Episode Data Set: Discharges (TEDS-D-2007)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  2. Treatment Episode Data Set: Admissions (TEDS-A-2008)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  3. Treatment Episode Data Set: Admissions (TEDS-A-1993)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  4. Treatment Episode Data Set: Discharges (TEDS-D-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  5. Treatment Episode Data Set: Admissions (TEDS-A-2000)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  6. Treatment Episode Data Set: Discharges (TEDS-D-2008)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  7. Treatment Episode Data Set: Admissions (TEDS-A-1992)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  8. Treatment Episode Data Set: Admissions (TEDS-A-1998)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  9. Treatment Episode Data Set: Admissions (TEDS-A-2007)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  10. Nursing Minimum Data Set Based on EHR Archetypes Approach.

    Science.gov (United States)

    Spigolon, Dandara N; Moro, Cláudia M C

    2012-01-01

    The establishment of a Nursing Minimum Data Set (NMDS) can facilitate the use of health information systems. Adopting such sets and representing them with archetypes is one way of developing and supporting health systems. The objective of this paper is to describe the definition of a minimum data set for nursing in endometriosis, represented with archetypes. The study was divided into two steps: defining the Nursing Minimum Data Set for endometriosis, and developing archetypes related to the NMDS. The nursing data set for endometriosis was represented in the form of an archetype, using the whole perception of the evaluation item, organs and senses. This form of representation is an important tool for semantic interoperability and knowledge representation in health information systems.

  11. Adaptive, multiresolution visualization of large data sets using parallel octrees.

    Energy Technology Data Exchange (ETDEWEB)

    Freitag, L. A.; Loy, R. M.

    1999-06-10

    The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure. We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance. We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of this system on the data from a Rayleigh-Taylor instability simulation.
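
    A minimal Python sketch of the hierarchical reduction described above: field samples are binned into an octree whose nodes store averaged values. This is illustrative only, not the report's parallel implementation; the function name and dict-based tree layout are ours.

```python
import numpy as np

def build_octree(points, values, depth, bounds):
    """Bin 3D field samples into an octree; each node keeps the mean field
    value of its samples, giving a reduced multiresolution representation."""
    node = {"value": values.mean(), "n": len(values), "children": None}
    if depth == 0 or len(values) <= 1:
        return node
    lo, hi = bounds
    mid = (lo + hi) / 2.0
    # 3-bit octant code per point: one bit per axis.
    code = ((points > mid) * np.array([1, 2, 4])).sum(axis=1)
    node["children"] = []
    for k in range(8):
        mask = code == k
        if not mask.any():
            continue
        bit = np.array([k & 1, k & 2, k & 4], dtype=bool)
        child_bounds = (np.where(bit, mid, lo), np.where(bit, hi, mid))
        node["children"].append(
            build_octree(points[mask], values[mask], depth - 1, child_bounds))
    return node

# Example: reduce 100k random samples of a scalar field to a depth-4 tree.
pts = np.random.rand(100_000, 3)
tree = build_octree(pts, np.sin(10 * pts[:, 0]), depth=4,
                    bounds=(np.zeros(3), np.ones(3)))
```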

  12. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, P.K.

    1982-01-01

    With the use of atomic and nuclear methods to analyze samples for a multitude of elements, very large data sets have been generated. Due to the ease of obtaining these results with computerized systems, the elemental data acquired are not always as thoroughly checked as they should be, leading to some, if not many, bad data points. It is advantageous to have some feeling for the trouble spots in a data set before it is used for further studies. A technique which has the ability to identify bad data points, after the data have been generated, is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors.
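
    The screening idea can be sketched as follows (our illustration, not the authors' code): fit a low-rank factor model to the samples-by-elements matrix, reconstruct the data from the common factors, and flag entries whose residuals are unusually large.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def flag_suspect_entries(X, n_factors=3, z_cut=4.0):
    """Fit a factor model to a samples-by-elements matrix, reconstruct the
    data from the common factors, and return (row, column) indices of
    entries with unusually large standardized residuals (likely errors)."""
    fa = FactorAnalysis(n_components=n_factors).fit(X)
    recon = fa.transform(X) @ fa.components_ + fa.mean_   # model reconstruction
    resid = X - recon
    z = (resid - resid.mean(axis=0)) / resid.std(axis=0)  # standardize per element
    return np.argwhere(np.abs(z) > z_cut)
```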

  13. Electronic Health Information Legal Epidemiology Data Set 2014

    Data.gov (United States)

    U.S. Department of Health & Human Services — Authors: Cason Schmit, JD, Gregory Sunshine, JD, Dawn Pepin, JD, MPH, Tara Ramanathan, JD, MPH, Akshara Menon, JD, MPH, Matthew Penn, JD, MLIS. This legal data set...

  14. Portland, Oregon Test Data Set Arterial Loop Detector Data

    Data.gov (United States)

    Department of Transportation — This set of data files was acquired under USDOT FHWA cooperative agreement DTFH61-11-H-00025 as one of the four test data sets acquired by the USDOT Data Capture and...

  15. Digital data sets describing metropolitan areas in the conterminous US

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set describes metropolitan areas in the conterminous United States, developed from U.S. Bureau of the Census boundaries of Consolidated Metropolitan...

  16. High Interactivity Visualization Software for Large Computational Data Sets Project

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to develop a collection of computer tools and libraries called SciViz that enable researchers to visualize large scale data sets on HPC resources remotely...

  17. Novel Visualization of Large Health Related Data Sets

    Science.gov (United States)

    2014-03-01

    locations (e.g. areas with high pollen that increases the need for more intensive health care for people with asthma) and save millions of dollars... be used as a means to explore novel visualizations of large health data sets. We expect this approach to digitized healthcare data will lead to... Award Number: W81XWH-13-1-0061 TITLE: Novel Visualization of Large Health Related Data Sets PRINCIPAL INVESTIGATOR: William Ed

  18. International Spinal Cord Injury Urinary Tract Infection Basic Data Set

    DEFF Research Database (Denmark)

    Goetz, L L; Cardenas, D D; Kennelly, M

    2013-01-01

    To develop an International Spinal Cord Injury (SCI) Urinary Tract Infection (UTI) Basic Data Set presenting a standardized format for the collection and reporting of a minimal amount of information on UTIs in daily practice or research.

  19. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, Ph.K. (Illinois Univ., Urbana (USA))

    1982-01-01

    A technique which has the ability to identify bad data points, after the data have been generated, is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors.

  20. The International Spinal Cord Injury Pain Basic Data Set

    DEFF Research Database (Denmark)

    Widerstrom-Noga, E.; Bryce, T.; Cardenas, D.D.

    2008-01-01

    Objective: To develop a basic pain data set (International Spinal Cord Injury Basic Pain Data Set, ISCIPDS:B) within the framework of the International spinal cord injury (SCI) data sets that would facilitate consistent collection and reporting of pain in the SCI population. Setting: International. Methods: ... core questions about clinically relevant information concerning SCI-related pain that can be collected by health-care professionals with expertise in SCI in various clinical settings. The questions concern pain severity, physical and emotional function and include a pain-intensity rating, a pain classification and questions related to the temporal pattern of pain for each specific pain problem. The impact of pain on physical, social and emotional function, and sleep is evaluated for each pain. Spinal Cord (2008) 46, 818-823; doi:10.1038/sc.2008.64; published online 3 June 2008. Publication date: 2008/12

  1. Detecting gallbladders in chicken livers using spectral analysis

    DEFF Research Database (Denmark)

    Jørgensen, Anders; Mølvig Jensen, Eigil; Moeslund, Thomas B.

    2015-01-01

    This paper presents a method for detecting gallbladders attached to chicken livers using spectral imaging. Gallbladders can contaminate good livers, making them unfit for human consumption. A data set consisting of chicken livers with and without gallbladders has been captured using 33 wavelengths...

  3. Lacunarity Definition For Ramified Data Sets Based On Optimal Cover

    Energy Technology Data Exchange (ETDEWEB)

    Tolle, Charles Robert; Rohrbaugh, David Thomas; McJunkin, Timothy R.

    2003-05-01

    Lacunarity is a measure of how data fills space. It complements fractal dimension, which measures how much space is filled. This paper discusses the limitations of the standard gliding box algorithm for calculating lacunarity, which leads to a re-examination of what lacunarity is meant to describe. Two new lacunarity measures for ramified data sets are then presented that more directly measure the gaps in a ramified data set. These measures are rigorously defined. An algorithm for estimating the new lacunarity measure, using Fuzzy-C means clustering algorithm, is developed. The lacunarity estimation algorithm is used to analyze two- and three-dimensional Cantor dusts. Applications for these measures include biological modeling and target detection within ramified data sets.
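
    For orientation, the standard gliding-box estimator that the paper re-examines can be written in a few lines (our sketch; the authors' new ramified-set measures and their fuzzy c-means estimation algorithm are more involved):

```python
import numpy as np

def gliding_box_lacunarity(img, box):
    """Classical gliding-box lacunarity of a binary 2D array at one box size:
    Lambda(r) = <M^2> / <M>^2, where M is the box "mass" (pixel count) at
    each gliding position. Equal to var/mean^2 + 1."""
    n, m = img.shape
    masses = np.array([img[i:i + box, j:j + box].sum()
                       for i in range(n - box + 1)
                       for j in range(m - box + 1)], dtype=float)
    return masses.var() / masses.mean() ** 2 + 1.0

# Example: a sparse random binary image, box size 4.
rng = np.random.default_rng(0)
print(gliding_box_lacunarity((rng.random((64, 64)) < 0.1).astype(int), 4))
```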

  4. Conserved Quantities of harmonic asymptotic initial data sets

    CERN Document Server

    Chen, Po-Ning

    2014-01-01

    In the first half of this article, we survey the new quasi-local and total angular momentum and center of mass defined in [9] and summarize the important properties of these definitions. To compute these conserved quantities involves solving a nonlinear PDE system (the optimal isometric embedding equation), which is rather difficult in general. We found a large family of initial data sets on which such a calculation can be carried out effectively. These are initial data sets of harmonic asymptotics, first proposed by Corvino and Schoen to solve the full vacuum constraint equation. In the second half of this article, the new total angular momentum and center of mass for these initial data sets are computed explicitly.

  5. [Essential data set's archetypes for nursing care of endometriosis patients].

    Science.gov (United States)

    Spigolon, Dandara Novakowski; Moro, Claudia Maria Cabral

    2012-12-01

    This study aimed to develop an Essential Data Set for Nursing Care of Patients with Endometriosis (CDEEPE), represented by archetypes. An exploratory applied research study with specialists' participation was carried out at the Health Informatics Laboratory of PUCPR between February and November of 2010. It was divided in two stages: CDEEPE construction and evaluation, including Nursing Process phases and Basic Human Needs, and archetype development based on this data set. CDEEPE was evaluated by doctors and nurses with 95.9% consensus and contains 51 data items. The archetype "Perception of Organs and Senses" was created to represent this data set. This study allowed identifying important information for nursing practices, contributing to the computerization and application of the nursing process during care. The CDEEPE was the basis for archetype creation, which will make possible structured, organized, efficient, interoperable, and semantic records.

  6. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of collected data in many scientific fields is increasing, all of them requiring a common task: extract knowledge from massive, multi-parametric data sets, as rapidly and efficiently as possible. This is especially true in astronomy where synoptic sky surveys are enabling new research frontiers in time-domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission we illustrate a variety of feature selection strategies used to identify the subsets that give the most information and the results achieved applying these techniques to three major astronomical problems.
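
    A toy example of a filter-style strategy of the kind surveyed above, ranking features by mutual information with the class label on synthetic data (our sketch; the paper's CRTS and Kepler pipelines are not reproduced here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

# Rank features by mutual information with the class label, keep the top-k
# subset, and train a classifier on that subset only.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=8,
                           random_state=0)
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:8]                 # indices of the best features
clf = RandomForestClassifier(random_state=0).fit(X[:, top], y)
print("selected features:", top, "training accuracy:", clf.score(X[:, top], y))
```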

  7. International bowel function extended spinal cord injury data set

    DEFF Research Database (Denmark)

    Krogh, K; Perkash, I; Stiens, S A;

    2008-01-01

    STUDY DESIGN: International expert working group. OBJECTIVE: To develop an International Bowel Function Extended Spinal Cord Injury (SCI) Data Set presenting a standardized format for the collection and reporting of an extended amount of information on bowel function. SETTING: Working group consisting of members appointed by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS). METHODS: A draft prepared by the working group was reviewed by the Executive Committee of the International SCI Standards and Data Sets and later by the ISCoS Scientific Committee...

  8. International bowel function basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Krogh, K; Perkash, I; Stiens, S A;

    2008-01-01

    STUDY DESIGN: International expert working group. OBJECTIVE: To develop an International Bowel Function Basic Spinal Cord Injury (SCI) Data Set presenting a standardized format for the collection and reporting of a minimal amount of information on bowel function in daily practice or in research. SETTING: Working group consisting of members appointed by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS). METHODS: A draft prepared by the working group was reviewed by the Executive Committee of the International SCI Standards and Data Sets, and later by the ISCoS...

  9. Chicken Art

    Science.gov (United States)

    Bickett, Marianne

    2009-01-01

    In this article, the author describes how a visit from a flock of chickens provided inspiration for the children's chicken art. The gentle clucking of the hens, the rooster crowing, and the softness of the feathers all provided rich aural, tactile, visual, and emotional experiences. The experience affirms the importance and value of direct…

  10. Chicken Toast

    Institute of Scientific and Technical Information of China (English)

    1998-01-01

    Ingredients: 200 grams chicken breast; 50 grams sliced bread; 5 grams vegetable oil; one egg; minced ginger root and scallions; 25 grams Shredded radish; vinegar; sugar; salt and pepper to taste. Method: First chop the chicken and mix it with the vegetable oil, a beaten egg, ginger, scallions, Salt

  12. Data Sets, Ensemble Cloud Computing, and the University Library (Invited)

    Science.gov (United States)

    Plale, B. A.

    2013-12-01

    The environmental researcher at the public university has new resources at their disposal to aid in research and publishing. Cloud computing provides compute cycles on demand for analysis and modeling scenarios. Cloud computing is attractive for e-Science because of the ease with which cores can be accessed on demand, and because the virtual machine implementation that underlies cloud computing reduces the cost of porting a numeric or analysis code to a new platform. Many libraries at larger universities are developing the e-Science skills to serve as repositories of record for publishable data sets. But these are confusing times for the publication of data sets from environmental research. The large publishers of scientific literature are advocating a process whereby data sets are tightly tied to a publication. In other words, a paper published in the scientific literature that gives results based on data must have an associated data set accessible that backs up the results. This approach supports reproducibility of results in that publishers maintain a repository for the papers they publish, and the data sets that the papers used. Does such a solution that maps one data set (or subset) to one paper fit the needs of the environmental researcher who among other things uses complex models, mines longitudinal data bases, and generates observational results? The second school of thought has emerged out of NSF, NOAA, and NASA funded efforts over time: data sets exist coherently at a location, such as occurs at the National Snow and Ice Data Center (NSIDC). But when a collection is coherent, reproducibility of individual results is more challenging. We argue for a third complementary option: the university repository as a location for data sets produced as a result of university-based research. This location for a repository relies on the expertise developing in the university libraries across the country, and leverages tools, such as are being developed

  13. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets

    DEFF Research Database (Denmark)

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole

    2016-01-01

    and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate for the mosaic...

  14. A Penrose-Like Inequality for General Initial Data Sets

    CERN Document Server

    Khuri, Marcus A

    2009-01-01

    We establish a Penrose-Like Inequality for general (not necessarily time symmetric) initial data sets of the Einstein equations which satisfy the dominant energy condition. More precisely, it is shown that the ADM energy is bounded below by an expression which is proportional to the square root of the area of the outermost future (or past) apparent horizon.

  15. Satellite data sets for the atmospheric radiation measurement (ARM) program

    Energy Technology Data Exchange (ETDEWEB)

    Shi, L.; Bernstein, R.L. [SeaSpace Corp., San Diego, CA (United States)

    1996-04-01

    This abstract describes the type of data obtained from satellite measurements in the Atmospheric Radiation Measurement (ARM) program. The data sets have been widely used by the ARM team to derive cloud-top altitude, cloud cover, snow and ice cover, surface temperature, water vapor, and wind, vertical profiles of temperature, and continuous observations of weather needed to track and predict severe weather.

  16. Using Large Data Sets to Study College Education Trajectories

    Science.gov (United States)

    Oseguera, Leticia; Hwang, Jihee

    2014-01-01

    This chapter presents various considerations researchers undertook to conduct a quantitative study on low-income students using a national data set. Specifically, it describes how a critical quantitative scholar approaches guiding frameworks, variable operationalization, analytic techniques, and result interpretation. Results inform how…

  17. From sensors to human spatial concepts: An annotated data set

    NARCIS (Netherlands)

    Zivkovic, Z.; Booij, O.; Kröse, B.; Topp, E.A.; Christensen, H.I.

    2008-01-01

    An annotated data set is presented meant to help researchers in developing, evaluating, and comparing various approaches in robotics for building space representations appropriate for communicating with humans. The data consist of omnidirectional images, laser range scans, sonar readings, and robot

  18. Imperfect World of $\beta\beta$-decay Nuclear Data Sets

    CERN Document Server

    Pritychenko, B

    2015-01-01

    The precision of double-beta ($\beta\beta$) decay experimental half-lives and their uncertainties is reanalyzed. The method of Benford's distributions has been applied to nuclear reaction, structure and decay data sets. The first-digit distribution trend for $\beta\beta$-decay $T_{1/2}^{2\nu}$...

  19. IGBP-DIS soil data set for pedotransfer function development

    NARCIS (Netherlands)

    Tempel, P.; Batjes, N.H.; Engelen, van V.W.P.

    1996-01-01

    At the request of the Global Soil Data Task (GSDT) of the Data and Information System of the International Geosphere Biosphere Programme (IGBP-DIS), ISRIC prepared a uniform soil data set for the development of pedotransfer functions. The necessary chemical and physical soil data have been derived

  20. MSTWG Multistatic Tracker Evaluation Using Simulated Scenario Data Sets

    NARCIS (Netherlands)

    Theije, P.A.M. de; Cour, B.R. la; Lang, T.; Willett, P.; Grimmett, D.; Coraluppi, S.; Hempel, C.G.

    2008-01-01

    The Multistatic Tracking Working Group (MSTWG) was formed in 2005 by an international group of researchers interested in developing and improving tracking capabilities when applied to multistatic sonar and radar problems. The MSTWG developed several simulated multistatic sonar scenario data sets for

  1. AOIPS data base management systems support for GARP data sets

    Science.gov (United States)

    Gary, J. P.

    1977-01-01

    A data base management system is identified, developed to provide flexible access to data sets produced by GARP during its data systems tests. The content and coverage of the data base are defined and a computer-aided, interactive information storage and retrieval system, implemented to facilitate access to user specified data subsets, is described. The computer programs developed to provide the capability were implemented on the highly interactive, minicomputer-based AOIPS and are referred to as the data retrieval system (DRS). Implemented as a user interactive but menu guided system, the DRS permits users to inventory the data tape library and create duplicate or subset data sets based on a user selected window defined by time and latitude/longitude boundaries. The DRS permits users to select, display, or produce formatted hard copy of individual data items contained within the data records.

  2. Generation new MP3 data set after compression

    Science.gov (United States)

    Atoum, Mohammed Salem; Almahameed, Mohammad

    2016-02-01

    The success of audio steganography techniques lies in ensuring the imperceptibility of the embedded secret message in the stego file and in withstanding any form of intentional or unintentional degradation of the secret message (robustness). Crucial to this is the use of a digital audio file such as an MP3 file, which comes at different compression rates; research studies have shown that performing steganography in MP3 format after compression is the most suitable choice. Unfortunately, until now researchers have not been able to test and implement their algorithms because no standard data set of MP3 files after compression has been generated. This paper therefore focuses on generating a standard data set with different compression ratios and different genres to help researchers implement their algorithms.
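
    A sketch of how such a data set might be generated with the LAME command-line encoder (our illustration, assuming `lame` is on the PATH and a `wav/` folder of sources; the bitrates and file layout are not the paper's protocol):

```python
import itertools
import pathlib
import subprocess

# Encode every WAV source at several compression rates, producing one MP3
# per (source, bitrate) pair.
bitrates = [64, 128, 192, 320]                 # kbit/s, illustrative choices
for wav, kbps in itertools.product(pathlib.Path("wav").glob("*.wav"), bitrates):
    out = pathlib.Path("mp3") / f"{wav.stem}_{kbps}k.mp3"
    out.parent.mkdir(exist_ok=True)
    subprocess.run(["lame", "-b", str(kbps), str(wav), str(out)], check=True)
```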

  3. Measures for the characterisation of pattern-recognition data sets

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2007-11-01

    Full Text Available are summarised in Table 2. We abbreviate dimensionality as d, number of samples as N and number of classes as C. The number of numerical attributes is abbreviated as d(Num) and the number of categorical attributes as d(Cat). The Diabetes, Heart, Australian...

    Table 2: Summary of real-world data sets

    Data set        d(Num)  d(Cat)  d   N     C
    Iris            4       -       4   150   4
    Balance-scale   4       -       4   625   3
    Diabetes        4       4       8   768   2
    Tic-tac-toe     -       9       9   958   2
    Heart           7       6       13  270   2
    Australian      6       9       15  690   2
    Vehicle         18      -       18  846   4
    German          7       13      20  1000  2...
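
    The measures in Table 2 are straightforward to compute for any new data set; a small pandas helper (our hypothetical sketch, not the report's code) might look like:

```python
import pandas as pd

def characterise(df: pd.DataFrame, target: str) -> dict:
    """Summary measures used in Table 2 for a data set held in a DataFrame:
    numerical/categorical attribute counts, dimensionality, sample count,
    and number of classes in the target column."""
    features = df.drop(columns=[target])
    d_num = features.select_dtypes(include="number").shape[1]
    return {"d(Num)": d_num,
            "d(Cat)": features.shape[1] - d_num,
            "d": features.shape[1],
            "N": len(df),
            "C": df[target].nunique()}
```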

  4. A large-scale crop protection bioassay data set.

    Science.gov (United States)

    Gaulton, Anna; Kale, Namrata; van Westen, Gerard J P; Bellis, Louisa J; Bento, A Patrícia; Davies, Mark; Hersey, Anne; Papadatos, George; Forster, Mark; Wege, Philip; Overington, John P

    2015-01-01

    ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.

  5. Causal Information Approach to Partial Conditioning in Multivariate Data Sets

    Directory of Open Access Journals (Sweden)

    D. Marinazzo

    2012-01-01

    Full Text Available When evaluating causal influence from one time series to another in a multivariate data set it is necessary to take into account the conditioning effect of the other variables. In the presence of many variables and possibly of a reduced number of samples, full conditioning can lead to computational and numerical problems. In this paper, we address the problem of partial conditioning to a limited subset of variables, in the framework of information theory. The proposed approach is tested on simulated data sets and on an example of intracranial EEG recording from an epileptic subject. We show that, in many instances, conditioning on a small number of variables, chosen as the most informative ones for the driver node, leads to results very close to those obtained with a fully multivariate analysis and even better in the presence of a small number of samples. This is particularly relevant when the pattern of causalities is sparse.
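
    A minimal linear-Gaussian sketch of the idea (ours, not the authors' information-theoretic implementation): condition a lag-1 Granger test on only the k variables most correlated with the driver, as a stand-in for selecting the most informative conditioning subset.

```python
import numpy as np

def partial_granger(x, y, Z, k=3):
    """Lag-1 Granger influence x -> y conditioned on the k columns of Z most
    correlated with the driver x. Returns log(RSS_restricted / RSS_full);
    values above zero suggest an influence."""
    corr = np.abs(np.corrcoef(np.column_stack([x, Z]), rowvar=False)[0, 1:])
    Zk = Z[:, np.argsort(corr)[::-1][:k]]          # reduced conditioning set
    tgt = y[1:]                                    # predict y(t) from time t-1
    restricted = np.column_stack([np.ones(len(tgt)), y[:-1], Zk[:-1]])
    full = np.column_stack([restricted, x[:-1]])   # add the driver's past
    rss = lambda A: np.sum((tgt - A @ np.linalg.lstsq(A, tgt, rcond=None)[0]) ** 2)
    return np.log(rss(restricted) / rss(full))
```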

  6. Imperfect World of beta beta-decay Nuclear Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Pritychenko, B. [Brookhaven National Lab. (BNL), Upton, NY (United States). NNDC

    2015-01-03

    The precision of double-beta (ββ) decay experimental half-lives and their uncertainties is reanalyzed. The method of Benford's distributions has been applied to nuclear reaction, structure and decay data sets. The first-digit distribution trend for ββ-decay T^{2ν}_{1/2} is consistent with large nuclear reaction and structure data sets and provides validation of experimental half-lives. A complementary analysis of the decay uncertainties indicates deficiencies due to the small size of statistical samples and the incomplete collection of experimental information. Further experimental and theoretical efforts would lead toward more precise values of ββ-decay half-lives and nuclear matrix elements.
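
    The first-digit validation is simple to reproduce in outline; a short sketch (ours) comparing observed frequencies against Benford's law:

```python
import numpy as np

def benford_check(values):
    """Observed vs. Benford-expected first-digit frequencies of a positive
    data set (e.g. half-life values collated from the literature)."""
    v = np.abs(np.asarray(values, dtype=float))
    mant = v / 10.0 ** np.floor(np.log10(v))       # mantissa in [1, 10)
    d = np.clip(mant, 1.0, 9.999).astype(int)      # first significant digit
    observed = np.array([(d == k).mean() for k in range(1, 10)])
    expected = np.log10(1.0 + 1.0 / np.arange(1, 10))
    return observed, expected

# Example: log-uniform samples follow Benford's law closely.
obs, exp = benford_check(10 ** np.random.uniform(0, 6, 10_000))
print(np.round(obs, 3), np.round(exp, 3))
```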

  7. A large-scale crop protection bioassay data set

    Science.gov (United States)

    Gaulton, Anna; Kale, Namrata; van Westen, Gerard J. P.; Bellis, Louisa J.; Bento, A. Patrícia; Davies, Mark; Hersey, Anne; Papadatos, George; Forster, Mark; Wege, Philip; Overington, John P.

    2015-07-01

    ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.

  8. Haplotype estimation for biobank-scale data sets.

    Science.gov (United States)

    O'Connell, Jared; Sharp, Kevin; Shrine, Nick; Wain, Louise; Hall, Ian; Tobin, Martin; Zagury, Jean-Francois; Delaneau, Olivier; Marchini, Jonathan

    2016-07-01

    The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale data sets and results in switch error rates as low as ∼0.3%. The method exhibits O(NlogN) scaling with sample size N, enabling fast and accurate phasing of even larger cohorts.

  9. Updated and new data sets for WDMAM 2011

    Science.gov (United States)

    Korhonen, Juha V.

    2010-05-01

    Poster outlines major updated and released data sets for WDMAM2011, including MF6-model, ADMAP profile and point data, Arctic subsets, NURE, joint Russia and nearby countries, Australia version 4, NGDC oceanic profiles, Ishihara group oceanic set, Getech 7,5 minutes grid, and some new coverage on oceans and continental areas like Cuba, British Guyana and further South American and African countries. More information at http://projects.gtk.fi/WDMAM/

  10. Novel Visualization of Large Health Related Data Sets - NPHRD

    Science.gov (United States)

    2015-11-01

    data visualization in health care, most focusing on the technical aspects of visualization, medical imaging, and genomics. A number of prototypes have... also been reported. LifeLines, first described in 1996 by Plaisant and colleagues,7,8 was used to visualize health data across a personal... Award Number: W81XWH-13-1-0061 TITLE: Novel Visualization of Large Health Related Data Sets - NPHRD PRINCIPAL INVESTIGATOR: William Ed Hammond

  11. Some Statistical Properties of the ELITE Data Set

    NARCIS (Netherlands)

    Lerou, R.J.L.; Heuvel, J.C. van den; Driesenaar, M.L.

    1998-01-01

    We present some remarkable results of our analysis of the lidar data set of the ELITE measurements at 1.06 um. The data are obtained from Dr. W. Renger (Deutsche Forschungsanstalt für Luft- und Raumfahrt e.V., DLR Institut für Fysik der Atmosphäre) and from Dr. J. Pelon (Service d'Aeronomie du CNRS,

  12. A global surface drifter data set at hourly resolution

    Science.gov (United States)

    Elipot, Shane; Lumpkin, Rick; Perez, Renellys C.; Lilly, Jonathan M.; Early, Jeffrey J.; Sykulski, Adam M.

    2016-05-01

    The surface drifting buoys, or drifters, of the Global Drifter Program (GDP) are predominantly tracked by the Argos positioning system, providing drifter locations with O(100 m) errors at nonuniform temporal intervals, with an average interval of 1.2 h since January 2005. This data set is thus a rich and global source of information on high-frequency and small-scale oceanic processes, yet is still relatively understudied because of the challenges associated with its large size and sampling characteristics. A methodology is described to produce a new high-resolution global data set since 2005, consisting of drifter locations and velocities estimated at hourly intervals, along with their respective errors. Locations and velocities are obtained by locally modeling trajectories in time as first-order polynomials with coefficients obtained by maximizing a likelihood function. This function is derived by modeling the Argos location errors with t location-scale probability distribution functions. The methodology is motivated by analyzing 82 drifters tracked contemporaneously by Argos and by the Global Positioning System, where the latter is assumed to provide true locations. A global spectral analysis of the velocity variance from the new data set reveals a sharply defined ridge of energy closely following the inertial frequency as a function of latitude, distinct energy peaks near diurnal and semidiurnal frequencies, and higher-frequency peaks located near tidal harmonics as well as near replicates of the inertial frequency. Compared to the spectra that can be obtained using the standard 6-hourly GDP product, the new data set contains up to 100% more spectral energy at some latitudes.
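
    A simplified stand-in for the estimation step (our sketch): ordinary least squares replaces the t location-scale maximum-likelihood fit, applied around each hour of the output grid.

```python
import numpy as np

def hourly_positions(t_obs, x_obs, t_grid, half_window=2.0):
    """Position and velocity on a grid of hours: around each grid time tc,
    fit x(t) ~ a + b*(t - tc) to fixes within +/- half_window hours;
    a is the location and b the velocity at tc (times in hours)."""
    pos, vel = [], []
    for tc in t_grid:
        sel = np.abs(t_obs - tc) <= half_window
        if sel.sum() < 2:                      # not enough fixes in window
            pos.append(np.nan)
            vel.append(np.nan)
            continue
        A = np.column_stack([np.ones(sel.sum()), t_obs[sel] - tc])
        (a, b), *_ = np.linalg.lstsq(A, x_obs[sel], rcond=None)
        pos.append(a)
        vel.append(b)
    return np.array(pos), np.array(vel)
```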

  13. Generating an Ordered Data Set from an OCR Text File

    Directory of Open Access Journals (Sweden)

    Jon Crump

    2014-11-01

    Full Text Available This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a python dictionary) from it. These illustrations are specific to a particular text, but the overall strategy, and some of the individual procedures, can be adapted to organize any scanned text, even if it doesn't look like this one.
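
    In outline, the tutorial's strategy reduces to matching a line pattern and accumulating records into a dictionary. A toy sketch, with an invented line format that is not the tutorial's text:

```python
import re

# Hypothetical OCR line format: "NNN. Surname, Forename. Title. Year."
entry = re.compile(r"^(\d+)\.\s+(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\.\s+(?P<year>\d{4})")

def parse_ocr(text):
    """Parse raw OCR text into a dictionary keyed by entry number."""
    records = {}
    for line in text.splitlines():
        m = entry.match(line.strip())
        if m:
            records[int(m.group(1))] = {
                "author": m.group("author").strip(),
                "title": m.group("title").strip(),
                "year": int(m.group("year")),
            }
    return records

print(parse_ocr("12. Smith, Jane. A study of hens. 1893."))
```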

  14. Discovering Psychological Principles by Mining Naturally Occurring Data Sets.

    Science.gov (United States)

    Goldstone, Robert L; Lupyan, Gary

    2016-07-01

    The very expertise with which psychologists wield their tools for achieving laboratory control may have had the unwelcome effect of blinding psychologists to the possibilities of discovering principles of behavior without conducting experiments. When creatively interrogated, a diverse range of large, real-world data sets provides powerful diagnostic tools for revealing principles of human judgment, perception, categorization, decision-making, language use, inference, problem solving, and representation. Examples of these data sets include patterns of website links, dictionaries, logs of group interactions, collections of images and image tags, text corpora, history of financial transactions, trends in twitter tag usage and propagation, patents, consumer product sales, performance in high-stakes sporting events, dialect maps, and scientific citations. The goal of this issue is to present some exemplary case studies of mining naturally existing data sets to reveal important principles and phenomena in cognitive science, and to discuss some of the underlying issues involved with conducting traditional experiments, analyses of naturally occurring data, computational modeling, and the synthesis of all three methods.

  15. Widespread Contamination of Arabidopsis Embryo and Endosperm Transcriptome Data Sets.

    Science.gov (United States)

    Schon, Michael A; Nodine, Michael D

    2017-04-01

    A major goal of global gene expression profiling in plant seeds has been to investigate the parental contributions to the transcriptomes of early embryos and endosperm. However, consistency between independent studies has been poor, leading to considerable debate. We have developed a statistical tool that reveals the presence of substantial RNA contamination from maternal tissues in nearly all published Arabidopsis thaliana endosperm and early embryo transcriptomes generated in these studies. We demonstrate that maternal RNA contamination explains the poor reproducibility of these transcriptomic data sets. Furthermore, we found that RNA contamination from maternal tissues has been repeatedly misinterpreted as epigenetic phenomena, which has resulted in inaccurate conclusions regarding the parental contributions to both the endosperm and early embryo transcriptomes. After accounting for maternal RNA contamination, no published genome-wide data set supports the concept of delayed paternal genome activation in plant embryos. Moreover, our analysis suggests that maternal and paternal genomic imprinting are equally rare events in Arabidopsis endosperm. Our publicly available software (https://github.com/Gregor-Mendel-Institute/tissue-enrichment-test) can help the community assess the level of contamination in transcriptome data sets generated from both seed and non-seed tissues. © 2017 American Society of Plant Biologists. All rights reserved.

  16. Cooperative inversion of magnetotelluric and seismic data sets

    Science.gov (United States)

    Markovic, M.; Santos, F.

    2012-04-01

    Inversion of a single geophysical data set has well-known limitations due to the non-linearity of the fields and the non-uniqueness of the model. There is a growing need, both in academia and industry, to use two or more different data sets and thus obtain the subsurface property distribution. In our case, we are dealing with magnetotelluric and seismic data sets. In our approach, we are developing an algorithm based on the fuzzy c-means clustering technique for pattern recognition of geophysical data. Separate inversions are performed at every step, and information is exchanged for model integration. Interrelationships between parameters from different models are not required in analytical form. We are investigating how different numbers of clusters affect zonation and the spatial distribution of parameters. In our study, optimization in fuzzy c-means clustering (for magnetotelluric and seismic data) is compared for two cases: first alternating optimization, and then a hybrid method (alternating optimization + quasi-Newton). Acknowledgment: This work is supported by FCT Portugal.

  17. Looking at large data sets using binned data plots

    Energy Technology Data Exchange (ETDEWEB)

    Carr, D.B.

    1990-04-01

    This report addresses the monumental challenge of developing exploratory analysis methods for large data sets. The goals of the report are to increase awareness of large data set problems and to contribute simple graphical methods that address some of the problems. The graphical methods focus on two- and three-dimensional data and common tasks such as finding outliers and tail structure, assessing central structure and comparing central structures. The methods handle large sample size problems through binning, incorporate information from statistical models and adapt image processing algorithms. Examples demonstrate the application of the methods to a variety of publicly available large data sets. The most novel application addresses the "too many plots to examine" problem by using cognostics, computer-guided diagnostics, to prioritize plots. The particular application prioritizes views of computational fluid dynamics solution sets on the fly. That is, as each time step of a solution set is generated on a parallel processor, the cognostics algorithms assess virtual plots based on the previous time step. Work in such areas is in its infancy and the examples suggest numerous challenges that remain. 35 refs., 15 figs.
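
    A generic matplotlib analogue of the binned data plots described above (not the report's software): hexagonal binning with logarithmic counts keeps outliers and tail structure visible at a million points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Instead of overplotting a million points, accumulate counts in hexagonal
# bins; log-scaled counts reveal both the dense core and the sparse tails.
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = 0.6 * x + 0.8 * rng.standard_normal(1_000_000)
plt.hexbin(x, y, gridsize=60, bins="log")
plt.colorbar(label="log10(count)")
plt.savefig("binned_plot.png", dpi=150)
```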

  18. Designing minimum data sets of health smart card system

    Directory of Open Access Journals (Sweden)

    Mohtaram Nematollahi

    2014-10-01

    Full Text Available Introduction: Nowadays, different countries benefit from health systems based on health cards and projects related to smart cards. The lack of facilities covering this technology is obvious in our society. This paper aims to design the Minimum Data Sets of a Health Smart Card System for Iran. Method: This research was an applied descriptive study. At first, we reviewed similar projects and guidelines of selected countries, and the proposed model was designed in accordance with the country's needs, with people's attitudes about it gathered by the Delphi technique. Analysis of the MDS (Minimum Data Sets) of health smart cards in the selected countries was done with comparative tables and determination of the similarities and differences of the MDS. Validation of the model was accomplished with descriptive statistics, as absolute and relative frequencies, through SPSS (version 16). Results: The MDS of the Health Smart Card for Iran is presented in the patient's card and the health provider's card on the basis of studies in America, Australia, Turkey and Belgium and the needs of our country, and was confirmed after the Delphi technique with 94 percent agreement. Conclusion: The Minimum Data Sets of the Health Smart Card provide continuous care for patients and communication among providers, thus decreasing the complications of threatening diseases. Collection of the MDS of diseases increases the quality of care assessment.

  19. Prairie Chicken

    Data.gov (United States)

    Kansas Data Access and Support Center — An outline of the general range occupied by greater and lesser prairie chickens. The range was delineated by expert opinion, then verified by local wildlife...

  20. Prestack depth migration applied to deep seismic data sets

    Science.gov (United States)

    Yoon, M.; Buske, S.; Lüth, S.; Shapiro, S.; Wigger, P.

    2003-04-01

    We present the results of Kirchhoff prestack depth migration applied to two onshore deep seismic reflection data sets (ANCORP'96 and PRECORP'95). The prestack depth migration was implemented in 3D (ANCORP) and in 2D (PRECORP), respectively, from topography. The 3D velocity model was obtained by extending a 2D velocity model derived from refraction data analysis. The traveltime calculation was performed using a finite difference eikonal solver. An additional "offline stacking" provided a final 370 km long 2D depth section of the ANCORP data set. The migration procedure of the PRECORP data set consisted of three steps: First, early arrivals (0-15 s TWT) were processed. Second, later arrivals (15-40 s TWT) were passed to migration. Finally, both depth sections were stacked, yielding the final 100 km deep subsurface image. In this paper a 180 km long part of the ANCORP section and a 110 km long PRECORP depth profile are presented. In comparison to earlier processing results (ANCORP working group, 1999; 2002) the prestack depth images contain new aspects. The final 2D ANCORP section shows a sharpened image of the oceanic crust. Except for some areas, a nearly complete image of the Nazca reflector is present in both data sets between depths of 60-90 km. The compilation with local earthquake data shows that the seismogenic zone coincides with the upper reflector of the oceanic crust, but not with the Nazca reflector at depths larger than 80 km. The final depth sections contain two prominent features, the Quebrada Blanca Bright Spot (QBBS, ANCORP) and the Calama Bright Spot (CBS, PRECORP) located 160 km further to the south. Besides the west dip of the QBBS, a 3D analysis of the ANCORP data set shows an additional north-dipping trend of the QBBS. Furthermore, the CBS was discovered for the first time. ANCORP Working Group (1999) Seismic reflection image revealing offset of Andean subduction-zone earthquake locations into oceanic mantle. Nature, 397:341--344. ANCORP

  1. Data Set for Pathology Reporting of Cutaneous Invasive Melanoma

    Science.gov (United States)

    Judge, Meagan J.; Evans, Alan; Frishberg, David P.; Prieto, Victor G.; Thompson, John F.; Trotter, Martin J.; Walsh, Maureen Y.; Walsh, Noreen M.G.; Ellis, David W.

    2013-01-01

    An accurate and complete pathology report is critical for the optimal management of cutaneous melanoma patients. Protocols for the pathologic reporting of melanoma have been independently developed by the Royal College of Pathologists of Australasia (RCPA), Royal College of Pathologists (United Kingdom) (RCPath), and College of American Pathologists (CAP). In this study, data sets, checklists, and structured reporting protocols for pathologic examination and reporting of cutaneous melanoma were analyzed by an international panel of melanoma pathologists and clinicians with the aim of developing a common, internationally agreed upon, evidence-based data set. The International Collaboration on Cancer Reporting cutaneous melanoma expert review panel analyzed the existing RCPA, RCPath, and CAP data sets to develop a protocol containing “required” (mandatory/core) and “recommended” (nonmandatory/noncore) elements. Required elements were defined as those that had agreed evidentiary support at National Health and Medical Research Council level III-2 level of evidence or above and that were unanimously agreed upon by the review panel to be essential for the clinical management, staging, or assessment of the prognosis of melanoma or fundamental for pathologic diagnosis. Recommended elements were those considered to be clinically important and recommended for good practice but with lesser degrees of supportive evidence. Sixteen core/required data elements for cutaneous melanoma pathology reports were defined (with an additional 4 core/required elements for specimens received with lymph nodes). Eighteen additional data elements with a lesser level of evidentiary support were included in the recommended data set. Consensus response values (permitted responses) were formulated for each data item. Development and agreement of this evidence-based protocol at an international level was accomplished in a timely and efficient manner, and the processes described herein may

  2. Under-utilized Important Data Sets from Barrow, Alaska

    Science.gov (United States)

    Jensen, A. M.; Misarti, N.

    2012-12-01

    The Barrow region has a number of high resolution data sets of high quality and high scientific and stakeholder relevance. Many are described as being of long duration, yet span mere decades. Here we highlight the fact that there are data sets available in the Barrow area that span considerably greater periods of time (centuries to millennia), at varying degrees of resolution. When used appropriately, these data sets can contribute to the study and understanding of the changing Arctic. However, because these types of data are generally acquired as part of archaeological projects, funded through Arctic Social Science and similar programs, their use in other sciences has been limited. Archaeologists focus on analyzing these data sets in ways designed to answer particular anthropological questions. That in no way precludes archaeological collaboration with other types of scientists nor the analysis of these data sets in new and innovative ways, in order to look at questions of Arctic change over a time span beginning well before the Industrial Revolution introduced complicating factors. One major data group consists of zooarchaeological data from sites in the Barrow area. This consists of faunal remains of human subsistence activities, recovered either from middens (refuse deposits) or dwellings. In effect, occupants of a site were sampling their environment as it existed at the time of occupation, although not in a random or systematic way. When analyzed to correct for biases introduced by taphonomic and human behavioral factors, such data sets are used by archaeologists to understand past people's subsistence practices, and how such practices changed through time. However, there is much additional information that can be obtained from these collections. Certain species have fairly specific habitat requirements, and their presence in significant numbers at a site indicates that such conditions existed relatively nearby at a particular time in the past, and

  3. Using sparse photometric data sets for asteroid lightcurve studies

    Science.gov (United States)

    Warner, Brian D.; Harris, Alan W.

    2011-12-01

    With the advent of wide-field imagers, it has become possible to conduct a photometric lightcurve survey of many asteroids simultaneously, either for that single purpose (e.g., Dermawan, B., Nakamura, T., Yoshida, F. [2011]. Publ. Astron. Soc. Japan 63, S555-S576; Masiero, J., Jedicke, R., Ďurech, J., Gwyn, S., Denneau, L., Larsen, J. [2009]. Icarus 204, 145-171), or as a part of a multipurpose survey (e.g., Pan-STARRS, LSST). Such surveys promise to yield photometric data for many thousands of asteroids, but these data sets will be “sparse” compared to most of those taken in a “targeted” mode directed to one asteroid at a time. We consider the potential limitations of sparse data sets using different sampling rates with respect to specific research questions that might be addressed with lightcurve data. For our study we created synthetic sparse data sets similar to those from wide-field surveys by generating more than 380,000 individual lightcurves that were combined into more than 47,000 composite lightcurves. The variables in generating the data included the number of observations per night, number of nights, noise, and the intervals between observations and nights, in addition to periods ranging from 0.1 to 400 h and amplitudes ranging from 0.1 to 2.0 mag. A Fourier analysis pipeline was used to find the period for each composite lightcurve and then review the derived period and period spectrum to gauge how well an automated analysis of sparse data sets would perform in finding the true period. For this part of the analysis, a normally distributed noise level of 0.03 mag was added to the data, regardless of amplitude, thus simulating a relatively high SNR for the observations. For the second part of the analysis, a smaller set of composite curves was generated with fixed core parameters of eight observations per night, 8 nights within a 14-day span, periods ranging from 2 to 6 h, and an amplitude of either 0.3 mag or 0.4 mag. Individual data sets using
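
    A toy reconstruction of the simulation-and-recovery loop (our sketch, with illustrative parameters), using astropy's Lomb-Scargle periodogram in place of the authors' Fourier analysis pipeline:

```python
import numpy as np
from astropy.timeseries import LombScargle

# One synthetic sparse lightcurve of the kind described above: 8 points per
# night on 8 nights within a 14-day span, then a period search.
rng = np.random.default_rng(42)
period_h, amp = 4.3, 0.3                        # hours, magnitudes
nights = rng.choice(14, size=8, replace=False)
t = np.sort(np.concatenate(
    [night * 24 + rng.uniform(0, 6, 8) for night in nights]))   # hours
mag = amp / 2 * np.sin(2 * np.pi * t / period_h) + rng.normal(0, 0.03, t.size)

# Search periods between 0.1 h and 400 h, matching the range studied above.
freq, power = LombScargle(t, mag).autopower(minimum_frequency=1 / 400,
                                            maximum_frequency=1 / 0.1)
print("true period %.2f h, recovered %.2f h"
      % (period_h, 1 / freq[np.argmax(power)]))
```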

  4. Results of LLNL investigation of NYCT data sets

    Energy Technology Data Exchange (ETDEWEB)

    Sale, K; Harrison, M; Guo, M; Groza, M

    2007-08-01

    Upon examination we have concluded that none of the alarms indicate the presence of a real threat. A brief history and results from our examination of the NYCT ASP occupancy data sets dated from 2007-05-14 19:11:07 to 2007-06-20 15:46:15 are presented in this letter report. When the ASP data collection campaign at NYCT was completed, rather than being shut down, the Canberra ASP annunciator box was unplugged leaving the data acquisition system running. By the time it was discovered that the ASP was still acquiring data about 15,000 occupancies had been recorded. Among these were about 500 alarms (classified by the ASP analysis system as either Threat Alarms or Suspect Alarms). At your request, these alarms have been investigated. Our conclusion is that none of the alarm data sets indicate the presence of a real threat (within statistics). The data sets (ICD1 and ICD2 files with concurrent JPEG pictures) were delivered to LLNL on a removable hard drive labeled FOUO. The contents of the data disk amounted to 53.39 GB of data requiring over two days for the standard LLNL virus checking software to scan before work could really get started. Our first step was to walk through the directory structure of the disk and create a database of occupancies. For each occupancy, the database was populated with the occupancy date and time, occupancy number, file path to the ICD1 data and the alarm ('No Alarm', 'Suspect Alarm' or 'Threat Alarm') from the ICD2 file along with some other incidental data. In an attempt to get a global understanding of what was going on, we investigated the occupancy information. The occupancy date/time and alarm type were binned into one-hour counts. These data are shown in Figures 1 and 2.
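
    The one-hour binning step is simple to sketch (ours; the ICD1/ICD2 parsing is omitted and the record layout is assumed):

```python
import collections
import datetime as dt

def hourly_counts(occupancies):
    """Bin (timestamp, alarm_type) occupancy records into one-hour counts,
    of the kind plotted in Figures 1 and 2 of the report."""
    counts = collections.Counter()
    for stamp, alarm in occupancies:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, alarm)] += 1
    return counts

demo = [(dt.datetime(2007, 5, 14, 19, 11, 7), "No Alarm"),
        (dt.datetime(2007, 5, 14, 19, 40, 0), "Suspect Alarm")]
print(hourly_counts(demo))
```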

  5. Genome display tool: visualizing features in complex data sets

    Directory of Open Access Journals (Sweden)

    Lu Yue

    2007-02-01

    Full Text Available Abstract Background The enormity of the information contained in large data sets makes it difficult to develop intuitive understanding. It would be useful to have software that allows visualization of possible correlations between properties that can be associated with a core data set. In the case of bacterial genomes, existing visualization tools focus on either global properties such as variations in composition or detailed local displays of the features that comprise the annotation. It is not easy to visualize other information in the context of this core information. Results A Java-based software tool known as the Genome Display Tool (GDT) allows the user to simultaneously view the distribution of multiple attributes pertaining to genes and intragenic regions in a single bacterial genome using different colours and shapes on a single screen. The display represents each gene by small boxes that correlate with physical position in the genome. The size of the boxes is dynamically allocated based on the number of genes and a zoom feature allows close-up inspection of regions of interest. The display is interfaced with a MS-Access relational database and can display any feature in the database that can be represented by discrete values. Data are readily added to the database from an MS-Excel spreadsheet. The functionality of GDT is demonstrated by comparing the results of two predictions of recent horizontal transfer events in the genome of Synechocystis PCC-6803. The resulting display allows the user to immediately see how much agreement exists between the two methods and also visualize how genes in various categories (e.g. predicted in both methods, one method only, etc.) are distributed in the genome. Conclusion The GDT software provides the user with a powerful tool that allows development of an intuitive understanding of the relative distribution of features in a large data set. As additional features are added to the data set, the number of possible

  6. Cleaning the GenBank Arabidopsis thaliana data set

    DEFF Research Database (Denmark)

    Korning, Peter G.; Hebsgaard, Stefan M.; Rouze, Pierre;

    1996-01-01

    extracted a data set from the A. thaliana entries in GenBank. A number of simple `sanity' checks, based on the nature of the data, revealed an alarmingly high error rate. More than 15% of the most important entries extracted contained erroneous information. In addition, a number of entries had directly...... common. It is proposed that the level of error correction should be increased and that gene structure sanity checks should be incorporated - also at the submitter level - to avoid or reduce the problem in the future. A non-redundant and error-corrected subset of the data for A. thaliana is made available...
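
    The kind of gene-structure sanity check described above can be illustrated on a coding sequence; the specific rules below are illustrative and not necessarily the authors' exact checks:

      # Sketch of simple CDS sanity checks in the spirit described above.
      STOP_CODONS = {"TAA", "TAG", "TGA"}

      def cds_sanity_checks(cds: str) -> list[str]:
          """Return a list of problems found in an annotated coding sequence."""
          problems = []
          cds = cds.upper()
          if len(cds) % 3 != 0:
              problems.append("length is not a multiple of 3")
          if not cds.startswith("ATG"):
              problems.append("does not start with ATG")
          if cds[-3:] not in STOP_CODONS:
              problems.append("does not end with a stop codon")
          # An in-frame stop before the end usually signals a mis-annotated exon.
          internal = {cds[i:i + 3] for i in range(0, len(cds) - 3, 3)}
          if internal & STOP_CODONS:
              problems.append("contains an internal in-frame stop codon")
          return problems

      print(cds_sanity_checks("ATGGCCTAATGA"))  # internal TAA is flagged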

  7. Lipoprotein subclasses in genetic studies: The Berkeley Data Set

    Energy Technology Data Exchange (ETDEWEB)

    Krauss, R.M.; Williams, P.T.; Blanche, P.J.; Cavanaugh, A.; Holl, L.G. [Lawrence Berkeley Lab., CA (United States); Austin, M.A. [Washington Univ., Seattle, WA (United States). Dept. of Epidemiology

    1992-10-01

    Data from the Berkeley Data Set were used to investigate familial correlations of HDL subclasses. Analysis of the sibling intraclass correlation coefficient by HDL particle diameter showed that sibling HDL levels were significantly correlated for the HDL2b, HDL3a and HDL3b subclasses. The percentage of the offspring's variance explained by their two parents was also examined. Our finding that parents and offspring have the highest correlation for HDL2b is consistent with published reports that show higher heritability estimates for HDL2 than for HDL3 cholesterol.

  8. Data Set for Empirical Validation of Double Skin Facade Model

    DEFF Research Database (Denmark)

    Kalyanova, Olena; Jensen, Rasmus Lund; Heiselberg, Per

    2008-01-01

    the International Energy Agency (IEA) Task 34 Annex 43. This paper describes the full-scale outdoor experimental test facility ‘the Cube', where the experiments were conducted, the experimental set-up and the measurement procedure for the data sets. The empirical data are composed for the key functioning modes...... of a double skin facade: 1. External air curtain mode, i.e. the naturally ventilated DSF cavity with the top and bottom openings open to the outdoors; 2. Thermal insulation mode, with all of the DSF openings closed; 3. Preheating mode, with the bottom DSF openings open to the outdoor and top openings open...

  9. Gap filling strategies for long term energy flux data sets

    DEFF Research Database (Denmark)

    Falge, E.; Baldocchi, D.; Olson, R.

    2001-01-01

    At present a network of over 100 field sites is measuring carbon dioxide, water vapor and sensible heat fluxes between the biosphere and atmosphere on a nearly continuous basis. Gaps in the long-term measurements of evaporation and sensible heat flux must be filled before these data can be used...... for hydrological and meteorological applications. We adapted methods of gap filling for NEE (net ecosystem exchange of carbon) to energy fluxes and applied them to data sets available from the EUROFLUX and AmeriFlux eddy covariance databases. The average data coverage for the sites selected was 69% and 75...
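
    One widely used family of gap-filling methods for flux data is the mean diurnal variation approach, in which a missing value is replaced by the average of valid measurements at the same time of day within a surrounding window. A sketch of that idea (not necessarily the authors' exact implementation):

      import numpy as np

      def fill_mean_diurnal(flux: np.ndarray, period: int = 24, window_days: int = 7) -> np.ndarray:
          """Fill NaN gaps in an hourly flux series with the mean of valid
          values at the same hour of day within +/- window_days."""
          filled = flux.copy()
          n = len(flux)
          for i in np.flatnonzero(np.isnan(flux)):
              # Indices of the same hour on neighbouring days.
              idx = np.arange(i - window_days * period, i + window_days * period + 1, period)
              idx = idx[(idx >= 0) & (idx < n)]
              vals = flux[idx]
              vals = vals[~np.isnan(vals)]
              if vals.size:
                  filled[i] = vals.mean()
          return filled

      # Example: 10 days of hourly data with a gap on day 5 at hour 6.
      t = np.arange(240)
      series = 100 * np.sin(2 * np.pi * (t % 24) / 24).clip(0)
      series[4 * 24 + 6] = np.nan
      print(fill_mean_diurnal(series)[4 * 24 + 6])  # recovers ~100.0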

  10. STEME: a robust, accurate motif finder for large data sets.

    Directory of Open Access Journals (Sweden)

    John E Reid

    Full Text Available Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.

  11. A public data set of human balance evaluations

    Directory of Open Access Journals (Sweden)

    Damiana A. Santos

    2016-11-01

    Full Text Available The goal of this study was to create a public data set with results of qualitative and quantitative evaluations related to human balance. Subjects' balance was evaluated by posturography using a force platform and by the Mini Balance Evaluation Systems Test. In the posturography test, we evaluated subjects standing still for 60 s in four different conditions where vision and the standing surface were manipulated: on a rigid surface with eyes open; on a rigid surface with eyes closed; on an unstable surface with eyes open; on an unstable surface with eyes closed. Each condition was performed three times and the order of the conditions was randomized. In addition, the following tests were employed in order to better characterize each subject: Short Falls Efficacy Scale International; International Physical Activity Questionnaire Short Version; and Trail Making Test. The subjects were also interviewed to collect information about their socio-cultural, demographic, and health characteristics. The data set comprises signals from the force platform (raw data for the force, moments of forces, and centers of pressure) of 163 subjects, plus one file with information about the subjects and balance conditions and the results of the other evaluations. All the data are available at PhysioNet and at Figshare.

  12. Sodankylä ionospheric tomography data set 2003-2014

    Science.gov (United States)

    Norberg, Johannes; Roininen, Lassi; Kero, Antti; Raita, Tero; Ulich, Thomas; Markkanen, Markku; Juusola, Liisa; Kauristie, Kirsti

    2016-07-01

    Sodankylä Geophysical Observatory has been operating a receiver network for ionospheric tomography and collecting the produced data since 2003. The collected data set consists of phase difference curves measured from COSMOS navigation satellites of the Russian Parus network (Wood and Perry, 1980) and tomographic electron density reconstructions obtained from these measurements. In this study, vertical total electron content (VTEC) values are integrated from the reconstructed electron densities to make a qualitative and quantitative analysis to validate the long-term performance of the tomographic system. During the observation period, 2003-2014, there were three to five operational stations in the Fennoscandia sector. Altogether the analysis consists of around 66 000 overflights, but to ensure the quality of the reconstructions, the examination is limited to cases with descending (north to south) overflights and maximum elevation over 60°. These constraints limit the number of overflights to around 10 000. Based on this data set, one solar cycle of ionospheric VTEC estimates is constructed. The measurements are compared against the International Reference Ionosphere (IRI)-2012 model, the F10.7 solar flux index and sunspot number data. Qualitatively the tomographic VTEC estimates correspond to the reference data very well, but the IRI-2012 model results are on average 40% higher than the tomographic results.
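
    The VTEC integration step can be illustrated with a synthetic electron-density profile; the Chapman-style profile below is a stand-in, not Sodankylä data:

      import numpy as np

      # Sketch: integrating an electron-density profile over altitude to a
      # vertical total electron content (VTEC) value in TEC units.
      alt_km = np.linspace(80, 1000, 200)       # altitude grid [km]
      z = (alt_km - 300.0) / 75.0               # peak at 300 km, 75 km scale height
      ne = 1e12 * np.exp(1 - z - np.exp(-z))    # electron density [m^-3], Chapman-like

      # Trapezoidal integration over altitude in metres -> electrons per m^2.
      alt_m = alt_km * 1e3
      vtec = np.sum(0.5 * (ne[1:] + ne[:-1]) * np.diff(alt_m))
      print(vtec / 1e16, "TECU")                # 1 TECU = 1e16 electrons/m^2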

  13. Towards a Framework for Change Detection in Data Sets

    Science.gov (United States)

    Böttcher, Mirko; Nauck, Detlef; Ruta, Dymitr; Spott, Martin

    Since the world with its markets, innovations and customers is changing faster than ever before, the key to survival for businesses is the ability to detect, assess and respond to changing conditions rapidly and intelligently. Discovering changes and reacting to or acting upon them before others do has therefore become a strategic issue for many companies. However, existing data analysis techniques are insufficient for this task since they typically assume that the domain under consideration is stable over time. This paper presents a framework that detects changes within a data set at virtually any level of granularity. The underlying idea is to derive a rule-based description of the data set at different points in time and to subsequently analyse how these rules change. Nevertheless, further techniques are required to assist the data analyst in interpreting and assessing their changes. Therefore the framework also contains methods to discard rules that are non-drivers for change and to assess the interestingness of detected changes.

  14. Identification of noise in linear data sets by factor analysis

    Energy Technology Data Exchange (ETDEWEB)

    Roscoe, B.A.; Hopke, P.K.

    1981-01-01

    The approach to classical factor analysis described in this paper, i.e. doing the analysis for varying numbers of factors without prior assumptions about the number of factors, prevents one from obtaining erroneous results caused by assumptions inherent in the computer code. Identification of a factor containing most of the variance of one variable, with little variance from other variables, pinpoints a possible difficulty in the data if the singularity has no obvious physical significance. Examination of the factor scores will determine whether the problem is isolated to a few samples or spread over all the samples. Having this information, one may then go back to the raw data and take the appropriate corrective action. Classical factor analysis has the ability to identify several types of errors in data after they have been generated. It is therefore ideally suited for scanning large data sets. The ease of the identification technique makes it a beneficial tool to use before reduction and analysis of large data sets and should, in the long run, save time and effort.
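
    A sketch of the scanning idea, using classical factor analysis via eigendecomposition of the correlation matrix; the synthetic data and the 0.9 threshold are illustrative:

      import numpy as np

      # Flag factors whose loadings concentrate on a single variable; such a
      # factor marks a variable that shares little variance with the others.
      rng = np.random.default_rng(0)
      common = rng.normal(size=200)
      X = np.column_stack([common + 0.3 * rng.normal(size=200) for _ in range(5)]
                          + [rng.normal(size=200)])   # variable 5 is unrelated "noise"

      corr = np.corrcoef(X, rowvar=False)
      eigvals, eigvecs = np.linalg.eigh(corr)
      for val, vec in zip(eigvals[::-1], eigvecs.T[::-1]):   # largest factors first
          share = vec**2                                     # eigenvectors have unit norm
          if share.max() > 0.9:
              print(f"factor (eigenvalue {val:.2f}) loads almost entirely on variable {share.argmax()}")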

  15. Association Analysis for Visual Exploration of Multivariate Scientific Data Sets.

    Science.gov (United States)

    Liu, Xiaotong; Shen, Han-Wei

    2016-01-01

    The heterogeneity and complexity of multivariate characteristics pose a unique challenge to visual exploration of multivariate scientific data sets, as it requires investigating the usually hidden associations between different variables and specific scalar values to understand the data's multi-faceted properties. In this paper, we present a novel association analysis method that guides visual exploration of scalar-level associations in the multivariate context. We model the directional interactions between scalars of different variables as information flows based on association rules. We introduce the concepts of informativeness and uniqueness to describe how information flows between scalars of different variables and how they are associated with each other in the multivariate domain. Based on scalar-level associations represented by a probabilistic association graph, we propose the Multi-Scalar Informativeness-Uniqueness (MSIU) algorithm to evaluate the informativeness and uniqueness of scalars. We present an exploration framework with multiple interactive views to explore the scalars of interest with confident associations in the multivariate spatial domain, and provide guidelines for visual exploration using our framework. We demonstrate the effectiveness and usefulness of our approach through case studies using three representative multivariate scientific data sets.

  16. Querying Large Physics Data Sets Over an Information Grid

    Institute of Scientific and Technical Information of China (English)

    Nigel Baker; Zsolt Kovacs; et al.

    2001-01-01

    Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of and computation across massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects, such as the role of 'middleware' for Grids, are emerging as requiring research. This paper positions the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis, rather than just computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data sets, initially being employed for the construction of CMS detectors.

  17. NASIS data base management system: IBM 360 TSS implementation. Volume 3: Data set specifications

    Science.gov (United States)

    1973-01-01

    The data set specifications for the NASA Aerospace Safety Information System (NASIS) are presented. The data set specifications describe the content, format, and medium of communication of every data set required by the system. All relevant information pertinent to a particular data set is prepared in a standard form and centralized in a single document. The format for the data set is provided.

  18. Techniques for Efficiently Managing Large Geosciences Data Sets

    Science.gov (United States)

    Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.

    2007-12-01

    We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about, the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely, and at set intervals query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except for perhaps sequencing, largely independent from other crawlers. Once a crawler is done with its current assignment, it updates the database roster table, and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers. The library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data are from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that feeds value-added metadata over the IDD/LDM. Ingesters grab the metadata and populate the PostgreSQL database.
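
    The crawler pattern can be sketched as a process that polls a roster table for pending work; the table and column names below are hypothetical, and sqlite3 stands in for the PostgreSQL database:

      import sqlite3
      import time

      def process(file_path: str) -> None:
          """The crawler's well-defined task, e.g. QC or georeferencing."""
          print("processing", file_path)

      def crawler_loop(db_path: str, crawler_name: str, poll_seconds: int = 60) -> None:
          """Run indefinitely, querying the roster for the next assignment."""
          con = sqlite3.connect(db_path)
          while True:
              row = con.execute(
                  "SELECT id, file_path FROM roster "
                  "WHERE task = ? AND status = 'pending' LIMIT 1",
                  (crawler_name,),
              ).fetchone()
              if row is None:
                  time.sleep(poll_seconds)   # nothing to do; poll again later
                  continue
              task_id, file_path = row
              process(file_path)
              con.execute("UPDATE roster SET status = 'done' WHERE id = ?", (task_id,))
              con.commit()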

  19. NCBI GEO: archive for functional genomics data sets--update.

    Science.gov (United States)

    Barrett, Tanya; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Holko, Michelle; Yefanov, Andrey; Lee, Hyeseung; Zhang, Naigong; Robertson, Cynthia L; Serova, Nadezhda; Davis, Sean; Soboleva, Alexandra

    2013-01-01

    The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

  20. Mining large heterogeneous data sets in drug discovery.

    Science.gov (United States)

    Wild, David J

    2009-10-01

    Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology, amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. This paper provides a review of the publicly available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.

  1. BICEP2. II. Experiment and three-year data set

    Energy Technology Data Exchange (ETDEWEB)

    Ade, P. A. R. [School of Physics and Astronomy, Cardiff University, Cardiff, CF24 3AA (United Kingdom); Aikin, R. W.; Bock, J. J.; Brevik, J. A.; Filippini, J. P.; Golwala, S. R.; Hildebrandt, S. R. [Department of Physics, California Institute of Technology, Pasadena, CA 91125 (United States); Amiri, M.; Davis, G.; Halpern, M.; Hasselfield, M. [Department of Physics and Astronomy, University of British Columbia, Vancouver, BC (Canada); Barkats, D. [Joint ALMA Observatory, ESO, Santiago (Chile); Benton, S. J. [Department of Physics, University of Toronto, Toronto, ON (Canada); Bischoff, C. A.; Buder, I. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street MS 42, Cambridge, MA 02138 (United States); Bullock, E. [Minnesota Institute for Astrophysics, University of Minnesota, Minneapolis, MN 55455 (United States); Day, P. K.; Dowell, C. D. [Jet Propulsion Laboratory, Pasadena, CA 91109 (United States); Duband, L. [Université Grenoble Alpes, CEA INAC-SBT, F-38000 Grenoble (France); Fliescher, S., E-mail: ogburn@stanford.edu [Department of Physics, University of Minnesota, Minneapolis, MN 55455 (United States); Collaboration: Bicep2 Collaboration; and others

    2014-09-01

    We report on the design and performance of the BICEP2 instrument and on its three-year data set. BICEP2 was designed to measure the polarization of the cosmic microwave background (CMB) on angular scales of 1°-5° (ℓ = 40-200), near the expected peak of the B-mode polarization signature of primordial gravitational waves from cosmic inflation. Measuring B-modes requires dramatic improvements in sensitivity combined with exquisite control of systematics. The BICEP2 telescope observed from the South Pole with a 26 cm aperture and cold, on-axis, refractive optics. BICEP2 also adopted a new detector design in which beam-defining slot antenna arrays couple to transition-edge sensor (TES) bolometers, all fabricated on a common substrate. The antenna-coupled TES detectors supported scalable fabrication and multiplexed readout that allowed BICEP2 to achieve a high detector count of 500 bolometers at 150 GHz, giving unprecedented sensitivity to B-modes at degree angular scales. After optimization of detector and readout parameters, BICEP2 achieved an instrument noise-equivalent temperature of 15.8 μK√s. The full data set reached Stokes Q and U map depths of 87.2 nK in square-degree pixels (5.2 μK arcmin) over an effective area of 384 deg² within a 1000 deg² field. These are the deepest CMB polarization maps at degree angular scales to date. The power spectrum analysis presented in a companion paper has resulted in a significant detection of B-mode polarization at degree scales.

  2. Physics Mining of Multi-Source Data Sets

    Science.gov (United States)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures of environmental parameters than ever before by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental-science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multivariate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as artificial neural nets, which yield a black-box solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, to handle multi-type data sets, and to parallelize the tool.

  3. Northern California Earthquake Data Center: Data Sets and Data Services

    Science.gov (United States)

    Neuhauser, D. S.; Allen, R. M.; Zuzlewski, S.

    2015-12-01

    The Northern California Earthquake Data Center (NCEDC) provides a permanent archive and real-time data distribution services for a unique and comprehensive set of seismological and geophysical data encompassing northern and central California. We provide access to over 85 terabytes of continuous and event-based time series data from broadband, short-period, strong motion, and strain sensors as well as continuous and campaign GPS data at both standard and high sample rates. The Northern California Seismic System (NCSS), operated by UC Berkeley and USGS Menlo Park, has recorded over 900,000 events from 1984 to the present, and the NCEDC serves catalog and parametric information, moment tensors and first-motion mechanisms, and time series data for these events. We also serve event catalogs, parametric information, and event waveforms for DOE enhanced geothermal system monitoring in northern California and Nevada. The NCEDC provides several ways for users to access these data. The most recent development is web services, which provide interactive, command-line, or program-based workflow access to data. Web services use well-established server and client protocols and RESTful software architecture that allow users to easily submit queries and receive the requested data in real time rather than through batch or email-based requests. Data are returned to the user in the appropriate format such as XML, RESP, simple text, or MiniSEED depending on the service and selected output format. The NCEDC supports all FDSN-defined web services as well as a number of IRIS-defined and NCEDC-defined services. We also continue to support older email-based and browser-based access to data. NCEDC data and web services can be found at http://www.ncedc.org and http://service.ncedc.org.
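
    As an illustration, a standard FDSN dataselect request to the NCEDC might look like the following; the network, station and time window are examples only, and the exact endpoint should be confirmed at http://service.ncedc.org:

      from urllib.request import urlopen
      from urllib.parse import urlencode

      # Sketch: requesting miniSEED waveforms via the FDSN dataselect web
      # service that the abstract says the NCEDC supports.
      params = urlencode({
          "net": "BK", "sta": "CMB", "loc": "--", "cha": "BHZ",
          "starttime": "2015-01-01T00:00:00", "endtime": "2015-01-01T00:10:00",
      })
      url = "http://service.ncedc.org/fdsnws/dataselect/1/query?" + params

      with urlopen(url) as resp:
          with open("cmb.mseed", "wb") as f:
              f.write(resp.read())   # raw miniSEED, readable with e.g. ObsPy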

  4. Hydraulic Travel Time Tomography Appraisal Using Synthetic Data Sets

    Science.gov (United States)

    Brauchler, R.; Cheng, J.; Dietrich, P.; Everett, M.; Johnson, B.

    2003-12-01

    Hydraulic tomography is an aquifer characterization method allowing the two- and three-dimensional spatial identification of hydraulic properties in the subsurface. Such information is essential for rigorous analysis of a variety of engineering, geotechnical and hydrogeological problems within the context of water resources management. We propose a tomographic approach providing the inversion of travel times of multiwell slug tests. The inversion is based on the relation between the travel times of a recorded transient pressure curve and the diffusivity of the geological medium. Usually, just one value of a measured hydraulic signal, mostly the peak time, is used as the data for the inversion in order to reconstruct the diffusivity field of the investigated system. This situation is not satisfying because much information is lost. Therefore, we have developed a transformation factor that allows our approach to be applied to several travel times characterizing each signal. Thereby, each travel time is inverted separately. The main focus is to appraise the influence of the various travel times on the inversion results. It can be assumed that early travel times are dominated by preferential flow along fast, high-permeability paths, while the inversion results based on late travel times reflect an integration of the received signal over many flow paths. Synthetic data sets were created using the Finite Element Heat and Mass Transfer Code (FEHM) from Los Alamos National Laboratory. The data base of the inversion comprises simulated slug tests, in which the positions of the sources (injection ports) and the receivers (observation ports), isolated with packers, are varied between the tests. We also investigate the effects of input parameters such as the number of source-receiver positions used, borehole storage and permeability distribution. The hydraulic tomography appraisal shows a strong dependence of the inversion results on the travel times and input parameters used. Results of

  5. A Standardized Reference Data Set for Vertebrate Taxon Name Resolution.

    Directory of Open Access Journals (Sweden)

    Paula F Zermoglio

    Full Text Available Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.

  6. A Standardized Reference Data Set for Vertebrate Taxon Name Resolution.

    Science.gov (United States)

    Zermoglio, Paula F; Guralnick, Robert P; Wieczorek, John R

    2016-01-01

    Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.

  7. STEME: efficient EM to find motifs in large data sets

    Science.gov (United States)

    Reid, John E.; Wernisch, Lorenz

    2011-01-01

    MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms. PMID:21785132

  8. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    Science.gov (United States)

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole-genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.

  9. Fate modelling of chemical compounds with incomplete data sets

    DEFF Research Database (Denmark)

    Birkved, Morten; Heijungs, Reinout

    2011-01-01

    , and to provide simplified proxies for the more complicated "real" model relationships. In the presented study two approaches for the reduction of the data demand associated with characterization of chemical emissions in USEtoxTM are tested: The first approach yields a simplified set of mode-of-entry-specific meta......-models with a data demand of approx. 63% (5/8) of the USEtoxTM characterization model. The second yields a simplified set of mode-of-entry-specific meta-models with a data demand of 75% (6/8) of the original model. The results of the study indicate that it is possible to simplify characterization models and lower...... the data demand of these models applying the presented approach. The results further indicate that the second approach, relying on 75% of the original data set, provides the meta-model sets which best mimic the original model. An overall trend observed from the 75% data demand meta-model sets...

  10. Web based visualization of large climate data sets

    Science.gov (United States)

    Alder, Jay R.; Hostetler, Steven W.

    2015-01-01

    We have implemented the USGS National Climate Change Viewer (NCCV), which is an easy-to-use web application that displays future projections from global climate models over the United States at the state, county and watershed scales. We incorporate the NASA NEX-DCP30 statistically downscaled temperature and precipitation for 30 global climate models being used in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC), and hydrologic variables we simulated using a simple water-balance model. Our application summarizes very large, complex data sets at scales relevant to resource managers and citizens and makes climate-change projection information accessible to users of varying skill levels. Tens of terabytes of high-resolution climate and water-balance data are distilled to compact binary format summary files that are used in the application. To alleviate slow response times under high loads, we developed a map caching technique that reduces the time it takes to generate maps by several orders of magnitude. The reduced access time scales to >500 concurrent users. We provide code examples that demonstrate key aspects of data processing, data exporting/importing and the caching technique used in the NCCV.
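
    The map-caching idea can be sketched as memoization of rendered tiles; the function name and cache key below are hypothetical, not the NCCV's actual implementation:

      from functools import lru_cache

      # Sketch: memoize rendered map tiles keyed by (variable, model, time,
      # zoom, tile coordinates) so repeated requests skip regeneration.
      @lru_cache(maxsize=100_000)
      def render_tile(variable: str, model: str, month: str, z: int, x: int, y: int) -> bytes:
          # Expensive step: read the binary summary file, colour-map the
          # field, and encode a PNG. Stubbed here for illustration.
          return f"PNG({variable},{model},{month},{z},{x},{y})".encode()

      render_tile("tmax", "CCSM4", "2050-07", 5, 8, 11)   # slow: rendered once
      render_tile("tmax", "CCSM4", "2050-07", 5, 8, 11)   # fast: served from cache
      print(render_tile.cache_info())                     # hits=1, misses=1, ...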

  11. Evaluating topological conflict in centipede phylogeny using transcriptomic data sets.

    Science.gov (United States)

    Fernández, Rosa; Laumer, Christopher E; Vahtera, Varpu; Libro, Silvia; Kaluziak, Stefan; Sharma, Prashant P; Pérez-Porro, Alicia R; Edgecombe, Gregory D; Giribet, Gonzalo

    2014-06-01

    Relationships between the five extant orders of centipedes have been considered solved based on morphology. Phylogenies based on samples of up to a few dozen genes have largely been congruent with the morphological tree apart from an alternative placement of one order, the relictual Craterostigmomorpha, consisting of two species in Tasmania and New Zealand. To address this incongruence, novel transcriptomic data were generated to sample all five orders of centipedes and also used as a test case for studying gene-tree incongruence. Maximum likelihood and Bayesian mixture model analyses of a data set composed of 1,934 orthologs with 45% missing data, as well as the 389 orthologs in the least saturated, stationary quartile, retrieve strong support for a sister-group relationship between Craterostigmomorpha and all other pleurostigmophoran centipedes, of which the latter group is newly named Amalpighiata. The Amalpighiata hypothesis, which shows little gene-tree incongruence and is robust to the influence of among-taxon compositional heterogeneity, implies convergent evolution in several morphological and behavioral characters traditionally used in centipede phylogenetics, such as maternal brood care, but accords with patterns of first appearances in the fossil record.

  12. Using multiple data sets to populate probabilistic volcanic event trees

    Science.gov (United States)

    Newhall, C.G.; Pallister, John S.

    2014-01-01

    The key parameters one needs to forecast outcomes of volcanic unrest are hidden kilometers beneath the Earth’s surface, and volcanic systems are so complex that there will invariably be stochastic elements in the evolution of any unrest. Fortunately, there is sufficient regularity in behaviour that some, perhaps many, eruptions can be forecast with enough certainty for populations to be evacuated and kept safe. Volcanologists charged with forecasting eruptions must try to understand each volcanic system well enough that unrest can be interpreted in terms of pre-eruptive process, but must simultaneously recognize and convey uncertainties in their assessment. We have found that use of event trees helps to focus discussion, integrate data from multiple sources, reach consensus among scientists about both pre-eruptive process and uncertainties and, in some cases, to explain all of this to officials. Figure 1 shows a generic volcanic event tree from Newhall and Hoblitt (2002) that can be modified as needed for each specific volcano. This paper reviews how we and our colleagues have used such trees during a number of volcanic crises worldwide, for rapid hazard assessments in situations in which more formal expert elicitations could not be conducted. We describe how multiple data sets can be used to estimate probabilities at each node and branch. We also present case histories of probability estimation during crises, how the estimates were used by public officials, and some suggestions for future improvements.
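
    The arithmetic behind an event tree is simple: the probability of an outcome is the product of the conditional probabilities along its branch. A sketch with invented node values:

      # Sketch: multiply conditional probabilities along one event-tree
      # branch. The node probabilities here are illustrative only.
      branch = {
          "unrest": 1.0,                   # unrest is observed
          "magmatic given unrest": 0.6,
          "eruption given magmatic": 0.5,
          "VEI >= 3 given eruption": 0.3,
      }

      p = 1.0
      for node, prob in branch.items():
          p *= prob
      print(f"P(VEI >= 3 eruption) = {p:.3f}")   # 0.090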

  13. Identifying Ionized Regions in Noisy Redshifted 21 cm Data Sets

    CERN Document Server

    Malloy, Matthew

    2012-01-01

    One of the most promising approaches for studying reionization is to use the redshifted 21 cm line. Early generations of redshifted 21 cm surveys will not, however, have the sensitivity to make detailed maps of the reionization process, and will instead focus on statistical measurements. Here we show that it may nonetheless be possible to directly identify ionized regions in upcoming data sets by applying suitable filters to the noisy data. The locations of prominent minima in the filtered data correspond well with the positions of ionized regions. In particular, we corrupt semi-numeric simulations of the redshifted 21 cm signal during reionization with thermal noise at the level expected for a 500 antenna tile version of the Murchison Widefield Array (MWA), and mimic the degrading effects of foreground cleaning. Using a matched filter technique, we find that the MWA should be able to directly identify ionized regions despite the large thermal noise. In a plausible fiducial model in which ~20% of the vo...
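
    A one-dimensional toy version of the matched-filter idea: correlate the noisy field with a template the size of an expected ionized bubble and look for prominent minima. All values below are illustrative:

      import numpy as np

      rng = np.random.default_rng(1)
      n, bubble = 1000, 50
      signal = np.full(n, 20.0)              # mean 21 cm brightness [mK], illustrative
      signal[400:450] = 0.0                  # an ionized region: no 21 cm signal
      noisy = signal + rng.normal(scale=15.0, size=n)

      template = np.ones(bubble) / bubble    # top-hat matched to the bubble size
      filtered = np.convolve(noisy - noisy.mean(), template, mode="same")

      print("deepest minimum near index", filtered.argmin())   # ~425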

  14. Fast generation of multiple resolution instances of raster data sets

    DEFF Research Database (Denmark)

    Arge, Lars; Haverkort, Herman; Tsirogiannis, Constantinos

    2012-01-01

    In many GIS applications it is important to study the characteristics of a raster data set at multiple resolutions. Often this is done by generating several coarser resolution rasters from a fine resolution raster. In this paper we describe efficient algorithms for different variants of this problem. Given a raster G of √N × √N cells we first consider the problem of computing for every 2 ≤ μ ≤ √N a raster Gμ of √N/μ × √N/μ cells such that each cell of Gμ stores the average of the values of μ×μ cells of G. We describe an algorithm that solves this problem in O(N) time when the handled data fit...... in the main memory of the computer. We also provide two algorithms that solve this problem in external memory, that is when the input raster is larger than the main memory. The first external algorithm is very easy to implement and requires O(sort(N)) data block transfers from/to the external memory......
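
    For the in-memory case, the block-averaging operation itself is straightforward; a sketch for factors μ that divide the side length:

      import numpy as np

      # Sketch: for a sqrt(N) x sqrt(N) raster G, compute the coarser raster
      # whose cells hold the average of mu x mu blocks of G.
      def downsample(G: np.ndarray, mu: int) -> np.ndarray:
          side = G.shape[0]
          assert side % mu == 0, "this sketch assumes mu divides the side length"
          return G.reshape(side // mu, mu, side // mu, mu).mean(axis=(1, 3))

      G = np.arange(16.0).reshape(4, 4)
      print(downsample(G, 2))
      # [[ 2.5  4.5]
      #  [10.5 12.5]]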

  15. Multiscale Geometric Methods for Data Sets II: Geometric Wavelets

    CERN Document Server

    Allard, William K; Maggioni, Mauro

    2011-01-01

    Data sets are often modeled as point clouds in $R^D$, for $D$ large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a $d$-dimensional manifold $M$, with $d$ much smaller than $D$. When $M$ is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of $d$ vectors in $R^D$ (for example found by SVD), at a cost $(n+D)d$ for $n$ data points. When $M$ is nonlinear, there are no "explicit" constructions of dictionaries that achieve a similar efficiency: typically one uses either random dictionaries, or dictionaries obtained by black-box optimization. In this paper we construct data-dependent multi-scale dictionaries that aim at efficient encoding and manipulating of the data. Their construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points are guaranteed to have a sparse representation in terms of the dictionary. We t...

  16. Three Dimensional (3D Lumbar Vertebrae Data Set

    Directory of Open Access Journals (Sweden)

    H. Bennani

    2016-08-01

    Full Text Available 3D modelling can be used for a variety of purposes, including biomedical modelling for orthopaedic or anatomical applications. Low back pain is prevalent in society yet few validated 3D models of the lumbar spine exist to facilitate assessment. We therefore created a 3D surface data set for lumbar vertebrae from human vertebrae. Models from 86 lumbar vertebrae were constructed using an inexpensive method involving image capture by digital camera and reconstruction of 3D models via an image-based technique. The reconstruction method was validated using a laser-based arm scanner and measurements derived from real vertebrae using electronic callipers. Results show a mean relative error of 5.2% between image-based models and real vertebrae, a mean relative error of 4.7% between image-based and arm scanning models and 95% of vertices’ errors are less than 3.5 millimetres with a median of 1.1 millimetres. The accuracy of the method indicates that the generated models could be useful for biomechanical modelling or 3D visualisation of the spine.

  17. How to cite an Earth science data set

    Science.gov (United States)

    Parsons, M. A.

    2011-12-01

    Creating a great data set can be a life's work (consider Charles Keeling). Yet, scientists do not receive much recognition for creating rigorous, useful data. At the same time, in a post "climategate" world there is increased scrutiny on science and a greater need than ever to adhere to scientific principles of transparency and repeatability. The Council of the American Geophysical Union (AGU) asserts that the scientific community should recognize the value of data collection, preparation, and description and that data "publications" should "be credited and cited like the products of any other scientific activity." Currently, however, authors rarely cite data formally in journal articles, and they often lack guidance on how data should be cited. The Federation of Earth Science Information Partners (ESIP) Preservation and Stewardship Cluster has been working on this issue for some time now and has begun to address some of the challenges. Overall, scientists and data managers have a professional and ethical responsibility to do their best to meet the data publication goals asserted by AGU. This talk outlines a data citation approach to increase the credit and credibility of data producers.

  18. MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets

    Science.gov (United States)

    Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    2016-01-01

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole-genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder. PMID:27684958

  19. An effective filter for IBD detection in large data sets.

    KAUST Repository

    Huang, Lin

    2014-03-25

    Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, is achieved at the expense of high computational cost, resulting in a prohibitively long running time when applied to large cohorts. To enable the study of large cohorts, we introduce SpeeDB, a method that facilitates fast IBD detection in large unphased genotype data sets. Given a target individual and a database of individuals that potentially share IBD segments with the target, SpeeDB applies an efficient opposite-homozygous filter, which excludes chromosomal segments from the database that are highly unlikely to be IBD with the corresponding segments from the target individual. The remaining segments can then be evaluated by any IBD detection method of choice. When examining simulated individuals sharing 4 cM IBD regions, SpeeDB filtered out 99.5% of genomic regions from consideration while retaining 99% of the true IBD segments. Applying the SpeeDB filter prior to detecting IBD in simulated fourth cousins resulted in an overall running time that was 10,000x faster than inferring IBD without the filter and retained 99% of the true IBD segments in the output.
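
    The opposite-homozygote test can be sketched as follows; the 0/1/2 genotype coding and window size are illustrative, in the spirit of SpeeDB rather than its exact implementation:

      import numpy as np

      # A site where one individual is homozygous reference (0) and the other
      # homozygous alternate (2) cannot lie on a shared IBD haplotype, so
      # windows containing such sites can be excluded cheaply.
      def may_share_ibd(target: np.ndarray, candidate: np.ndarray, window: int = 100) -> np.ndarray:
          """Boolean per window: False where an opposite-homozygote site occurs."""
          opposite = ((target == 0) & (candidate == 2)) | ((target == 2) & (candidate == 0))
          n_win = len(target) // window
          trimmed = opposite[: n_win * window].reshape(n_win, window)
          return ~trimmed.any(axis=1)

      rng = np.random.default_rng(2)
      t = rng.integers(0, 3, size=1000)
      c = t.copy()                       # candidate shares the first half IBD
      c[500:] = rng.integers(0, 3, size=500)
      print(may_share_ibd(t, c))         # True for early windows, mostly False later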

  20. Trans-dimensional Bayesian inference for large sequential data sets

    Science.gov (United States)

    Mandolesi, E.; Dettmer, J.; Dosso, S. E.; Holland, C. W.

    2015-12-01

    This work develops a sequential Monte Carlo method to infer seismic parameters of layered seabeds from large sequential reflection-coefficient data sets. The approach provides parameter estimates and uncertainties along survey tracks with the goal to aid in the detection of unexploded ordnance in shallow water. The sequential data are acquired by a moving platform with source and receiver array towed close to the seabed. This geometry requires consideration of spherical reflection coefficients, computed efficiently by massively parallel implementation of the Sommerfeld integral via Levin integration on a graphics processing unit. The seabed is parametrized with a trans-dimensional model to account for changes in the environment (i.e. changes in layering) along the track. The method combines advanced Markov chain Monte Carlo methods (annealing) with particle filtering (resampling). Since data from closely-spaced source transmissions (pings) often sample similar environments, the solution from one ping can be utilized to efficiently estimate the posterior for data from subsequent pings. Since reflection-coefficient data are highly informative, the likelihood function can be extremely peaked, resulting in little overlap between posteriors of adjacent pings. This is addressed by adding bridging distributions (via annealed importance sampling) between pings for more efficient transitions. The approach assumes the environment to be changing slowly enough to justify the local 1D parametrization. However, bridging allows rapid changes between pings to be addressed and we demonstrate the method to be stable in such situations. Results are in terms of trans-D parameter estimates and uncertainties along the track. The algorithm is examined for realistic simulated data along a track and applied to a data set collected by an autonomous underwater vehicle on the Malta Plateau, Mediterranean Sea. [Work supported by the SERDP, DoD.]

  1. International Spinal Cord Injury Female Sexual and Reproductive Function Basic Data Set

    DEFF Research Database (Denmark)

    Alexander, M S; Biering-Sørensen, F; Elliott, S

    2011-01-01

    To create the International Spinal Cord Injury (SCI) Female Sexual and Reproductive Function Basic Data Set within the International SCI Data Sets.

  2. NASIS data base management system - IBM 360/370 OS MVT implementation. 3: Data set specifications

    Science.gov (United States)

    1973-01-01

    The data set specifications for the NASA Aerospace Safety Information System (NASIS) are presented. The data set specifications describe the content, format, and medium of communication of every data set required by the system. All relevant information pertinent to a particular set is prepared in a standard form and centralized in a single document. The format for the data set is provided.

  3. International spinal cord injury skin and thermoregulation function basic data set

    DEFF Research Database (Denmark)

    Karlsson, Annette; Krassioukov, A; Alexander, M S

    2012-01-01

    To create an international spinal cord injury (SCI) skin and thermoregulation basic data set within the framework of the International SCI Data Sets.

  4. Genetic diversity and conservation of South African indigenous chicken populations.

    Science.gov (United States)

    Mtileni, B J; Muchadeyi, F C; Maiwashe, A; Groeneveld, E; Groeneveld, L F; Dzama, K; Weigend, S

    2011-06-01

    In this study, we compare the level and distribution of genetic variation between South African conserved and village chicken populations using microsatellite markers. In addition, diversity in South African chickens was compared to that of a reference data set consisting of other African and purebred commercial lines. Three village chicken populations (Venda, Ovambo and Eastern Cape) and four conserved flocks (Venda, Ovambo, Naked Neck and Potchefstroom Koekoek) from the Poultry Breeding Resource Unit of the Agricultural Research Council were genotyped at 29 autosomal microsatellite loci. All markers were polymorphic. Village chicken populations were more diverse than conservation flocks. The STRUCTURE software was used to cluster individuals into a predefined number of 2 ≤ K ≤ 6 clusters. The most probable clustering was found at K = 5 (95% identical runs). At this level of differentiation, the four conservation flocks separated as four independent clusters, while the three village chicken populations together formed another cluster. Thus, cluster analysis indicated a clear subdivision of each of the conservation flocks that was different from the three village chicken populations. The contribution of each South African chicken population to the total diversity of the chickens studied was determined by calculating the optimal core set contributions based on marker-estimated kinships. Safe set analysis was carried out using bootstrapped kinship values calculated to relate the added genetic diversity of seven South African chicken populations to a set of reference populations consisting of other African and purebred commercial broiler and layer chickens. In both the core set and the safe set analyses, village chicken populations scored slightly higher to the reference set compared to conservation flocks. Overall, the present study demonstrated that the conservation flocks of South African chickens displayed considerable genetic variability that is different from that of the

  5. Chicken Breast Paste

    Institute of Scientific and Technical Information of China (English)

    1994-01-01

    Ingredients: 50 grams of chicken breast, 150 grams of egg white, ham, cucumber and water chestnuts, 50 grams of starch, 50 grams of oil, salt and MSG. Directions: 1. Chop up the chicken breast and water chestnuts. Mix with the egg white and starch into a chicken breast paste. 2. Heat the oil briefly, then place the chicken paste in the pot.

  6. My Chicken Adventure

    Institute of Scientific and Technical Information of China (English)

    DOROTHY; TECKLENBURG

    2006-01-01

    I am suffering from chicken envy. I'm determined to cook a chicken like the golden brown ones you buy in any Washington grocery store, those beautiful roasted chickens done on a revolving spit. Those chickens you take for granted because you can just waltz in at 6 p.m. and buy one for dinner.

  7. HMF sectors since 1926: Comparison of two ground-based data sets

    Science.gov (United States)

    Hiltula, T.; Mursula, K.

    In this paper, we compare two recent long-term data sets of daily HMF sector polarities since 1926 based on ground-based geomagnetic measurements: the combined data set by Echer and Svalgaard [Echer, E., Svalgaard, L. Asymmetry in the Rosenberg-Coleman effect around solar minimum revealed by wavelet analysis of the interplanetary magnetic field polarity data (1927-2002). Geophys. Res. Lett. 31, 12808, 2004] (ES data set) and a three-station data set derived by Vennerstroem et al. [Vennerstroem, S., Zieger, B., Friis-Christensen, E. An improved method of inferring interplanetary sector structure, 1905-present. J. Geophys. Res. 106 (15), 16011-16020, 2001] (VZF data set). The Rosenberg-Coleman rule is consistently valid in the ES data during the last 80 years, but fails in the VZF data set in the early cycles. There is a clear bias (T sector dominance) in the VZF data that is not observed in satellite measurements collected in the OMNI-2 data set, or in the ES data. Also, there is a difference in the success rates between the two sectors in the VZF data. Therefore, we conclude that the ES data set is more reliable, especially in cycles 16-18, in reproducing the HMF sector structure. Both data sets reproduce the southward shift of the heliospheric current sheet during the OMNI-2 interval. However, only the more reliable ES data set depicts this systematically during the early cycles 16-18 as well.

  8. International lower urinary tract function basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Craggs, M.; Kennelly, M.; Schick, E.;

    2008-01-01

    OBJECTIVE: To create the International Lower Urinary Tract Function Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. SETTING: International working group. METHODS: The draft of the Data Set was developed by a working group consisting of the members...... appointed by the International Continence Society, the European Association of Urology, the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version of the Data...... Set was developed after review and comments by the members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested (international) organizations and societies (around 40) and persons, and the ISCoS Council...

  9. The complete mitochondrial genome sequence of the Daweishan Mini chicken.

    Science.gov (United States)

    Yan, Ming-Li; Ding, Su-Ping; Ye, Shao-Hui; Wang, Chun-Guang; He, Bao-Li; Yuan, Zhi-Dong; Liu, Li-Li

    2016-01-01

    Daweishan Mini chicken is a valuable chicken breed in China. In this study, the complete mitochondrial genome sequence of the Daweishan Mini chicken was obtained for the first time by PCR amplification, sequencing and assembly. The total length of the mitochondrial genome is 16,785 bp, with a base composition of 30.26% A, 23.73% T, 32.51% C and 13.51% G. It contains 37 genes (2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes) and a major non-coding control region (D-loop region). All protein-coding genes start with ATG, except COX1, which begins with GTG. The complete mitochondrial genome sequence of the Daweishan Mini chicken provides an important data set for further investigation of the phylogenetic relationships within Gallus gallus.
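
    The reported base composition is simple counting; a minimal sketch for readers who want to reproduce it on their own sequences (the toy string below stands in for the real 16,785 bp genome, which would be read from a FASTA file):

    ```python
    from collections import Counter

    # Toy fragment; the real sequence would be read from a FASTA file.
    seq = "ATGCGTACCGGTACATCGCCCAATTG"

    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    for base in "ACGT":
        print(f"{base}: {100 * counts[base] / total:.2f}%")
    ```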

  10. Gap bootstrap methods for massive data sets with an application to transportation engineering

    OpenAIRE

    Lahiri, S.N.; Spiegelman, C.; Appiah, J.; Rilett, L.

    2013-01-01

    In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity. In contrast, the proposed methods exploit certain structural properties of a large class of massive data sets to break up the original problem into a set of simpler subproblems, solve each subproblem separately where the data exhibit a...
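
    As a generic illustration of this divide-and-combine idea (a plain subset bootstrap for the standard error of the mean, not the authors' gap bootstrap):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=100_000)  # stand-in "massive" sample

    def subset_bootstrap_se(data, n_subsets=10, n_boot=100):
        """Split the data into subsets, bootstrap the statistic within each
        subset, then pool the subset-level variance estimates."""
        subsets = np.array_split(data, n_subsets)
        variances = []
        for sub in subsets:
            stats = [rng.choice(sub, size=sub.size, replace=True).mean()
                     for _ in range(n_boot)]
            variances.append(np.var(stats, ddof=1))
        # a subset of n/k points overestimates the full-sample variance by k
        return np.sqrt(np.mean(variances) / n_subsets)

    print("bootstrap SE of the mean:", subset_bootstrap_se(data))
    ```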

  11. Concordance and predictive value of two adverse drug event data sets

    OpenAIRE

    Cami, Aurel; Reis, Ben Y

    2014-01-01

    Background: Accurate prediction of adverse drug events (ADEs) is an important means of controlling and reducing drug-related morbidity and mortality. Since no single “gold standard” ADE data set exists, a range of different drug safety data sets are currently used for developing ADE prediction models. There is a critical need to assess the degree of concordance between these various ADE data sets and to validate ADE prediction models against multiple reference standards. Methods: We systemati...

  12. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2011

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  13. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2015

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  14. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2014

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  15. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2016

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  16. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2013

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  17. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2009

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  18. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2010

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  19. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

    Science.gov (United States)

    Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

    2015-07-27

    Virtual screening methods are now commonly used in drug discovery, but to ensure their reliability they have to be carefully evaluated. Such evaluation is usually retrospective, notably by studying the enrichment of benchmarking data sets. To this end, numerous benchmarking data sets have been developed over the years, and the resulting improvements have led to the availability of high-quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
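
    Enrichment, the retrospective criterion discussed here, reduces to comparing hit rates; a minimal sketch with invented scores and labels:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical screening scores: higher = predicted more active.
    labels = np.array([1] * 50 + [0] * 950)          # 50 actives, 950 decoys
    scores = np.where(labels == 1,
                      rng.normal(1.0, 1.0, 1000),
                      rng.normal(0.0, 1.0, 1000))

    def enrichment_factor(scores, labels, fraction=0.01):
        """EF at a given fraction of the ranked database: hit rate in the
        top x% divided by the hit rate in the whole database."""
        n_top = max(1, int(round(fraction * len(scores))))
        top = np.argsort(-scores)[:n_top]
        return (labels[top].sum() / n_top) / (labels.sum() / len(labels))

    print("EF(1%):", round(enrichment_factor(scores, labels, 0.01), 2))
    ```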

  20. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2012

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  1. Modeller subjectivity in estimating pesticide parameters for leaching models using the same laboratory data set

    NARCIS (Netherlands)

    Boesten, J.J.T.I.

    2000-01-01

    User-dependent subjectivity in the process of testing pesticide leaching models is relevant because it may result in wrong interpretation of model tests. About 20 modellers used the same data set to test pesticide leaching models (one or two models per modeller). The data set included laboratory stu

  2. The managerial pay structure : some tests on a Danish data set

    DEFF Research Database (Denmark)

    Eriksson, Tor Viking

    1997-01-01

    The purpose of this paper is to add to the small amount of empirical literature on managerial pay structure. I test several propositions of tournament models on a fairly rich data set. The data set is an unbalanced panel containing information about 2600 executives in 260 Danish firms (per year...

  3. Simpson's Paradox: A Data Set and Discrimination Case Study Exercise

    Science.gov (United States)

    Taylor, Stanley A.; Mickel, Amy E.

    2014-01-01

    In this article, we present a data set and case study exercise that can be used by educators to teach a range of statistical concepts including Simpson's paradox. The data set and case study are based on a real-life scenario where there was a claim of discrimination based on ethnicity. The exercise highlights the importance of performing…

  4. Modeller subjectivity in estimating pesticide parameters for leaching models using the same laboratory data set

    NARCIS (Netherlands)

    Boesten, J.J.T.I.

    2000-01-01

    User-dependent subjectivity in the process of testing pesticide leaching models is relevant because it may result in wrong interpretation of model tests. About 20 modellers used the same data set to test pesticide leaching models (one or two models per modeller). The data set included laboratory

  5. Developing a pressure ulcer risk factor minimum data set and risk assessment framework

    NARCIS (Netherlands)

    Coleman, S.; Nelson, E.A.; Keen, J.; Wilson, L.; McGinnis, E.; Dealey, C.; Stubbs, N.; Muir, D.; Farrin, A.; Dowding, D.; Schols, J.M.; Cuddigan, J.; Berlowitz, D.; Jude, E.; Vowden, P.; Bader, D.L.; Gefen, A.; Oomens, C.W.; Schoonhoven, L.; Nixon, J.

    2014-01-01

    AIM: To agree a draft pressure ulcer risk factor Minimum Data Set to underpin the development of a new evidenced-based Risk Assessment Framework. BACKGROUND: A recent systematic review identified the need for a pressure ulcer risk factor Minimum Data Set and development and validation of an evidence

  6. How to combine correlated data sets-A Bayesian hyperparameter matrix method

    Science.gov (United States)

    Ma, Y.-Z.; Berndsen, A.

    2014-07-01

    We construct a “hyperparameter matrix” statistical method for performing the joint analyses of multiple correlated astronomical data sets, in which the weights of data sets are determined by their own statistical properties. This method is a generalization of the hyperparameter method constructed by Lahav et al. (2000) and Hobson et al. (2002), which was designed to combine independent data sets. The advantage of our method is that it treats correlations between multiple data sets and gives appropriate weights to data sets with mutual correlations. We define a new “element-wise” product, which greatly simplifies the likelihood function with a hyperparameter matrix. We rigorously prove the simplified formula of the joint likelihood and show that it recovers the original hyperparameter method in the limit of no covariance between data sets. We then illustrate the method by applying it to a demonstrative toy model of fitting a straight line to two sets of data. We show that the hyperparameter matrix method can detect unaccounted-for systematic errors or underestimated errors in the data sets. Additionally, the ratio of Bayes' factors provides a distinct indicator of the necessity of including hyperparameters. Our example shows that the likelihood we construct for joint analyses of correlated data sets can be widely applied to many astrophysical systems.
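
    A minimal sketch of the straight-line illustration, written from the description above: after the weights α_k are marginalised (with a Jeffreys prior), each data set contributes N_k·log χ²_k to the objective, so a discrepant set is automatically down-weighted. The data and noise level below are invented, and this is the independent-set limit rather than the full matrix method:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)

    # Two invented data sets drawn from the same line y = 2x + 1;
    # set B carries an unrecognised systematic offset of +0.5.
    x = np.linspace(0.0, 1.0, 20)
    yA = 2 * x + 1 + rng.normal(0, 0.1, 20)
    yB = 2 * x + 1 + 0.5 + rng.normal(0, 0.1, 20)
    sigma = 0.1

    def chi2(params, y):
        a, b = params
        return np.sum(((y - (a * x + b)) / sigma) ** 2)

    # Hyperparameter-style objective: after marginalising the weights
    # alpha_k, each set contributes N_k * log(chi2_k).
    def neglog(params):
        return 20 * np.log(chi2(params, yA)) + 20 * np.log(chi2(params, yB))

    fit = minimize(neglog, x0=[1.0, 0.0])
    print("slope, intercept:", np.round(fit.x, 3))
    # Effective weights alpha_k = N_k / chi2_k: the offset set is down-weighted.
    print("weights:", 20 / chi2(fit.x, yA), 20 / chi2(fit.x, yB))
    ```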

  7. Simpson's Paradox: A Data Set and Discrimination Case Study Exercise

    Science.gov (United States)

    Taylor, Stanley A.; Mickel, Amy E.

    2014-01-01

    In this article, we present a data set and case study exercise that can be used by educators to teach a range of statistical concepts including Simpson's paradox. The data set and case study are based on a real-life scenario where there was a claim of discrimination based on ethnicity. The exercise highlights the importance of performing…

  8. Response Grids: Practical Ways to Display Large Data Sets with High Visual Impact

    Science.gov (United States)

    Gates, Simon

    2013-01-01

    Spreadsheets are useful for large data sets but they may be too wide or too long to print as conventional tables. Response grids offer solutions to the challenges posed by any large data set. They have wide application throughout science and for every subject and context where visual data displays are designed, within education and elsewhere.…

  9. A lower bound for the mass of axisymmetric connected black hole data sets

    CERN Document Server

    Chruściel, Piotr T

    2011-01-01

    We present a generalisation of the Brill-type proof of positivity of mass for axisymmetric initial data to initial data sets with black hole boundaries. The argument leads to a strictly positive lower bound for the mass of simply connected, connected axisymmetric black hole data sets in terms of the mass of a reference Schwarzschild metric.

  10. How to combine correlated data sets -- A Bayesian hyperparameter matrix method

    CERN Document Server

    Ma, Yin-Zhe

    2013-01-01

    We construct a statistical method for performing the joint analyses of multiple correlated astronomical data sets, in which the weights of data sets are determined by their own statistical properties. This method is a generalization of the hyperparameter method constructed by Lahav et al. (2000) and Hobson et al. (2002), which was designed to combine independent data sets. The hyperparameter matrix method we present here includes the relevant weights of multiple data sets and their mutual correlations, and when the hyperparameters are marginalized over, the parameters of interest are recovered. We define a new "element-wise" product, which greatly simplifies the likelihood function with a hyperparameter matrix. We rigorously prove the simplified formula of the joint likelihood and show that it recovers the original hyperparameter method in the limit of no covariance between data sets. We then illustrate the method by applying it to a classic model of fitting a straight line to two sets of data. We show that the hyperparameter matrix ...

  11. Issues and Considerations regarding Sharable Data Sets for Recommender Systems in Technology Enhanced Learning

    DEFF Research Database (Denmark)

    Drachsler, Hendrik; Bogers, Toine; Vuorikari, Riina

    2010-01-01

    This paper raises the issue of missing standardised data sets for recommender systems in Technology Enhanced Learning (TEL) that can be used as benchmarks to compare different recommendation approaches. It discusses how suitable data sets could be created according to some initial suggestions......, and investigates a number of steps that may be followed in order to develop reference data sets that will be adopted and reused within a scientific community. In addition, policies are discussed that are needed to enhance sharing of data sets by taking into account legal protection rights. Finally, an initial...... elaboration of a representation and exchange format for sharable TEL data sets is carried out. The paper concludes with future research needs....

  12. Quality Control and Peer Review of Data Sets: Mapping Data Archiving Processes to Data Publication Requirements

    Science.gov (United States)

    Mayernik, M. S.; Daniels, M.; Eaker, C.; Strand, G.; Williams, S. F.; Worley, S. J.

    2012-12-01

    Data sets exist within scientific research and knowledge networks as both technical and non-technical entities. Establishing the quality of data sets is a multi-faceted task that encompasses many automated and manual processes. Data sets have always been essential for science research, but now need to be more visible as first-class scholarly objects at national, international, and local levels. Many initiatives are establishing procedures to publish and curate data sets, as well as to promote professional rewards for researchers that collect, create, manage, and preserve data sets. Traditionally, research quality has been assessed by peer review of textual publications, e.g. journal articles, conference proceedings, and books. Citation indices then provide standard measures of productivity used to reward individuals for their peer-reviewed work. Whether a similar peer review process is appropriate for assessing and ensuring the quality of data sets remains as an open question. How does the traditional process of peer review apply to data sets? This presentation will describe current work being done at the National Center for Atmospheric Research (NCAR) in the context of the Peer REview for Publication & Accreditation of Research Data in the Earth sciences (PREPARDE) project. PREPARDE is assessing practices and processes for data peer review, with the goal of developing recommendations. NCAR data management teams perform various kinds of quality assessment and review of data sets prior to making them publicly available. The poster will investigate how notions of peer review relate to the types of data review already in place at NCAR. We highlight the data set characteristics and management/archiving processes that challenge the traditional peer review processes by using a number of questions as probes, including: Who is qualified to review data sets? What formal and informal documentation is necessary to allow someone outside of a research team to review a data set

  13. Development of the International Spinal Cord Injury Activities and Participation Basic Data Set

    DEFF Research Database (Denmark)

    Post, M W; Charlifue, S; Biering-Sørensen, F

    2016-01-01

    STUDY DESIGN: Consensus decision-making process. OBJECTIVES: The objective of this study was to develop an International Spinal Cord Injury (SCI) Activities and Participation (A&P) Basic Data Set. SETTING: International working group. METHODS: A committee of experts was established to select...... and define A&P data elements to be included in this data set. A draft data set was developed and posted on the International Spinal Cord Society (ISCoS) and American Spinal Injury Association websites and was also disseminated among appropriate organizations for review. Suggested revisions were considered...

  14. KFUPM-KAUST Red Sea model: Digital viscoelastic depth model and synthetic seismic data set

    KAUST Repository

    Al-Shuhail, Abdullatif A.

    2017-06-01

    The Red Sea is geologically interesting due to its unique structures and abundant mineral and petroleum resources, yet no digital geologic models or synthetic seismic data of the Red Sea are publicly available for testing algorithms to image and analyze the area's interesting features. This study compiles a 2D viscoelastic model of the Red Sea and calculates a corresponding multicomponent synthetic seismic data set. The models and data sets are made publicly available for download. We hope this effort will encourage interested researchers to test their processing algorithms on this data set and model and share their results publicly as well.

  15. Statistical Analysis of Probability of Detection Hit/Miss Data for Small Data Sets

    Science.gov (United States)

    Harding, C. A.; Hugo, G. R.

    2003-03-01

    This paper examines the validity of statistical methods for determining nondestructive inspection probability of detection (POD) curves from relatively small hit/miss POD data sets. One method published in the literature is shown to be invalid for analysis of POD hit/miss data. Another standard method is shown to be valid only for data sets containing more than 200 observations. An improved method is proposed which allows robust lower 95% confidence limit POD curves to be determined from data sets containing as few as 50 hit/miss observations.
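
    A common parametric choice for hit/miss POD analysis is binomial logistic regression; the sketch below fits one to invented inspection data (a full analysis would add the lower 95% confidence bound, for example by bootstrapping):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)

    # Invented hit/miss inspection data: flaw size a (mm), hit = 1/0.
    a = rng.uniform(0.5, 5.0, 60)
    hit = rng.binomial(1, 1 / (1 + np.exp(-(a - 2.5) / 0.5)))

    # Logistic POD model: POD(a) = 1 / (1 + exp(-(b0 + b1 * a))).
    model = sm.Logit(hit, sm.add_constant(a)).fit(disp=0)
    b0, b1 = model.params
    a90 = (np.log(0.9 / 0.1) - b0) / b1   # flaw size detected with POD = 0.9
    print("estimated a90:", round(a90, 2), "mm")
    ```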

  16. The EADGENE Microarray Data Analysis Workshop

    NARCIS (Netherlands)

    Koning, de D.J.; Jaffrezic, F.; Lund, M.S.; Watson, M.; Channing, C.; Hulsegge, B.; Pool, M.H.; Buitenhuis, B.; Hedegaard, J.; Hornshoj, H.; Sorensen, P.; Marot, G.; Delmas, C.; Lê Cao, K.A.; San Cristobal, M.; Baron, M.D.; Malinverni, R.; Stella, A.; Brunner, R.M.; Seyfert, H.M.; Jensen, K.; Mouzaki, D.; Waddington, D.; Jiménez-Marín, A.; Perez-Alegre, M.; Perez-Reinado, E.; Closset, R.; Detilleux, J.C.; Dovc, P.; Lavric, M.; Nie, H.; Janss, L.

    2007-01-01

    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from 10

  17. The EADGENE Microarray Data Analysis Workshop

    DEFF Research Database (Denmark)

    de Koning, Dirk-Jan; Jaffrézic, Florence; Lund, Mogens Sandø

    2007-01-01

    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from...... 10 countries performed and discussed the statistical analyses of real and simulated 2-colour microarray data that were distributed among participants. The real data consisted of 48 microarrays from a disease challenge experiment in dairy cattle, while the simulated data consisted of 10 microarrays...... statistical weights, to omitting a large number of spots or omitting entire slides. Surprisingly, these very different approaches gave quite similar results when applied to the simulated data, although not all participating groups analysed both real and simulated data. The workshop was very successful...

  18. Digital data sets that describe aquifer characteristics of the Central Oklahoma aquifer in central Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of a constant hydraulic conductivity value for the Central Oklahoma aquifer in central Oklahoma. This area encompasses...

  19. Digital data sets that describe aquifer characteristics of the Central Oklahoma aquifer in central Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the Central Oklahoma aquifer in central Oklahoma. This area encompasses all or part of...

  20. Digital data sets that describe aquifer characteristics of the Central Oklahoma aquifer in central Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized aquifer boundaries created for a previously published report about the Central Oklahoma aquifer in central Oklahoma. This area...

  1. Digital data sets that describe aquifer characteristics of the Central Oklahoma aquifer in central Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of a constant recharge value for the Central Oklahoma aquifer in central Oklahoma. This area encompasses all or part of...

  2. Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) data set

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Stratospheric Water and Ozone Satellite Homogenized (SWOOSH) data set is a merged record of stratospheric ozone and water vapor measurements taken by a number of...

  3. International Comprehensive Ocean Atmosphere Data Set (ICOADS) in Near-Real Time (NRT)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Near-Real-Time (NRT) product is an extension of the official ICOADS dataset with preliminary...

  4. Digital data sets that describe aquifer characteristics of the Rush Springs aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized aquifer boundaries for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of Blaine, Caddo,...

  5. Data sets for manuscript titled Unexpected benefits of reducing aerosol cooling effects

    Data.gov (United States)

    U.S. Environmental Protection Agency — These data sets were created using extensive model simulation results from the WRF-CMAQ model, population distributions, and through the use of an health impact...

  6. Analysis of Water and Energy Budgets and Trends Using the NLDAS Monthly Data Sets

    Science.gov (United States)

    Vollmer, Bruce E.; Rui, Hualan; Mocko, David M.; Teng, William L.; Lei, Guang-Dih

    2012-01-01

    The North American Land Data Assimilation System (NLDAS) is a collaborative project between NASA GSFC, NOAA, Princeton University, and the University of Washington. NLDAS has created surface meteorological forcing data sets using the best-available observations and reanalyses. The forcing data sets are used to drive four separate land-surface models (LSMs), Mosaic, Noah, VIC, and SAC, to produce data sets of soil moisture, snow, runoff, and surface fluxes. NLDAS hourly data, accessible from the NASA GES DISC Hydrology Data Holdings Portal, http://disc.sci.gsfc.nasa.gov/hydrology/data-holdings, are widely used by various user communities in modeling, research, and applications, such as drought and flood monitoring, watershed and water quality management, and case studies of extreme events. More information is available at http://ldas.gsfc.nasa.gov/. To further facilitate analysis of water and energy budgets and trends, NLDAS monthly data sets have been recently released by NASA GES DISC.

  7. EPA Enforcement and Compliance History Online: EPA Enforcement Action Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  8. Digital data sets that describe aquifer characteristics of the Elk City aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant hydraulic conductivity values for the Elk City aquifer in western Oklahoma. The aquifer covers an area of...

  9. Digital data sets that describe aquifer characteristics of the Enid isolated terrace aquifer in northwestern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized aquifer boundaries for the Enid isolated terrace aquifer in northwestern Oklahoma. The Enid isolated terrace aquifer covers...

  10. Digital data sets that describe aquifer characteristics of the Enid isolated terrace aquifer in northwestern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of a digitized polygon of a constant recharge value for the Enid isolated terrace aquifer in northwestern Oklahoma. The Enid isolated terrace...

  11. Digital data sets that describe aquifer characteristics of the Elk City aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant recharge values for the Elk City aquifer in western Oklahoma. The aquifer covers an area of approximately...

  12. EPA Enforcement and Compliance History Online: Legacy System Clean Air Act Data Set (ZIP)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  13. EPA Enforcement and Compliance History Online: Clean Water Act Dischargers Data Set (effluent violations)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  14. Digital data sets that describe aquifer characteristics of the Rush Springs aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant hydraulic conductivity values for the Rush Springs aquifer in western Oklahoma. This area encompasses all or...

  15. Treatment Episode Data Set: Discharges (TEDS-D-2006-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  16. Digital data sets that describe aquifer characteristics of the Antlers aquifer in southeastern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant recharge values for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone...

  17. Digital data sets that describe aquifer characteristics of the High Plains aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digital polygons of constant hydraulic conductivity values for the High Plains aquifer in Oklahoma. This area encompasses the panhandle...

  18. Digital data sets that describe aquifer characteristics of the High Plains aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digital polygons of constant recharge rates for the High Plains aquifer in Oklahoma. This area encompasses the panhandle counties of...

  19. EPA Enforcement and Compliance History Online: Clean Air Act Data Set (ZIP)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  20. EPA Enforcement and Compliance History Online: Clean Water Act Dischargers Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  1. Digital data sets that describe aquifer characteristics of the Enid isolated terrace aquifer in northwestern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the Enid isolated terrace aquifer in northwestern Oklahoma. The Enid isolated terrace aquifer...

  2. Digital data sets that describe aquifer characteristics of the Enid isolated terrace aquifer in northwestern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant hydraulic conductivity values for the Enid isolated terrace aquifer in northwestern Oklahoma. The Enid...

  3. Digital data sets that describe aquifer characteristics of the Antlers aquifer in southeastern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized aquifer boundaries of the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone is an important...

  4. Digital data sets that describe aquifer characteristics of the Elk City aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the Elk City aquifer in western Oklahoma. The aquifer covers an area of approximately 193,000...

  5. Digital data sets that describe aquifer characteristics of the High Plains aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the High Plains aquifer in western Oklahoma. This area encompasses the panhandle counties of...

  6. Digital data sets that describe aquifer characteristics of the Elk City aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized aquifer boundaries for the Elk City aquifer in western Oklahoma. The aquifer covers an area of approximately 193,000 acres and...

  7. Digital data sets that describe aquifer characteristics of the High Plains aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digital aquifer boundaries for the High Plains aquifer in western Oklahoma. This area encompasses the panhandle counties of Cimarron,...

  8. Digital data sets that describe aquifer characteristics of the Antlers aquifer in southeastern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of a constant hydraulic conductivity value for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age...

  9. Digital data sets that describe aquifer characteristics of the Antlers aquifer in southeastern Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the Antlers aquifer in southeastern Oklahoma. The Early Cretaceous-age Antlers Sandstone is an...

  10. Digital images of combined oceanic and continental data sets and their use in tectonic studies

    Science.gov (United States)

    Haxby, W. F.; Labrecque, J. L.; Weissel, J. K.; Karner, G. D.

    1983-01-01

    It is shown how crustal and lithospheric studies can benefit when continental and oceanic data sets are combined. It is also shown how digital imaging techniques provide an effective means for displaying the information contained in these combined data sets. The region of Australia, New Zealand, and the surrounding ocean is chosen for illustrating the advantages of combining continental and oceanic data sets. Here, the tectonic setting of Australia, a relatively stable continent in an intraplate environment, can be contrasted with New Zealand, which is traversed by one of the world's major plate boundaries. Simultaneous display and analysis of complementary data sets make possible a rapid geologic and tectonic interpretation of regional areas. It is shown, by way of example, that the relationship between topography and gravity anomalies in central Australia gives important new information concerning the state of isostasy of thrust terrains and their related sedimentary basins and hence provides a means of understanding the mechanical properties of the continental lithosphere.

  11. EPA Enforcement and Compliance History Online: Hazardous Waste Sites Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  12. Digital data sets that describe aquifer characteristics of the Rush Springs aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized polygons of constant recharge values for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of...

  13. Trails, Bike, bike trail data set, Published in 2006, Washoe County.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Trails, Bike dataset, was produced all or in part from Field Survey/GPS information as of 2006. It is described as 'bike trail data set'. Data by this publisher...

  14. Digital data sets that describe aquifer characteristics of the Rush Springs aquifer in western Oklahoma

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set consists of digitized water-level elevation contours for the Rush Springs aquifer in western Oklahoma. This area encompasses all or part of Blaine,...

  15. Digital data set describing ground-water regions with unconsolidated watercourses in the conterminous US

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set describes ground-water regions in the United States defined by the U.S. Geological Survey. These ground-water regions are useful for dividing the...

  16. 1992 through 2010 Treatment Episode Data Set - Admissions (TEDS-A)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set (TEDS) is an administrative data system providing descriptive information about the national flow of admissions to providers of...

  17. Hotels and Motels, casino hotel data set, Published in 2006, Washoe County.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Hotels and Motels dataset, was produced all or in part from Published Reports/Deeds information as of 2006. It is described as 'casino hotel data set'. Data by...

  18. How we load our data sets with theories and why we do so purposefully.

    Science.gov (United States)

    Rochefort-Maranda, Guillaume

    2016-12-01

    In this paper, I compare theory-laden perceptions with imputed data sets. The similarities between the two allow me to show how the phenomenon of theory-ladenness can manifest itself in statistical analyses. More importantly, elucidating the differences between them will allow me to broaden the focus of the existing literature on theory-ladenness and to introduce some much-needed nuances. The topic of statistical imputation has received no attention in philosophy of science. Yet, imputed data sets are very similar to theory-laden perceptions, and they are now an integral part of many scientific inferences. Unlike the existence of theory-laden perceptions, that of imputed data sets cannot be challenged or reduced to a manageable source of error. In fact, imputed data sets are created purposefully in order to improve the quality of our inferences. They do not undermine the possibility of scientific knowledge; on the contrary, they are epistemically desirable.
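
    A concrete example of such purposeful theory-loading: the imputed entries below inherit the (linear) assumptions of the imputation model that creates them. A sketch using scikit-learn's IterativeImputer on toy data:

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(4)

    # Toy data in which column 2 is (noisily) linear in columns 0 and 1.
    X = rng.normal(size=(200, 3))
    X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)

    # Delete ~10% of entries at random, then impute them back.
    mask = rng.random(X.shape) < 0.10
    X_missing = np.where(mask, np.nan, X)

    X_imputed = IterativeImputer(random_state=0).fit_transform(X_missing)
    # The imputed values inherit the imputation model's assumptions.
    print("mean abs. imputation error:", np.abs(X_imputed[mask] - X[mask]).mean())
    ```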

  19. SkData: data sets and algorithm evaluation protocols in Python

    Science.gov (United States)

    Bergstra, James; Pinto, Nicolas; Cox, David D.

    2015-01-01

    Machine learning benchmark data sets come in all shapes and sizes, whereas classification algorithms assume sanitized input, such as (x, y) pairs with vector-valued input x and integer class label y. Researchers and practitioners know all too well how tedious it can be to get from the URL of a new data set to a NumPy ndarray suitable for e.g. pandas or sklearn. The SkData library handles that work for a growing number of benchmark data sets (small and large) so that one-off in-house scripts for downloading and parsing data sets can be replaced with library code that is reliable, community-tested, and documented. The SkData library also introduces an open-ended formalization of training and testing protocols that facilitates direct comparison with published research. This paper describes the usage and architecture of the SkData library.

  20. Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis

    KAUST Repository

    Mueller, B.

    2013-10-01

    Land evapotranspiration (ET) estimates are available from several global data sets. Here, monthly global land ET synthesis products, merged from these individual data sets over the time periods 1989-1995 (7 yr) and 1989-2005 (17 yr), are presented. The merged synthesis products over the shorter period are based on a total of 40 distinct data sets, while those over the longer period are based on a total of 14 data sets. In the individual data sets, ET is derived from satellite and/or in situ observations (diagnostic data sets) or calculated via land-surface models (LSMs) driven with observations-based forcing or output from atmospheric reanalyses. Statistics for four merged synthesis products are provided, one including all data sets and three including only data sets from one category each (diagnostic, LSMs, and reanalyses). The multi-annual variations of ET in the merged synthesis products display realistic responses. They are also consistent with previous findings of a global increase in ET between 1989 and 1997 (0.13 mm yr-2 in our merged product) followed by a significant decrease in this trend (-0.18 mm yr-2), although these trends are relatively small compared to the uncertainty of absolute ET values. The global mean ET from the merged synthesis products (based on all data sets) is 493 mm yr-1 (1.35 mm d-1) for both the 1989-1995 and 1989-2005 products, which is relatively low compared to previously published estimates. We estimate global runoff (precipitation minus ET) at 263 mm yr-1 (34 406 km3 yr-1) for a total land area of 130 922 000 km2. Precipitation, being an important driving factor and input to most simulated ET data sets, presents uncertainties between single data sets as large as those in the ET estimates. In order to reduce uncertainties in current ET products, improving the accuracy of the input variables, especially precipitation, as well as the parameterizations of ET, is crucial.

  1. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    BACKGROUND: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used for the characterization of tumor biopsies in clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, and prediction of treatment response. However, the limited number of patients enrolled in a single trial limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not at all clear whether the advantage of increased sample size outweighs the disadvantage of the higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This was apparently due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, and (e) the reduced number of genes in the merged data
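
    The survival-prediction assay described here can be sketched with a Cox regression on a merged cohort; the data below are invented, and lifelines is assumed as one common implementation:

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(7)
    n = 200

    # Invented merged cohort: one gene's expression, a study/batch label,
    # survival time and event indicator (higher expression -> shorter survival).
    expr = rng.normal(size=n)
    df = pd.DataFrame({
        "expr": expr,
        "study": rng.integers(0, 2, n),
        "time": rng.exponential(np.exp(-0.5 * expr)),
        "event": (rng.random(n) < 0.7).astype(int),
    })

    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    print(cph.summary[["coef", "exp(coef)", "p"]])
    ```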

  2. Cost-conscious comparison of supervised learning algorithms over multiple data sets

    OpenAIRE

    Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem

    2012-01-01

    In the literature, there exist statistical tests to compare supervised learning algorithms on multiple data sets in terms of accuracy, but they do not always generate an ordering. We propose Multi2Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst", where our goodness measure is composed of a prior cost term in addition to generalization error. Our simulations show that Multi2Test generates orderings using pairwise...
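
    For orientation, the building block that such orderings generalise is a pairwise comparison of two algorithms across data sets; a sketch using a Wilcoxon signed-rank test on made-up accuracies (this is not Multi2Test itself, which adds a cost term):

    ```python
    import numpy as np
    from scipy.stats import wilcoxon

    # Invented accuracies of two classifiers on the same 10 data sets.
    acc_A = np.array([0.81, 0.77, 0.92, 0.68, 0.85, 0.74, 0.90, 0.79, 0.83, 0.71])
    acc_B = np.array([0.79, 0.76, 0.90, 0.69, 0.82, 0.72, 0.88, 0.80, 0.81, 0.70])

    stat, p = wilcoxon(acc_A, acc_B)
    print(f"Wilcoxon p = {p:.3f}")  # small p -> a consistent ordering of A over B
    ```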

  3. Comparison of multivariate calibration techniques applied to experimental NIR data sets

    OpenAIRE

    Centner, V; Verdu-Andres, J; Walczak, B; Jouan-Rimbaud, D; Despagne, F; Pasti, L; Poppi, R; Massart, DL; de Noord, OE

    2000-01-01

    The present study compares the performance of different multivariate calibration techniques applied to four near-infrared data sets when test samples are well within the calibration domain. Three types of problems are discussed: the nonlinear calibration, the calibration using heterogeneous data sets, and the calibration in the presence of irrelevant information in the set of predictors. Recommendations are derived from the comparison, which should help to guide a nonchemometrician through th...

  4. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Directory of Open Access Journals (Sweden)

    Alexander Kaever

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.

  5. Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

    Science.gov (United States)

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
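
    A minimal sketch of the combination step, assuming two dependent analyses whose correlation has been estimated (as described) from nonsignificant pathways; the Stouffer-style dependent combination used here is one standard choice, and all numbers are invented:

    ```python
    import numpy as np
    from scipy.stats import norm

    # Invented per-platform enrichment p-values for one pathway, and a
    # between-analysis correlation estimated from nonsignificant pathways.
    p_values = np.array([0.004, 0.030])
    rho = 0.4
    cov = np.array([[1.0, rho], [rho, 1.0]])

    # Stouffer-style combination adjusted for dependence:
    # Z = sum(z_k) / sqrt(1' Sigma 1)
    z = norm.isf(p_values)
    z_comb = z.sum() / np.sqrt(np.ones(2) @ cov @ np.ones(2))
    print("combined p:", norm.sf(z_comb))
    ```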

  6. The Effect of Training Data Set Composition on the Performance of a Neural Image Caption Generator

    Science.gov (United States)

    2017-09-01

    ARL-TR-8124 ● SEP 2017 ● US Army Research Laboratory: The Effect of Training Data Set Composition on the Performance of a Neural Image Caption Generator, by Abigail Wilson (Montgomery Blair).

  7. Structural break or long memory: an empirical survey on daily rainfall data sets across Malaysia

    OpenAIRE

    F. Yusof; Kane, I. L.; Yusop, Z.

    2013-01-01

    A short-memory process that encounters occasional structural breaks in the mean can show a slower rate of decay in the autocorrelation function and other properties of fractionally integrated I(d) processes. In this paper we employed a procedure for estimating the fractional differencing parameter in semiparametric contexts proposed by Geweke and Porter-Hudak (1983) to analyse nine daily rainfall data sets across Malaysia. The results indicate that all the data sets exhibit long

  8. Acoustic Metadata Management and Transparent Access to Networked Oceanographic Data Sets

    Science.gov (United States)

    2015-09-30

    Marie A. Roch, Dept. of Computer Science, San Diego State University, 5500 Campanile Drive, San...University. In addition to providing database services, the Tethys metadata server also provides access to oceanographic data sets in a consistent...passive acoustic monitoring community. 3. Access to network-available data products in a standard manner (e.g. ephemeris). 4. Secure access on

  9. Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing

    Energy Technology Data Exchange (ETDEWEB)

    Kamath, C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Fan, Y. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-04-11

    There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. While it is easy to collect such large data sets in some application domains, there are others where collecting even a single data point can be very expensive, so the resulting data sets have only tens or hundreds of samples. For example, when complex computer simulations are used to understand a scientific phenomenon, we want to run the simulation for many different values of the input parameters and analyze the resulting output. The data set relating the simulation inputs and outputs is typically quite small, especially when each run of the simulation is expensive. However, regression techniques can still be used on such data sets to build an inexpensive "surrogate" that could provide an approximate output for a given set of inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments. In this paper, we compare different regression techniques to determine how well they predict melt-pool characteristics in the problem domain of additive manufacturing. Our analysis indicates that some of the commonly used regression methods do perform quite well even on small data sets.
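
    With only tens of samples, leave-one-out cross-validation is a natural way to compare candidate surrogate models; a sketch on an invented 40-run data set using scikit-learn (the models shown are common choices, not necessarily those compared in the report):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(5)

    # Invented small "simulation" data set: 40 runs, 3 inputs, one output.
    X = rng.uniform(0, 1, size=(40, 3))
    y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=40)

    for name, model in [("ridge", Ridge()),
                        ("random forest", RandomForestRegressor(random_state=0)),
                        ("gaussian process", GaussianProcessRegressor())]:
        mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                               scoring="neg_mean_squared_error").mean()
        print(f"{name}: LOO MSE = {mse:.4f}")
    ```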

  10. Identifiers for Earth Science Data Sets: Where We Have Been and Where We Need to Go

    Directory of Open Access Journals (Sweden)

    Justin C. Goldstein

    2017-04-01

    Considerable attention has been devoted to the use of persistent identifiers for assets of interest to scientific and other communities alike over the last two decades. Among persistent identifiers, Digital Object Identifiers (DOIs) stand out quite prominently, with approximately 133 million DOIs assigned to various objects as of February 2017. While the assignment of DOIs to objects such as scientific publications has been in place for many years, their assignment to Earth science data sets is more recent. Applying persistent identifiers to data sets enables improved tracking of their use and reuse, facilitates the crediting of data producers, and aids reproducibility through associating research with the exact data set(s) used. Maintaining provenance – i.e., tracing back lineage of significant scientific conclusions to the entities (data sets, algorithms, instruments, satellites, etc.) that lead to the conclusions – would be prohibitive without persistent identifiers. This paper provides a brief background on the use of persistent identifiers in general within the US, and DOIs more specifically. We examine their recent use for Earth science data sets, and outline successes and some remaining challenges. Among the challenges, for example, is the ability to conveniently and consistently obtain data citation statistics using the DOIs assigned by organizations that manage data sets.

  11. Data sets for author name disambiguation: an empirical analysis and a new resource.

    Science.gov (United States)

    Müller, Mark-Christoph; Reitz, Florian; Roy, Nicolas

    2017-01-01

    Data sets of publication meta data with manually disambiguated author names play an important role in current author name disambiguation (AND) research. We review the most important data sets used so far, and compare their respective advantages and shortcomings. From the results of this review, we derive a set of general requirements to future AND data sets. These include both trivial requirements, like absence of errors and preservation of author order, and more substantial ones, like full disambiguation and adequate representation of publications with a small number of authors and highly variable author names. On the basis of these requirements, we create and make publicly available a new AND data set, SCAD-zbMATH. Both the quantitative analysis of this data set and the results of our initial AND experiments with a naive baseline algorithm show the SCAD-zbMATH data set to be considerably different from existing ones. We consider it a useful new resource that will challenge the state of the art in AND and benefit the AND research community.

  12. Measuring the value of research data: a citation analysis of oceanographic data sets.

    Directory of Open Access Journals (Sweden)

    Christopher W Belter

    Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities, such as data curation, that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications, because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so. This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by generating citation counts for three data sets archived at the National Oceanographic Data Center. My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set. These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions.

  13. Global temperature response to the major volcanic eruptions in multiple reanalysis data sets

    Directory of Open Access Journals (Sweden)

    M. Fujiwara

    2015-12-01

    Full Text Available The global temperature responses to the eruptions of Mount Agung in 1963, El Chichón in 1982, and Mount Pinatubo in 1991 are investigated using nine currently available reanalysis data sets (JRA-55, MERRA, ERA-Interim, NCEP-CFSR, JRA-25, ERA-40, NCEP-1, NCEP-2, and 20CR). Multiple linear regression is applied to the zonal and monthly mean time series of temperature for two periods, 1979–2009 (for eight reanalysis data sets) and 1958–2001 (for four reanalysis data sets), by considering explanatory factors of seasonal harmonics, linear trends, Quasi-Biennial Oscillation, solar cycle, and El Niño Southern Oscillation. The residuals are used to define the volcanic signals for the three eruptions separately, and common and different responses among the older and newer reanalysis data sets are highlighted for each eruption. In response to the Mount Pinatubo eruption, most reanalysis data sets show strong warming signals (up to 2–3 K for the 1-year average) in the tropical lower stratosphere and weak cooling signals (down to −1 K) in the subtropical upper troposphere. For the El Chichón eruption, warming signals in the tropical lower stratosphere are somewhat smaller than those for the Mount Pinatubo eruption. The response to the Mount Agung eruption is asymmetric about the equator with strong warming in the Southern Hemisphere midlatitude upper troposphere to lower stratosphere. Comparison of the results from several different reanalysis data sets confirms the atmospheric temperature response to these major eruptions qualitatively, but also shows quantitative differences even among the most recent reanalysis data sets. The consistencies and differences among different reanalysis data sets provide a measure of the confidence and uncertainty in our current understanding of the volcanic response. The results of this intercomparison study may be useful for validation of climate model responses to volcanic forcing and for assessing proposed geoengineering by stratospheric aerosol injection.

  14. Global temperature response to the major volcanic eruptions in multiple reanalysis data sets

    Science.gov (United States)

    Fujiwara, M.; Hibino, T.; Mehta, S. K.; Gray, L.; Mitchell, D.; Anstey, J.

    2015-12-01

    The global temperature responses to the eruptions of Mount Agung in 1963, El Chichón in 1982, and Mount Pinatubo in 1991 are investigated using nine currently available reanalysis data sets (JRA-55, MERRA, ERA-Interim, NCEP-CFSR, JRA-25, ERA-40, NCEP-1, NCEP-2, and 20CR). Multiple linear regression is applied to the zonal and monthly mean time series of temperature for two periods, 1979-2009 (for eight reanalysis data sets) and 1958-2001 (for four reanalysis data sets), by considering explanatory factors of seasonal harmonics, linear trends, Quasi-Biennial Oscillation, solar cycle, and El Niño Southern Oscillation. The residuals are used to define the volcanic signals for the three eruptions separately, and common and different responses among the older and newer reanalysis data sets are highlighted for each eruption. In response to the Mount Pinatubo eruption, most reanalysis data sets show strong warming signals (up to 2-3 K for 1-year average) in the tropical lower stratosphere and weak cooling signals (down to -1 K) in the subtropical upper troposphere. For the El Chichón eruption, warming signals in the tropical lower stratosphere are somewhat smaller than those for the Mount Pinatubo eruption. The response to the Mount Agung eruption is asymmetric about the equator with strong warming in the Southern Hemisphere midlatitude upper troposphere to lower stratosphere. Comparison of the results from several different reanalysis data sets confirms the atmospheric temperature response to these major eruptions qualitatively, but also shows quantitative differences even among the most recent reanalysis data sets. The consistencies and differences among different reanalysis data sets provide a measure of the confidence and uncertainty in our current understanding of the volcanic response. The results of this intercomparison study may be useful for validation of climate model responses to volcanic forcing and for assessing proposed geoengineering by stratospheric aerosol injection.
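
    As a rough illustration of the regression step described above (not the authors' code), the sketch below fits seasonal harmonics, a linear trend, and proxy indices to a monthly temperature series and returns the residual, which plays the role of the volcanic signal; all input series here are random stand-ins.

    ```python
    # Minimal multiple-linear-regression sketch: residual = "volcanic" signal.
    import numpy as np

    def volcanic_residual(temp, qbo, solar, enso):
        n = len(temp)
        t = np.arange(n)
        X = np.column_stack([
            np.ones(n), t,                                            # mean and linear trend
            np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12),   # annual cycle
            np.sin(4 * np.pi * t / 12), np.cos(4 * np.pi * t / 12),   # semi-annual cycle
            qbo, solar, enso,                                         # explanatory indices
        ])
        beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
        return temp - X @ beta

    rng = np.random.default_rng(0)
    months = 372                                                      # e.g. 1979-2009
    series = [rng.normal(size=months) for _ in range(4)]
    print(volcanic_residual(*series).shape)                           # -> (372,)
    ```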

  15. An intercomparison of observational precipitation data sets over Northwest India during winter

    Science.gov (United States)

    Nageswararao, M. M.; Mohanty, U. C.; Ramakrishna, S. S. V. S.; Dimri, A. P.

    2017-02-01

    Winter (DJF) precipitation over Northwest India (NWI) is very important for the cultivation of Rabi crops. Thus, an accurate estimation of high-resolution observations, evaluation of high-resolution numerical models, and understanding of local variability trends are essential. The objective of this study is to verify the quality of a new high-spatial-resolution (0.25° × 0.25°) gridded daily precipitation data set of the India Meteorological Department (IMD1) over NWI during winter. An intercomparison with four existing precipitation data sets, at 0.5° × 0.5° of IMD (IMD2), 1° × 1° of IMD (IMD3), 0.25° × 0.25° of APHRODITE (APRD1), and 0.5° × 0.5° of APHRODITE (APRD2) resolution, during a common period of 1971-2003 is done. The evaluation of the data quality of these five data sets against the 26 available station observations is carried out, and the results clearly indicate that all five data sets agree reasonably with the station observations. However, the errors are relatively larger in all five data sets at the four Jammu and Kashmir stations (Srinagar, Drass, Banihal top, and Dawar), while they are smaller at the other stations. This may be due to the lack of station observations over the region. The quality of the IMD1 data set over NWI for winter precipitation is reasonably better than that of the other data sets. The intercomparison analysis suggests that the climatological mean, interannual variability, and coefficient of variation from IMD1 are similar to those of the other data sets. Further, the analysis is extended to the Indian meteorological subdivisions over the region. This analysis indicates overestimation in IMD3 and underestimation in APRD1 and APRD2 over Jammu and Kashmir, Himachal Pradesh, and NWI as a whole, whereas IMD2 is closer to IMD1. Moreover, all five data sets are highly correlated (>0.5) among themselves at the 99.9% confidence level for all subdivisions. It is remarkably noticed that multicategorical (light precipitation, moderate precipitation, heavy

  16. Chicken's Genome Decoded

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    After completing the work on mapping the chicken genome sequence and chicken genome variation in early March 2004, two international research consortiums have made significant progress in reading the maps, shedding new light on studies of the first bird, and the first agricultural animal, to have its genome sequenced and analyzed in the world.

  17. Transcriptomics Research in Chicken

    NARCIS (Netherlands)

    Yang, D.Y.; Gao, C.; Zhu, L.Q.; Tang, L.G.; Liu, J.; Nie, H.

    2012-01-01

    The chicken (Gallus gallus) is an important model organism in genetics, developmental biology, immunology and evolutionary research. Besides being an important model organism, the chicken is also a very important agricultural species and an important source of food (eggs and meat).

  18. Comparison of three vertically resolved ozone data sets: climatology, trends and radiative forcings

    Directory of Open Access Journals (Sweden)

    B. Hassler

    2013-06-01

    Full Text Available Climate models that do not simulate changes in stratospheric ozone concentrations require the prescription of ozone fields to accurately calculate UV fluxes and stratospheric heating rates. In this study, three different global ozone time series that are available for this purpose are compared: the data sets of Randel and Wu (2007) (RW07), Cionni et al. (2011) (SPARC), and Bodeker et al. (2013) (BDBP). All three data sets represent multiple-linear regression fits to vertically resolved ozone observations, resulting in a spatially and temporally continuous stratospheric ozone field covering at least the period from 1979 to 2005. The main differences among the data sets result from regression models, which use different observations and include different basis functions. The data sets are compared against ozonesonde and satellite observations to assess how the data sets represent concentrations, trends and interannual variability. In the Southern Hemisphere polar region, RW07 and SPARC underestimate the ozone depletion in spring ozonesonde measurements. A piecewise linear trend regression is performed to estimate the 1979–1996 ozone decrease globally, covering a period of extreme depletion in most regions. BDBP overestimates Arctic and tropical ozone depletion over this period relative to the available measurements, whereas the depletion is underestimated in RW07 and SPARC. While the three data sets yield ozone concentrations that are within a range of different observations, there is a large spread in their respective ozone trends. One consequence of this is differences of almost a factor of four in the calculated stratospheric ozone radiative forcing between the data sets (RW07: −0.038 Wm−2, SPARC: −0.033 Wm−2, BDBP: −0.119 Wm−2), important in assessing the contribution of stratospheric ozone depletion to the total anthropogenic radiative forcing.

  19. The chicken SLAM family.

    Science.gov (United States)

    Straub, Christian; Viertlboeck, Birgit C; Göbel, Thomas W

    2013-01-01

    The signaling lymphocytic activation molecule (SLAM) family of receptors is critically involved in the immune regulation of lymphocytes but has only been detected in mammals, with one member being present in Xenopus. Here, we describe the identification, cloning, and analysis of the chicken homologues of the mammalian SLAMF1 (CD150), SLAMF2 (CD48), and SLAMF4 (CD244, 2B4). Two additional chicken SLAM genes were identified and designated SLAMF3like and SLAMF5like in order to stress that those two receptors have no clear mammalian counterpart but share some features with mammalian SLAMF3 and SLAMF5, respectively. Three of the chicken SLAM genes are located on chromosome 25, whereas two are currently not yet assigned. The mammalian and chicken receptors share a common structure with a V-like domain that lacks conserved cysteine residues and a C2-type Ig domain with four cysteines forming two disulfide bonds. Chicken SLAMF2, like its mammalian counterpart, lacks a transmembrane and cytoplasmic domain and thus represents a glycosyl-phosphatidyl-inositol-anchored protein. The cytoplasmic tails of SLAMF1 and SLAMF4 display two and four conserved immunoreceptor tyrosine-based switch motifs (ITSMs), respectively, whereas both chicken SLAMF3like and SLAMF5like have only a single ITSM. We have also identified the chicken homologues of the SLAM-associated protein family of adaptors (SAP), SAP and EAT-2. Chicken SAP shares about 70 % identity with mammalian SAP, and chicken EAT-2 is homologous to mouse EAT-2, whereas human EAT-2 is much shorter. The characterization of the chicken SLAM family of receptors and the SAP adaptors demonstrates the phylogenetic conservation of this family, in particular, its signaling capacities.

  20. MiniWall Tool for Analyzing CFD and Wind Tunnel Large Data Sets

    Science.gov (United States)

    Schuh, Michael J.; Melton, John E.; Stremel, Paul M.

    2017-01-01

    It is challenging to review and assimilate large data sets created by Computational Fluid Dynamics (CFD) simulations and wind tunnel tests. Over the past 10 years, NASA Ames Research Center has developed and refined a software tool dubbed the "MiniWall" to increase productivity in reviewing and understanding large CFD-generated data sets. Under the recent NASA ERA project, the application of the tool expanded to enable rapid comparison of experimental and computational data. The MiniWall software is browser based so that it runs on any computer or device that can display a web page. It can also be used remotely and securely by using web server software such as the Apache HTTP Server. The MiniWall software has recently been rewritten and enhanced to make it even easier for analysts to review large data sets and extract knowledge and understanding from these data sets. This paper describes the MiniWall software and demonstrates how the different features are used to review and assimilate large data sets.

  1. MiniWall Tool for Analyzing CFD and Wind Tunnel Large Data Sets

    Science.gov (United States)

    Schuh, Michael J.; Melton, John E.; Stremel, Paul M.

    2017-01-01

    It is challenging to review and assimilate large data sets created by Computational Fluid Dynamics (CFD) simulations and wind tunnel tests. Over the past 10 years, NASA Ames Research Center has developed and refined a software tool dubbed the MiniWall to increase productivity in reviewing and understanding large CFD-generated data sets. Under the recent NASA ERA project, the application of the tool expanded to enable rapid comparison of experimental and computational data. The MiniWall software is browser based so that it runs on any computer or device that can display a web page. It can also be used remotely and securely by using web server software such as the Apache HTTP server. The MiniWall software has recently been rewritten and enhanced to make it even easier for analysts to review large data sets and extract knowledge and understanding from these data sets. This paper describes the MiniWall software and demonstrates how the different features are used to review and assimilate large data sets.

  2. Comparison and consolidation of microarray data sets of human tissue expression

    Science.gov (United States)

    2010-01-01

    Background Human tissue displays a remarkable diversity in structure and function. To understand how such diversity emerges from the same DNA, systematic measurements of gene expression across different tissues in the human body are essential. Several recent studies addressed this formidable task using microarray technologies. These large tissue expression data sets have provided us with an important basis for biomedical research. However, it is well known that microarray data can be compromised by high noise levels and various experimental artefacts. Critical comparison of different data sets can help to reveal such errors and to avoid pitfalls in their application. Results We present here the first comparison and integration of four freely available tissue expression data sets generated using three different microarray platforms and containing a total of 377 microarray hybridizations. When assessing the tissue expression of genes, we found that the results depend considerably on the chosen data set. Nevertheless, the comparison also revealed statistically significant similarity of gene expression profiles across different platforms. This enabled us to construct consolidated lists of platform-independent tissue-specific genes using a set of complementary measures. Follow-up analyses showed that results based on consolidated data tend to be more reliable. Conclusions Our study strongly indicates that the consolidation of the four different tissue expression data sets can increase data quality and can lead to biologically more meaningful results. The provided compendium of platform-independent gene lists should facilitate the identification of novel tissue-specific marker genes. PMID:20465848

  3. Comparison and consolidation of microarray data sets of human tissue expression

    Directory of Open Access Journals (Sweden)

    Futschik Matthias E

    2010-05-01

    Full Text Available Abstract Background Human tissue displays a remarkable diversity in structure and function. To understand how such diversity emerges from the same DNA, systematic measurements of gene expression across different tissues in the human body are essential. Several recent studies addressed this formidable task using microarray technologies. These large tissue expression data sets have provided us with an important basis for biomedical research. However, it is well known that microarray data can be compromised by high noise levels and various experimental artefacts. Critical comparison of different data sets can help to reveal such errors and to avoid pitfalls in their application. Results We present here the first comparison and integration of four freely available tissue expression data sets generated using three different microarray platforms and containing a total of 377 microarray hybridizations. When assessing the tissue expression of genes, we found that the results depend considerably on the chosen data set. Nevertheless, the comparison also revealed statistically significant similarity of gene expression profiles across different platforms. This enabled us to construct consolidated lists of platform-independent tissue-specific genes using a set of complementary measures. Follow-up analyses showed that results based on consolidated data tend to be more reliable. Conclusions Our study strongly indicates that the consolidation of the four different tissue expression data sets can increase data quality and can lead to biologically more meaningful results. The provided compendium of platform-independent gene lists should facilitate the identification of novel tissue-specific marker genes.
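
    A hedged sketch of the cross-platform comparison step described in this record: correlate each shared gene's tissue expression profile between two platforms and keep the genes whose profiles agree. The matrix layout and the correlation threshold are illustrative assumptions, not values from the study.

    ```python
    # Keep genes whose tissue profiles correlate across two platforms.
    import numpy as np
    import pandas as pd

    def consolidate(expr_a: pd.DataFrame, expr_b: pd.DataFrame, min_r: float = 0.7):
        """expr_*: genes x tissues matrices with matching tissue columns."""
        shared = expr_a.index.intersection(expr_b.index)
        return [g for g in shared
                if np.corrcoef(expr_a.loc[g], expr_b.loc[g])[0, 1] >= min_r]

    tissues = ["brain", "liver", "kidney", "heart"]
    a = pd.DataFrame(np.random.rand(5, 4), index=list("ABCDE"), columns=tissues)
    b = pd.DataFrame(np.random.rand(5, 4), index=list("ABCDE"), columns=tissues)
    print(consolidate(a, b, min_r=0.0))           # toy data: keeps all shared genes
    ```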

  4. An "Electronic Fluorescent Pictograph" browser for exploring and analyzing large-scale biological data sets.

    Directory of Open Access Journals (Sweden)

    Debbie Winter

    Full Text Available BACKGROUND: The exploration of microarray data and data from other high-throughput projects for hypothesis generation has become a vital aspect of post-genomic research. For the non-bioinformatics specialist, however, many of the currently available tools provide overwhelming amounts of data that are presented in a non-intuitive way. METHODOLOGY/PRINCIPAL FINDINGS: In order to facilitate the interpretation and analysis of microarray data and data from other large-scale data sets, we have developed a tool, which we have dubbed the electronic Fluorescent Pictograph - or eFP - Browser, available at http://www.bar.utoronto.ca/, for exploring microarray and other data for hypothesis generation. This eFP Browser engine paints data from large-scale data sets onto pictographic representations of the experimental samples used to generate the data sets. We give examples of using the tool to present Arabidopsis gene expression data from the AtGenExpress Consortium (Arabidopsis eFP Browser), data for subcellular localization of Arabidopsis proteins (Cell eFP Browser), and mouse tissue atlas microarray data (Mouse eFP Browser). CONCLUSIONS/SIGNIFICANCE: The eFP Browser software is easily adaptable to microarray or other large-scale data sets from any organism and thus should prove useful to a wide community for visualizing and interpreting these data sets for hypothesis generation.

  5. Constructing Isosurfaces from 3D Data Sets Taking Account of Depth Sorting of Polyhedra

    Institute of Scientific and Technical Information of China (English)

    周勇; 唐泽圣

    1994-01-01

    Creating and rendering intermediate geometric primitives is one of the approaches to visualize data sets in 3D space. Some algorithms have been developed to construct isosurfaces from uniformly distributed 3D data sets. These algorithms assume that the function value varies linearly along the edges of each cell. But for irregular 3D data sets, this assumption is inapplicable. Moreover, the depth sorting of cells is more complicated for irregular data sets, and it is indispensable for generating isosurface images or semitransparent isosurface images if the Z-buffer method is not adopted. In this paper, isosurface models based on the assumption that the function value has a nonlinear distribution within a tetrahedron are proposed. The depth sorting algorithm and data structures are developed for irregular data sets in which cells may be subdivided into tetrahedra. The implementation issues of this algorithm are discussed and experimental results are shown to illustrate the potential of this technique.
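
    For orientation, the sketch below shows the conventional linear-variation step that the paper's nonlinear isosurface models generalize: locating the isovalue crossing on a single tetrahedron edge by linear interpolation. It is a minimal illustration, not the authors' algorithm.

    ```python
    # Isovalue crossing on one cell edge under the linear-variation assumption.
    import numpy as np

    def edge_crossing(p0, p1, f0, f1, iso):
        """Return the crossing point on edge (p0, p1), or None if there is none."""
        if (f0 - iso) * (f1 - iso) > 0 or f0 == f1:   # same side, or degenerate edge
            return None
        t = (iso - f0) / (f1 - f0)                    # linear interpolation parameter
        return (1 - t) * np.asarray(p0) + t * np.asarray(p1)

    print(edge_crossing((0, 0, 0), (1, 0, 0), f0=0.2, f1=0.8, iso=0.5))  # [0.5 0. 0.]
    ```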

  6. Assessing effects of variation in global climate data sets on spatial predictions from climate envelope models

    Science.gov (United States)

    Romanach, Stephanie; Watling, James I.; Fletcher, Robert J.; Speroterra, Carolina; Bucklin, David N.; Brandt, Laura A.; Pearlstine, Leonard G.; Escribano, Yesenia; Mazzotti, Frank J.

    2014-01-01

    Climate change poses new challenges for natural resource managers. Predictive modeling of species–environment relationships using climate envelope models can enhance our understanding of climate change effects on biodiversity, assist in assessment of invasion risk by exotic organisms, and inform life-history understanding of individual species. While increasing interest has focused on the role of uncertainty in future conditions on model predictions, models also may be sensitive to the initial conditions on which they are trained. Although climate envelope models are usually trained using data on contemporary climate, we lack systematic comparisons of model performance and predictions across alternative climate data sets available for model training. Here, we seek to fill that gap by comparing variability in predictions between two contemporary climate data sets to variability in spatial predictions among three alternative projections of future climate. Overall, correlations between monthly temperature and precipitation variables were very high for both contemporary and future data. Model performance varied across algorithms, but not between two alternative contemporary climate data sets. Spatial predictions varied more among alternative general-circulation models describing future climate conditions than between contemporary climate data sets. However, we did find that climate envelope models with low Cohen's kappa scores made more discrepant spatial predictions between climate data sets for the contemporary period than did models with high Cohen's kappa scores. We suggest conservation planners evaluate multiple performance metrics and be aware of the importance of differences in initial conditions for spatial predictions from climate envelope models.
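
    Cohen's kappa, the agreement score referenced above, can be computed for two binary presence/absence prediction maps as follows; this is the generic textbook formula, not code from the study.

    ```python
    # Cohen's kappa: observed agreement corrected for chance agreement.
    import numpy as np

    def cohens_kappa(a, b):
        a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
        po = np.mean(a == b)                           # observed agreement
        pe = (np.mean(a) * np.mean(b)                  # chance agreement on class 1
              + (1 - np.mean(a)) * (1 - np.mean(b)))   # chance agreement on class 0
        return (po - pe) / (1 - pe)

    print(cohens_kappa([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))   # ~0.615
    ```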

  7. Integrated SFM Techniques Using Data Set from Google Earth 3d Model and from Street Level

    Science.gov (United States)

    Inzerillo, L.

    2017-08-01

    Structure from motion (SfM) is a widespread photogrammetric method that applies photogrammetric rules to build a 3D model from a collection of photos. For some complex ancient buildings, such as cathedrals, theatres, or castles, the data set acquired from street level needs to be complemented with a UAV data set in order to reconstruct the roof in 3D. Nevertheless, the use of UAVs is strongly limited by government rules. In recent years, Google Earth (GE) has been enriched with 3D models of sites around the earth. For this reason, it seemed convenient to test the potential offered by GE to extract from it a data set that replaces the UAV function and completes the aerial building data set, using screen images of high-resolution 3D models. Users can take unlimited "aerial photos" of a scene while flying around in GE at any viewing angle and altitude. The challenge is to verify the metric reliability of the SfM model carried out with an integrated data set (the one from street level and the one from GE) aimed at replacing UAV use in an urban context. This model is called the integrated GE SfM model (i-GESfM). In this paper, a case study is presented: the Cathedral of Palermo.

  8. Evolutionary optimization of PAW data-sets for accurate high pressure simulations

    Science.gov (United States)

    Sarkar, Kanchan; Topsakal, Mehmet; Holzwarth, N. A. W.; Wentzcovitch, Renata M.

    2017-10-01

    We examine the challenge of performing accurate electronic structure calculations at high pressures by comparing the results of all-electron full-potential linearized augmented-plane-wave calculations, as implemented in the WIEN2k code, with those of the projector augmented wave (PAW) method, as implemented in the Quantum ESPRESSO or Abinit codes. In particular, we focus on developing an automated and consistent way of generating transferable PAW data-sets that can closely reproduce the all-electron equation of state defined from zero to arbitrarily high pressures. The technique we propose is an evolutionary search procedure that exploits the ATOMPAW code to generate atomic data-sets and the Quantum ESPRESSO software suite for total energy calculations. We demonstrate different aspects of its workability by optimizing PAW basis functions of some elements relatively abundant in planetary interiors. In addition, we introduce a new measure of atomic data-set goodness by considering their performance uniformity over an extended pressure range.

  9. Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri; Shapiro, Harris; Goltsman, Eugene; McHardy, Alice C.; Rigoutsos, Isidore; Salamov, Asaf; Korzeniewski, Frank; Land, Miriam; Lapidus, Alla; Grigoriev, Igor; Richardson, Paul; Hugenholtz, Philip; Kyrpides, Nikos C.

    2006-12-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
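
    The construction described here, randomly sampling reads from isolate genomes, can be sketched roughly as below. The genome strings, abundance weights, and read length are toy assumptions; the real data sets drew on 113 isolate genomes.

    ```python
    # Draw fixed-length "reads" from randomly chosen genomes, weighted by abundance.
    import random

    def simulate_reads(genomes, abundances, n_reads=1000, read_len=100):
        """genomes: {name: sequence}; abundances: {name: relative weight}."""
        names = list(genomes)
        weights = [abundances[n] for n in names]
        reads = []
        for _ in range(n_reads):
            g = random.choices(names, weights=weights)[0]
            seq = genomes[g]
            start = random.randrange(max(1, len(seq) - read_len))
            reads.append((g, seq[start:start + read_len]))
        return reads

    toy = {"orgA": "ACGT" * 500, "orgB": "GGCC" * 500}
    print(len(simulate_reads(toy, {"orgA": 3, "orgB": 1}, n_reads=10)))   # -> 10
    ```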

  10. Scalable Algorithms for Unsupervised Classification and Anomaly Detection in Large Geospatiotemporal Data Sets

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.

    2015-12-01

    The increasing availability of high-resolution geospatiotemporal datasets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of ecological data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe some unsupervised knowledge discovery and anomaly detection approaches based on highly scalable parallel algorithms for k-means clustering and singular value decomposition, consider a few practical applications thereof to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
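
    As a toy, serial illustration of the k-means step at the core of the clustering described above (the production implementations are distributed and highly parallel, which this sketch does not attempt to show):

    ```python
    # Plain Lloyd's-algorithm k-means on small synthetic data.
    import numpy as np

    def kmeans(X, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest center, then recompute centers.
            labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return labels, centers

    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(loc=off, size=(100, 3)) for off in (0.0, 5.0, 10.0)])
    labels, _ = kmeans(X, k=3)
    print(np.bincount(labels))                     # roughly 100 points per cluster
    ```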

  11. Exploration of data partitioning in an eight-gene data set

    DEFF Research Database (Denmark)

    Rota, Jadranka; Wahlberg, Niklas

    2012-01-01

    Molecular data sets for phylogenetic inference continue to increase in size, especially with respect to the number of genes sampled. As more and more genes are included in analyses, the importance of partitioning the data to avoid problems that can arise from underparameterization becomes more apparent. With an eight-gene data set from 38 metalmark moth species (12 genera represented) and three outgroups, we explored different data partitioning strategies and their influence on convergence and mixing of Markov chain Monte Carlo in a Bayesian setting. We found that in larger data sets, with an increase in the number of partitions that are made a priori (e.g. by gene and codon position), convergence and mixing become poor. This problem can be overcome by using a recently published algorithm in which homologous sites are grouped into blocks with similar evolutionary rates that can then be modelled.

  12. A proposal to order the neutron data set in neutron spectrometry using the RDANN methodology

    Energy Technology Data Exchange (ETDEWEB)

    Ortiz R, J.M.; Martinez B, M.R.; Vega C, H.R. [UAZ, Av. Ramon Lopez Velarde No. 801, 98000 Zacatecas (Mexico)

    2006-07-01

    A new proposal to order a neutron data set in the design process of artificial neural networks in the neutron spectrometry field is presented for the first time. The robust design of artificial neural networks methodology was applied to the 187-neutron-spectra data set compiled by the International Atomic Energy Agency. Four cases of grouping the neutron spectra were considered, and around 1000 different neural networks were designed, trained, and tested, each with a different net topology. After carrying out the systematic methodology for all the cases, it was determined that the best reconstructed neutron spectra were produced with the full 187-spectra data set, the best topology being 7 input neurons, 14 neurons in a hidden layer, and 31 neurons in the output layer, with a learning rate of 0.1 and a momentum of 0.1. (Author)
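
    The selected 7-14-31 topology can be written down as a bare feed-forward pass. The sketch below uses sigmoid activations as an assumption (the record does not state the activation function) and omits training entirely.

    ```python
    # Forward pass of a 7-14-31 multilayer perceptron with random weights.
    import numpy as np

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(14, 7)), np.zeros(14)    # input layer -> hidden layer
    W2, b2 = rng.normal(size=(31, 14)), np.zeros(31)   # hidden layer -> output layer

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):                                    # x: 7 detector readings
        h = sigmoid(W1 @ x + b1)
        return sigmoid(W2 @ h + b2)                    # 31-bin reconstructed spectrum

    print(forward(rng.random(7)).shape)                # -> (31,)
    ```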

  13. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Barry, Kerrie [U.S. Department of Energy, Joint Genome Institute; Shapiro, Harris [U.S. Department of Energy, Joint Genome Institute; Goltsman, Eugene [U.S. Department of Energy, Joint Genome Institute; McHardy, Alice C. [IBM T. J. Watson Research Center; Rigoutsos, Isidore [IBM T. J. Watson Research Center; Salamov, Asaf [U.S. Department of Energy, Joint Genome Institute; Korzeniewski, Frank [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Grigoriev, Igor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  14. Ubiquitous information for ubiquitous computing: expressing clinical data sets with openEHR archetypes.

    Science.gov (United States)

    Garde, Sebastian; Hovenga, Evelyn; Buck, Jasmin; Knaup, Petra

    2006-01-01

    Ubiquitous computing requires ubiquitous access to information and knowledge. With the release of openEHR Version 1.0, there is a common model available to solve some of the problems related to accessing information and knowledge by improving semantic interoperability between clinical systems. Considerable work has been undertaken by various bodies to standardise Clinical Data Sets. Notwithstanding their value, several problems remain unsolved with Clinical Data Sets without the use of a common model underpinning them. This paper outlines these problems, such as incompatible basic data types and overlapping, incompatible definitions of clinical content. A solution based on openEHR archetypes is motivated, and an approach to transform existing Clinical Data Sets into archetypes is presented. To avoid significant overlaps and unnecessary effort, archetype development needs to be coordinated nationwide and beyond, and also across the various health professions, in a formalized process.

  15. A description of data sets to determine the innovative diversification capacity of farm households.

    Science.gov (United States)

    Mc Fadden, Terence

    2016-09-01

    These data represent research activities carried out in Co. Offaly and Co. Mayo, Ireland, to identify farm household innovative diversification behavior and the supporting capacity roles of policy/institutional actors. The data sets are overlain with household and agency data from the two study areas to describe levels of innovative diversification capacity by individual socio-economic farm household profile. The data sets summarize the public policy discussions on rural innovation and diversification and policy actor response requirements, and incorporate both qualitative and quantitative data set combinations. The data are used to assess policy/institutional actors' roles and farm households' capacity for innovation at the farm household/institution interface in support of sustainable rural business innovations on-farm and diversification.

  16. Existence and Blowup Results for Asymptotically Euclidean Initial Data Sets Generated by the Conformal Method

    CERN Document Server

    Dilts, James

    2016-01-01

    For each set of (freely chosen) seed data, the conformal method reduces the Einstein constraint equations to a system of elliptic equations, the conformal constraint equations. We prove an admissibility criterion, based on a (conformal) prescribed scalar curvature problem, which provides a necessary condition on the seed data for the conformal constraint equations to (possibly) admit a solution. We then consider sets of asymptotically Euclidean (AE) seed data for which solutions of the conformal constraint equations exist, and examine the blowup properties of these solutions as the seed data sets approach sets for which no solutions exist. We also prove that there are AE seed data sets which include a Yamabe nonpositive metric and lead to solutions of the conformal constraints. These data sets allow the mean curvature function to have zeroes.

  17. Data sets for snow cover monitoring and modelling from the National Snow and Ice Data Center

    Science.gov (United States)

    Holm, M.; Daniels, K.; Scott, D.; McLean, B.; Weaver, R.

    2003-04-01

    A wide range of snow cover monitoring and modelling data sets are pending or are currently available from the National Snow and Ice Data Center (NSIDC). In-situ observations support validation experiments that enhance the accuracy of remote sensing data. In addition, remote sensing data are available in near-real time, providing coarse-resolution snow monitoring capability. Time series data beginning in 1966 are valuable for modelling efforts. NSIDC holdings include SMMR and SSM/I snow cover data, MODIS snow cover extent products, in-situ and satellite data collected for NASA's recent Cold Land Processes Experiment, and soon-to-be-released AMSR-E passive microwave products. The AMSR-E and MODIS sensors are part of NASA's Earth Observing System flying on the Terra and Aqua satellites. Characteristics of these NSIDC-held data sets, appropriateness of products for specific applications, and data set access and availability will be presented.

  18. On the measure of sea ice area from sea ice concentration data sets

    Science.gov (United States)

    Boccolari, Mauro; Parmiggiani, Flavio

    2015-10-01

    The measure of sea ice surface variability provides fundamental information on the climatology of the Arctic region. Sea ice extension is conventionally measured by two parameters, Sea Ice Extent (SIE) and Sea Ice Area (SIA), both derived from Sea Ice Concentration (SIC) data sets. In this work a new parameter (CSIA) is introduced, which takes into account only the compact sea ice, defined as sea ice with a concentration of at least 70%. The aim of this study is to compare the performance of the two parameters, SIA and CSIA, in analyzing the trends of three monthly time series of the whole Arctic region. The SIC data set used in this study was produced by the Institute of Environmental Physics of the University of Bremen and covers the period January 2003 - December 2014, i.e. the period in which the data set is built using the new AMSR passive microwave sensor.
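
    The SIA/CSIA definitions above reduce to concentration-weighted sums over grid cells. A minimal sketch, assuming the conventional 15% concentration cutoff for extent (the record states only the 70% compactness threshold):

    ```python
    # Sea ice extent, area, and compact-ice area from a concentration field.
    import numpy as np

    def ice_measures(conc, cell_area_km2):
        """conc: sea ice concentration in [0, 1]; cell_area_km2: same shape."""
        sie = np.sum(cell_area_km2 * (conc >= 0.15))            # extent
        sia = np.sum(cell_area_km2 * conc * (conc >= 0.15))     # area
        csia = np.sum(cell_area_km2 * conc * (conc >= 0.70))    # compact-ice area
        return sie, sia, csia

    conc = np.array([[0.9, 0.5], [0.1, 0.8]])
    area = np.full_like(conc, 625.0)                            # 25 km x 25 km cells
    print(ice_measures(conc, area))
    ```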

  19. Benchmarking binary classification models on data sets with different degrees of imbalance

    Institute of Scientific and Technical Information of China (English)

    Ligang ZHOU; Kin Keung LAI

    2009-01-01

    In practice, there are many binary classification problems, such as credit risk assessment, medical testing for determining whether a patient has a certain disease, etc. However, different problems have different characteristics that may lead to different difficulties, and one important characteristic is the degree of imbalance between the two classes in the data sets. For data sets with different degrees of imbalance, are the commonly used binary classification methods still feasible? In this study, various binary classification models, including traditional statistical methods and newly emerged methods from artificial intelligence, such as linear regression, discriminant analysis, decision trees, neural networks, support vector machines, etc., are reviewed, and their performance in terms of classification accuracy and area under the Receiver Operating Characteristic (ROC) curve is tested and compared on fourteen data sets with different imbalance degrees. The results help to select the appropriate methods for problems with different degrees of imbalance.
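
    A hedged sketch of such a benchmark, using scikit-learn rather than the study's own models: train one classifier on data sets of increasing imbalance and compare plain accuracy with the area under the ROC curve, which is less flattered by the majority class.

    ```python
    # Accuracy vs. ROC AUC as class imbalance grows.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    for minority in (0.5, 0.2, 0.05):                  # minority-class share
        X, y = make_classification(n_samples=2000, weights=[1 - minority],
                                   random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)
        clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
        acc = accuracy_score(yte, clf.predict(Xte))
        auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
        print(f"minority={minority:.2f}  accuracy={acc:.3f}  AUC={auc:.3f}")
    ```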

  20. Evolutionary optimization of PAW data-sets for accurate high pressure simulations

    CERN Document Server

    Sarkar, Kanchan; Holzwarth, N A W; Wentzcovitch, Renata M

    2016-01-01

    We examine the challenge of performing accurate electronic structure calculations at high pressures by comparing the results of all-electron full-potential linearized augmented-plane-wave calculations with those of the projector augmented wave (PAW) method. In particular, we focus on developing an automated and consistent way of generating transferable PAW data-sets that can closely reproduce the all-electron equation of state defined from zero to arbitrarily high pressures. The technique we propose is an evolutionary search procedure that exploits the ATOMPAW code to generate atomic data-sets and the Quantum ESPRESSO software suite for total energy calculations. We demonstrate different aspects of its workability by optimizing PAW basis functions of some elements relatively abundant in planetary interiors. In addition, we introduce a new measure of atomic data-set goodness by considering their performance uniformity over an enlarged pressure range.

  1. A Comparison of Heuristics with Modularity Maximization Objective using Biological Data Sets

    Directory of Open Access Journals (Sweden)

    Pirim Harun

    2016-01-01

    Full Text Available Finding groups of objects exhibiting similar patterns is an important data analytics task. Many disciplines have their own terminologies, such as cluster, group, clique, community, etc., for defining similar objects in a set. Adopting the term community, many exact and heuristic algorithms have been developed to find communities of interest in available data sets. Here, three heuristic algorithms for finding communities are compared using five gene expression data sets. The heuristics share a common objective function of maximizing the modularity, a quality measure of a partition and a reflection of objects' relevance in communities. Partitions generated by the heuristics are compared with the real ones using the adjusted Rand index, one of the most commonly used external validation measures. The paper discusses the results of the partitions on the mentioned biological data sets.
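
    The shared objective is Newman's modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). It can be evaluated for a given partition as in this small sketch (illustrative graph, not one of the study's data sets):

    ```python
    # Modularity Q of a partition of an undirected graph.
    import numpy as np

    def modularity(A, communities):
        """A: symmetric adjacency matrix; communities: label per node."""
        k = A.sum(axis=1)                       # node degrees
        two_m = A.sum()                         # 2m for an undirected graph
        c = np.asarray(communities)
        same = c[:, None] == c[None, :]         # delta(c_i, c_j)
        return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

    A = np.array([[0, 1, 1, 0],                 # two loosely connected pairs
                  [1, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]])
    print(modularity(A, [0, 0, 1, 1]))          # ~0.167
    ```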

  2. Comparison of drought indicators derived from multiple data sets over Africa

    Science.gov (United States)

    Naumann, G.; Dutra, E.; Barbosa, P.; Pappenberger, F.; Wetterhall, F.; Vogt, J. V.

    2014-05-01

    Drought monitoring is a key component to mitigate impacts of droughts. Lack of reliable and up-to-date precipitation data sets is a common challenge across the globe. This study investigates different data sets and drought indicators on their capability to improve drought monitoring in Africa. The study was performed for four river basins located in different climatic regions (the Oum er-Rbia in Morocco, the Blue Nile in eastern Africa, the Upper Niger in western Africa, and the Limpopo in southeastern Africa) as well as the Greater Horn of Africa. The five precipitation data sets compared are the ECMWF ERA-Interim reanalysis, the Tropical Rainfall Measuring Mission satellite monthly rainfall product 3B-43, the Global Precipitation Climatology Centre gridded precipitation data set, the Global Precipitation Climatology Project Global Monthly Merged Precipitation Analyses, and the Climate Prediction Center Merged Analysis of Precipitation. The set of drought indicators used includes the Standardized Precipitation Index, the Standardized Precipitation-Evaporation Index, and Soil Moisture Anomalies. A comparison of the annual cycle and monthly precipitation time series shows a good agreement in the timing of the rainy seasons. The main differences between the data sets are in the ability to represent the magnitude of the wet seasons and extremes. Moreover, for the areas affected by drought, all the drought indicators agree on the time of drought onset and recovery although there is disagreement on the extent of the affected area. In regions with limited rain gauge data the estimation of the different drought indicators is characterized by a higher uncertainty. Further comparison suggests that the main source of differences in the computation of the drought indicators is the uncertainty in the precipitation data sets rather than the estimation of the distribution parameters of the drought indicators.
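
    Of the indicators compared, the Standardized Precipitation Index is the most self-contained. A rough sketch, which fits a gamma distribution and maps cumulative probabilities to standard-normal quantiles, ignoring the zero-rainfall correction used in operational SPI implementations:

    ```python
    # Simplified SPI: gamma fit followed by a normal-quantile transform.
    import numpy as np
    from scipy import stats

    def spi(precip):
        precip = np.asarray(precip, dtype=float)
        a, loc, scale = stats.gamma.fit(precip, floc=0)   # fix location at zero
        cdf = stats.gamma.cdf(precip, a, loc=loc, scale=scale)
        return stats.norm.ppf(cdf)                        # standard-normal quantiles

    rng = np.random.default_rng(2)
    monthly = rng.gamma(shape=2.0, scale=30.0, size=240)  # toy 20-year record
    print(np.round(spi(monthly)[:6], 2))
    ```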

  3. The questioning of stationarity or not about a discharge data set on a small basin

    Science.gov (United States)

    Barbet, D.; Trevisan, D.; Sauquet, E.; Bourgeois, H.

    2009-04-01

    INRA Thonon-les-Bains has instrumented a small tributary basin of Lake Geneva, the Mercube river, over the past fifteen years. The purpose of instrumenting this small 3 km2 basin is to understand the transfer of agricultural pollutants. The hydrometric data collected include the water level at a 30-minute time step, and rainfall at the INRA station (10 km from the basin) at a one-hour time step. One problem to be solved with this data set was to consolidate a discharge series from a gauging station that was moved in 2003: uncertainty remains about the choice of stage-discharge curves and the validity of the recorded water levels. The statistical tools used to decide whether to validate the data set were stationarity tests on mean values (at monthly and daily time steps) and on extreme values (on average the two largest floods per year), together with a flow-duration-frequency analysis using the methods and tools of Cemagref Lyon. The results showed non-stationarity of the monthly and daily mean values before and after 2003. However, a similar study of two other nearby rivers (the Foron and the Redon) showed that their data sets are not stationary over the same periods either. As 2003 had a particularly dry water balance, we cannot at this stage attribute the non-stationarity of the flow data set to the change of station. In contrast, the flow-duration-frequency study of flood values shows stationarity of the sample and could be completed. Despite the change of station, the two approaches (mean values and extremes) made it possible to choose good stage-discharge curves and to validate the flow data sets for later use in studying pollutant transfers. The question of a regional non-stationarity before and after 2003 remains open and deserves further investigation over many rivers in the region.

  4. Data set for diffusion coefficients of alloying elements in dilute Mg alloys from first-principles

    Directory of Open Access Journals (Sweden)

    Bi-Cheng Zhou

    2015-12-01

    Full Text Available Diffusion coefficients of alloying elements in Mg are critical for the development of new Mg alloys for lightweight applications. Here we present the data set of the temperature-dependent dilute tracer diffusion coefficients for 47 substitutional alloying elements in hexagonal close-packed (hcp) Mg calculated from first-principles calculations based on density functional theory (DFT) by combining transition state theory and an 8-frequency model. A benchmark for the DFT calculations and a systematic comparison with experimental diffusion data are also presented. The data set refers to “Diffusion coefficients of alloying elements in dilute Mg alloys: A comprehensive first-principles study” by Zhou et al. [1].
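
    Such temperature-dependent diffusion coefficients are conventionally reported in Arrhenius form, D(T) = D0 exp(−Q/RT). The sketch below evaluates that form with placeholder values, not numbers from the data set.

    ```python
    # Arrhenius evaluation of a dilute tracer diffusion coefficient.
    import numpy as np

    R = 8.314                                   # gas constant, J/(mol K)

    def diffusivity(T, D0, Q):
        """D0: prefactor (m^2/s); Q: activation energy (J/mol); T in K."""
        return D0 * np.exp(-Q / (R * T))

    print(diffusivity(T=800.0, D0=1e-5, Q=120e3))   # illustrative values only
    ```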

  5. Comparison of the Hadley cells calculated from two reanalysis data sets

    Institute of Scientific and Technical Information of China (English)

    QIN Yujing; WANG Panxing; GUAN Zhaoyong; YUE Yang

    2006-01-01

    The mass stream function of mean meridional circulation is calculated from the ECMWF and NCEP/NCAR reanalysis data sets using a superposition computation scheme. The comparison of results shows that the common ascending leg of the Hadley cell calculated from the ECMWF data is strong and narrow, and on average lies farther north of the equator than its counterpart from the NCEP/NCAR data; furthermore, the Hadley cell from the ECMWF data shows an obvious double-layer structure. Therefore, there are obvious differences between the Hadley cells displayed by the two objective analysis data sets.

  6. "Trends" and variations of global oceanic evaporation data sets from remote sensing

    Institute of Scientific and Technical Information of China (English)

    CHIU LongS; CHOKNGAMWONG R; XING Yukun; YANG Ruixin; SHIE Chung-Lin

    2008-01-01

    The variability in global oceanic evaporation data sets was examined for the period 1988-2000. These data sets are satellite estimates based on bulk aerodynamic formulations and include the NASA/Goddard Space Flight Center Satellite-based Surface Turbulent Flux version 2 (GSSTF2), the Japanese Ocean Flux using remote sensing observations (J-OFURO), and the Hamburg Ocean-Atmosphere Parameters and Fluxes from Satellite version 2 (HOAPS2). The National Center for Environmental Prediction (NCEP) reanalysis is also included for comparison. An increase in global average surface latent heat flux (SLHF) can be observed in all the data sets. Empirical mode decomposition (EMD) shows long-term increases that started around 1990 for all remote sensing data sets. The effect of the Mt. Pinatubo eruption in 1991 is clearly evident in HOAPS2 but is independent of the long-term increase. Linear regression analyses show increases of 9.4%, 13.0%, 7.3%, and 3.9% for GSSTF2, J-OFURO, HOAPS2 and NCEP, for the periods of the data sets. Empirical orthogonal function (EOF) analyses show that the pattern of the first EOF of all data sets is consistent with a decadal variation associated with the enhancement of the tropical Hadley circulation, which is supported by other satellite observations. The second EOF of all four data sets is an ENSO mode, and the correlations between their time series and the SOI are 0.74, 0.71, 0.59, and 0.61 for GSSTF2, J-OFURO, HOAPS2, and NCEP in that order. When the Hadley modes are removed from the remote sensing data, the residual global increases are reduced to 2.2%, 7.3%, and < 1% for GSSTF2, J-OFURO and HOAPS2, respectively. If the ENSO mode is used as a calibration standard for the data sets, the Hadley mode is at least comparable to, if not larger than, the ENSO mode during our study period.

  7. Current Climate Data Set Documentation Standards: Somewhere between Anagrams and Full Disclosure

    Science.gov (United States)

    Fleig, A. J.

    2008-12-01

    In the 17th century, scientists, concerned with establishing primacy for their discoveries while maintaining control of their intellectual property, often published their results as anagrams. Robert Hooke's initial publication in 1676 of his law of elasticity in the form ceiiinosssttuv, which he revealed two years later as "Ut tensio sic vis" or "of the extension, so the force", is one of the better known examples, although Galileo, Newton, and many others used the same approach. Fortunately, the idea of open publication in scientific journals subject to peer review as a cornerstone of the scientific method gradually became established and is now the norm. Unfortunately, though, even peer-reviewed publication does not necessarily lead to full disclosure. One example of this occurs in the production, review and distribution of large-scale data sets of climate variables. Validation papers describe how the data was made in concept but do not provide adequate documentation of the process. Complete provenance of the resulting data sets, including description of the exact input files, processing environment, and actual processing code, is not required as part of the production and archival effort. A user of the data may be assured by the publication and peer review that the data is considered good and usable for scientific investigation, but will not know exactly how the data set was made. The problem with this lack of knowledge may be most apparent when considering questions of climate change. Future measurements of the same geophysical parameter will surely be derived from a different observational system than the one used in creating today's data sets. An obvious task in assessing change between the present and a future data set will be to determine how much of the change is because the parameter changed and how much is because the measurement system changed. This will be hard to do without complete knowledge of how the predecessor data set was made.

  8. Analysis of matches and partial-matches in a Danish STR data set

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Curran, James Michael

    2012-01-01

    The methods of [3] were applied to compare the observed and expected number of matches and near matches in the data set. We extended the methods by computing the covariance matrix of the summary statistic and used it for the estimation of the identical-by-descent parameter, θ. The analysis demonstrated a number of close relatives in the Danish data set, as well as substructure. The main contribution to the substructure comes from close relatives. An overall θ-value of 1% compensated for the observed substructure when close familial relationships were accounted for.

  9. Interactive exploration and modeling of large data sets: a case study with Venus light scattering data

    NARCIS (Netherlands)

    Wijk, J.J. van; Spoelder, H.J.W.; Knibbe, W.-J.J.; Shahroudi, K.E.

    1997-01-01

    We present a system where visualization and the control of the simulation are integrated to facilitate interactive exploration and modeling of large data sets. The system was developed to estimate properties of the atmosphere of Venus from comparison between measured and simulated data.

  10. Euclidean skeletons of 3D data sets in linear time by the integer medial axis transform

    NARCIS (Netherlands)

    Hesselink, Wim H.; Visser, Menno; Roerdink, Jos B.T.M.; Ronse, C; Najman, L; Decenciere, E

    2005-01-01

    A general algorithm for computing Euclidean skeletons of 3D data sets in linear time is presented. These skeletons are defined in terms of a new concept, called the integer medial axis (IMA) transform. The algorithm is based upon the computation of 3D feature transforms.
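
    The ingredients named above can be related in a few lines with SciPy: the Euclidean distance transform can also return the feature transform (the nearest background voxel for each voxel), and an IMA-style test then keeps object points whose neighbors' feature points lie far apart. The criterion below is a simplified stand-in for the paper's exact integer medial axis definition, shown in 2D for brevity.

    ```python
    # Feature transform plus a crude medial-axis test on a 2D toy object.
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    obj = np.zeros((32, 32), dtype=bool)
    obj[8:24, 4:28] = True                         # a filled rectangle

    # For each pixel: distance to, and coordinates of, the nearest background pixel.
    dist, (fy, fx) = distance_transform_edt(obj, return_indices=True)
    feat = np.stack([fy, fx], axis=-1).astype(float)

    # Keep object pixels whose feature point jumps relative to the pixel below:
    # the jump marks the medial axis. The threshold is illustrative.
    gap = np.linalg.norm(feat[1:] - feat[:-1], axis=-1)
    ima = np.zeros_like(obj)
    ima[:-1] = obj[:-1] & obj[1:] & (gap > 1.5)
    print(int(ima.sum()), "candidate skeleton pixels")
    ```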

  11. Estimating Pay Gaps for Workers with Disabilities: Implications from Broadening Definitions and Data Sets

    Science.gov (United States)

    Hallock, Kevin F.; Jin, Xin; Barrington, Linda

    2014-01-01

    Purpose: To compare pay gap estimates across 3 different national survey data sets for people with disabilities relative to those without disabilities when pay is measured as wage and salary alone versus a (total compensation) definition that includes an estimate of the value of benefits. Method: Estimates of the cost to the employers of employee…

  12. A long-term data set for hydrologic modeling in a snow-dominated mountain catchment

    Science.gov (United States)

    An hourly modeling data set is presented for the water years 1984 through 2008 for a snow-dominated headwater catchment. Meteorological forcing data and GIS watershed characteristics are described and provided. The meteorological data are measured at two sites within the catchment, and include precipitation

  13. Goodness of Fit of Skills Assessment Approaches: Insights from Patterns of Real vs. Synthetic Data Sets

    Science.gov (United States)

    Beheshti, Behzad; Desmarais, Michel C.

    2015-01-01

    This study investigates the issue of the goodness of fit of different skills assessment models using both synthetic and real data. Synthetic data is generated from the different skills assessment models. The results show wide differences in performance between the skills assessment models over synthetic data sets. The set of relative performances…

  14. Mass Spectrometry Data Set for Renal Cell Carcinoma and Polycystic Kidney Disease Cell Models

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, B. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-05

    This data set will be evaluated by collaborators at UC Davis for possible inclusion in a research paper for publication in a scientific journal and to assist in the design of additional experiments. Researchers from UC Davis and LLNL will contribute to the manuscript.

  15. Fast computation of categorical richness on raster data sets and related problems

    DEFF Research Database (Denmark)

    de Berg, Mark; Tsirogiannis, Constantinos; Wilkinson, Bryan T.

    2015-01-01

    In many scientific fields, it is common to encounter raster data sets consisting of categorical data, such as soil type or land usage of a terrain. A problem that arises in the presence of such data is the following: given a raster G of n cells storing categorical data, compute for every cell c...

  16. Rarity in large data sets: Singletons, model values and the location of the species abundance distribution

    NARCIS (Netherlands)

    Straatsma, G.; Egli, S.

    2012-01-01

    Species abundance data in 12 large data sets, holding 10 × 10³ to 125 × 10⁶ individuals in 350 to 10 × 10³ samples, were studied. Samples and subsets, for instance the summarized data of samples over years, and whole sets were analysed. Two methods of the binning of data, assigning abundance values

  17. Nursing Minimum Data Set for School Nursing Practice. Position Statement. Revised

    Science.gov (United States)

    Denehy, Janice

    2012-01-01

    It is the position of the National Association of School Nurses (NASN) to support the collection of essential nursing data as listed in the Nursing Minimum Data Set (NMDS). The NMDS provides a basic structure to identify the data needed to delineate nursing care delivered to clients as well as relevant characteristics of those clients. Structure…

  18. InfVis: platform-independent visual data mining of multidimensional chemical data sets.

    Science.gov (United States)

    Oellien, Frank; Ihlenfeldt, Wolf-Dietrich; Gasteiger, Johann

    2005-01-01

    The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. In recent years, visual data mining has become an important tool within the life sciences and drug discovery area, with the potential to keep data analysis from becoming a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for the visualization, exploration, and analysis of multivariate data sets, aimed at chemists, who usually have little experience with classical data mining tools. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time, interactive data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. We present examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.
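
    InfVis itself is a Java/Java3D application; purely as an illustration of the glyph idea (mapping extra data dimensions onto position, colour and size of 3D marks), here is a small matplotlib sketch in which all column names and values are made up:

```python
# Sketch of multidimensional glyph plotting in the spirit of InfVis
# (InfVis itself is Java/Java3D; column names here are hypothetical).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 200
data = {
    "yield": rng.uniform(0, 100, n),        # hypothetical reaction yield
    "temperature": rng.uniform(20, 150, n),
    "logP": rng.normal(2, 1, n),
    "mol_weight": rng.uniform(100, 600, n),
    "selectivity": rng.uniform(0, 1, n),
}

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Three dimensions -> glyph position; two more -> colour and size.
sc = ax.scatter(data["yield"], data["temperature"], data["logP"],
                c=data["selectivity"], s=data["mol_weight"] / 10,
                cmap="viridis")
fig.colorbar(sc, label="selectivity")
ax.set_xlabel("yield"); ax.set_ylabel("temperature"); ax.set_zlabel("logP")
plt.show()
```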

  19. The International Spinal Cord Injury Pain Extended Data Set (Version 1.0)

    DEFF Research Database (Denmark)

    Widerström-Noga, E; Biering-Sørensen, F; Bryce, T N;

    2016-01-01

    OBJECTIVES: The objective of this study was to develop the International Spinal Cord Injury Pain Extended Data Set (ISCIPEDS) with the purpose of guiding the assessment and treatment of pain after spinal cord injury (SCI). SETTING: International. METHODS: The ISCIPEDS was reviewed by members...

  20. The international geosphere biosphere programme data and information system global land cover data set (DISCover)

    Science.gov (United States)

    Loveland, T.R.; Belward, A.S.

    1997-01-01

    The International Geosphere Biosphere Programme Data and Information System (IGBP-DIS), through the mapping expertise of the U.S. Geological Survey and the European Commission's Joint Research Centre, recently guided the completion of a 1-km resolution global land cover data set from advanced very high resolution radiometer (AVHRR) data. The 1-km resolution land cover product, 'DISCover,' was based on monthly normalized difference vegetation index composites from 1992 and 1993. The development of DISCover was coordinated by the IGBP-DIS Land Cover Working Group as part of the IGBP-DIS Focus 1 activity. DISCover is a 17-class land cover data set based on the scientific requirements of IGBP elements. The mapping used unsupervised classification and postclassification refinement using ancillary data. The development of this data set was motivated by the need for global land cover data with higher spatial resolution, improved temporal specificity, and known classification accuracy. The completed DISCover data set will soon be validated to determine the accuracy of the global classification.

  1. Multiple data sets and modelling choices in a comparative LCA of disposable beverage cups

    NARCIS (Netherlands)

    Harst, van der E.J.M.; Potting, J.; Kroeze, C.

    2014-01-01

    This study used multiple data sets and modelling choices in an environmental life cycle assessment (LCA) to compare typical disposable beverage cups made from polystyrene (PS), polylactic acid (PLA; bioplastic) and paper lined with bioplastic (biopaper). Incineration and recycling were considered as

  2. Agreement evaluation of AVHRR and MODIS 16-day composite NDVI data sets

    Science.gov (United States)

    Ji, L.; Gallo, K.; Eidenshink, J.C.; Dwyer, J.

    2008-01-01

    Satellite-derived normalized difference vegetation index (NDVI) data have been used extensively to detect and monitor vegetation conditions at regional and global levels. A combination of NDVI data sets derived from AVHRR and MODIS can be used to construct a long NDVI time series that may also be extended to VIIRS. Comparative analysis of NDVI data derived from AVHRR and MODIS is critical to understanding the data continuity through the time series. In this study, the AVHRR and MODIS 16-day composite NDVI products were compared using regression and agreement analysis methods. The analysis shows a high agreement between the AVHRR-NDVI and MODIS-NDVI observed in 2002 and 2003 for the conterminous United States, but the difference between the two data sets is appreciable. Twenty per cent of the total difference between the two data sets is due to systematic difference, with the remainder due to unsystematic difference. The systematic difference can be eliminated with a linear regression-based transformation between the two data sets, and the unsystematic difference can be reduced partially by applying spatial filters to the data. We conclude that the continuity of the NDVI time series from AVHRR to MODIS is satisfactory, but a linear transformation between the two data sets is recommended.
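
    A minimal sketch of the regression-based transformation and the systematic/unsystematic split described above, following the usual Willmott-style decomposition; the arrays are simulated stand-ins for co-located AVHRR and MODIS composites:

```python
# Sketch: systematic vs. unsystematic difference between two NDVI series,
# and a linear transformation that removes the systematic part.
# avhrr/modis are simulated placeholders for co-located composite values.
import numpy as np

rng = np.random.default_rng(1)
avhrr = rng.uniform(0.1, 0.9, 1000)
modis = 0.05 + 1.1 * avhrr + rng.normal(0, 0.03, 1000)  # toy relation

b, a = np.polyfit(avhrr, modis, 1)          # modis ~ a + b * avhrr
fitted = a + b * avhrr

msd_total = np.mean((modis - avhrr) ** 2)
msd_systematic = np.mean((fitted - avhrr) ** 2)    # removable by regression
msd_unsystematic = np.mean((modis - fitted) ** 2)  # residual scatter

print(f"systematic share: {msd_systematic / msd_total:.1%}")
avhrr_adjusted = a + b * avhrr  # AVHRR mapped onto the MODIS scale
```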

  3. Building the Forest Inventory and Analysis Tree-Ring Data set

    Science.gov (United States)

    Robert J. DeRose; John D. Shaw; James N. Long

    2017-01-01

    The Interior West Forest Inventory and Analysis (IW-FIA) program measures forestland conditions over a large extent and with relatively high spatial resolution, including the collection of tree-ring data. We describe the development of an unprecedented spatial tree-ring data set for the IW-FIA that enhances the baseline plot data by incorporating ring-width increment measured...

  4. The Evolution of School Nursing Data Indicators in Massachusetts: Recommendations for a National Data Set

    Science.gov (United States)

    Gapinski, Mary Ann; Sheetz, Anne H.

    2014-01-01

    The National Association of School Nurses' research priorities include the recommendation that data reliability, quality, and availability be addressed to advance research in child and school health. However, identifying a national school nursing data set has remained a challenge for school nurses, school nursing leaders, school nurse professional…

  5. Structural break or long memory: an empirical survey on daily rainfall data sets across Malaysia

    Science.gov (United States)

    Yusof, F.; Kane, I. L.; Yusop, Z.

    2013-04-01

    A short memory process that encounters occasional structural breaks in mean can show a slower rate of decay in the autocorrelation function and other properties of fractionally integrated I(d) processes. In this paper we employed a procedure for estimating the fractional differencing parameter in semiparametric contexts, proposed by Geweke and Porter-Hudak (1983), to analyse nine daily rainfall data sets across Malaysia. The results indicate that all the data sets exhibit long memory. Furthermore, an empirical fluctuation process using the ordinary least squares (OLS)-based cumulative sum (CUSUM) test for the break date was applied. Break dates were detected in all data sets. The data sets were partitioned according to their respective break dates, and a further test for long memory was applied to all subseries. Results show that all subseries follow the same pattern as the original series. The estimates of the fractional parameters d1 and d2 on the subseries obtained by splitting the original series at the break date confirm that there is long memory in the data generating process (DGP). This evidence therefore indicates genuine long memory rather than an artefact of structural breaks.
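
    A compact sketch of the Geweke and Porter-Hudak (GPH) log-periodogram regression for the fractional differencing parameter d; the series below is a synthetic stand-in for a station's daily rainfall, and the bandwidth choice is one common convention, not necessarily the paper's exact setting:

```python
# Sketch: GPH (1983) log-periodogram estimate of the fractional
# differencing parameter d. `series` is a placeholder for daily rainfall.
import numpy as np

def gph_d(series, bandwidth_power=0.5):
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    m = int(n ** bandwidth_power)         # number of low frequencies used
    freqs = 2 * np.pi * np.arange(1, m + 1) / n
    # Periodogram at the first m Fourier frequencies.
    fft = np.fft.fft(x)
    periodogram = (np.abs(fft[1:m + 1]) ** 2) / (2 * np.pi * n)
    # Regress log I(w_j) on -log(4 sin^2(w_j / 2)); the slope estimates d.
    regressor = -np.log(4 * np.sin(freqs / 2) ** 2)
    slope, _ = np.polyfit(regressor, np.log(periodogram), 1)
    return slope

rng = np.random.default_rng(2)
series = rng.gamma(0.2, 5.0, 4096)  # toy stand-in for rainfall (d ~ 0)
print("estimated d:", round(gph_d(series), 3))
```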

  6. Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets

    Science.gov (United States)

    2011-01-01

    Functional diversity in protein interaction data sets: although genomic-scale protein-protein interaction detection campaigns are by design... the data sets mapped out in Fig. 2 covered distinct parts of the interaction space, with some... [Fig. 1: functional diversity among the data sets]

  7. Structural break or long memory: an empirical survey on daily rainfall data sets across Malaysia

    Directory of Open Access Journals (Sweden)

    F. Yusof

    2013-04-01

    A short memory process that encounters occasional structural breaks in mean can show a slower rate of decay in the autocorrelation function and other properties of fractionally integrated I(d) processes. In this paper we employed a procedure for estimating the fractional differencing parameter in semiparametric contexts, proposed by Geweke and Porter-Hudak (1983), to analyse nine daily rainfall data sets across Malaysia. The results indicate that all the data sets exhibit long memory. Furthermore, an empirical fluctuation process using the ordinary least squares (OLS)-based cumulative sum (CUSUM) test for the break date was applied. Break dates were detected in all data sets. The data sets were partitioned according to their respective break dates, and a further test for long memory was applied to all subseries. Results show that all subseries follow the same pattern as the original series. The estimates of the fractional parameters d1 and d2 on the subseries obtained by splitting the original series at the break date confirm that there is long memory in the data generating process (DGP). This evidence therefore indicates genuine long memory rather than an artefact of structural breaks.

  8. International Spinal Cord Injury Core Data Set (version 2.0)-including standardization of reporting

    NARCIS (Netherlands)

    Biering-Sorensen, F.; DeVivo, M. J.; Charlifue, S.; Chen, Y.; New, P. W.; Noonan, V.; Post, M. W. M.; Vogel, L.

    2017-01-01

    Study design: The study design includes expert opinion, feedback, revisions and final consensus. Objectives: The objective of the study was to present the new knowledge obtained since the International Spinal Cord Injury (SCI) Core Data Set (Version 1.0) published in 2006, and describe the adjustmen

  9. Analysis and classification of data sets for calibration and validation of agro-ecosystem models

    DEFF Research Database (Denmark)

    Kersebaum, K C; Boote, K J; Jorgenson, J S;

    2015-01-01

    to their relevance for modelling and possible uncertainties. Examples are given for data sets of the different classes. The framework helps to assemble high-quality databases, to select data from databases according to modellers' requirements, and gives guidelines to experimentalists for experimental design...

  10. Data Literacy: Real-World Learning through Problem-Solving with Data Sets

    Science.gov (United States)

    Erwin, Robin W., Jr.

    2015-01-01

    The achievement of deep learning by secondary students requires teaching approaches that draw students into task commitment, integrated curricula, and analytical thinking. By using real-world data sets in project based instructional units, teachers can guide students in analyzing, interpreting, and reporting quantitative data. Working with…

  11. The Evolution of School Nursing Data Indicators in Massachusetts: Recommendations for a National Data Set

    Science.gov (United States)

    Gapinski, Mary Ann; Sheetz, Anne H.

    2014-01-01

    The National Association of School Nurses' research priorities include the recommendation that data reliability, quality, and availability be addressed to advance research in child and school health. However, identifying a national school nursing data set has remained a challenge for school nurses, school nursing leaders, school nurse professional…

  12. The Challenge of Handling Big Data Sets in the Sensor Web

    Science.gov (United States)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon

    2016-04-01

    More and more Sensor Web components are deployed in different domains such as hydrology, oceanography or air quality in order to make observation data accessible via the Web. However, besides variability of data formats and protocols in environmental applications, the fast growing volume of data with high temporal and spatial resolution is imposing new challenges for Sensor Web technologies when sharing observation data and metadata about sensors. Variability, volume and velocity are the core issues that are addressed by Big Data concepts and technologies. Most solutions in the geospatial sector focus on remote sensing and raster data, whereas big in-situ observation data sets relying on vector features require novel approaches. Hence, in order to deal with big data sets in infrastructures for observational data, the following questions need to be answered: 1. How can big heterogeneous spatio-temporal datasets be organized, managed, and provided to Sensor Web applications? 2. How can views on big data sets and derived information products be made accessible in the Sensor Web? 3. How can big observation data sets be processed efficiently? We illustrate these challenges with examples from the marine domain and outline how we address these challenges. We therefore show how big data approaches from mainstream IT can be re-used and applied to Sensor Web application scenarios.

  13. Management of a Large Qualitative Data Set: Establishing Trustworthiness of the Data

    Directory of Open Access Journals (Sweden)

    Debbie Elizabeth White RN, PhD

    2012-07-01

    Health services research is multifaceted and impacted by the multiple contexts and stakeholders involved. Hence, large data sets are necessary to fully understand the complex phenomena (e.g., scope of nursing practice) being studied. The management of these large data sets can lead to numerous challenges in establishing trustworthiness of the study. This article reports on strategies utilized in data collection and analysis of a large qualitative study to establish trustworthiness. Specific strategies undertaken by the research team included training of interviewers and coders, variation in participant recruitment, consistency in data collection, completion of data cleaning, development of a conceptual framework for analysis, consistency in coding through regular communication and meetings between coders and key research team members, use of N6™ software to organize data, and creation of a comprehensive audit trail with internal and external audits. Finally, we make eight recommendations that will help ensure rigour for studies with large qualitative data sets: organization of the study by a single person; thorough documentation of the data collection and analysis process; attention to timelines; the use of an iterative process for data collection and analysis; internal and external audits; regular communication among the research team; adequate resources for timely completion; and time for reflection and diversion. Following these steps will enable researchers to complete a rigorous, qualitative research study when faced with large data sets to answer complex health services research questions.

  14. A Decomposition Model for HPLC-DAD Data Set and Its Solution by Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Lizhi Cui

    2014-01-01

    This paper proposes a separation method, based on the model of Generalized Reference Curve Measurement and the algorithm of Particle Swarm Optimization (GRCM-PSO), for High Performance Liquid Chromatography with Diode Array Detection (HPLC-DAD) data sets. Firstly, initial parameters are generated to construct reference curves for the chromatogram peaks of the compounds based on physical principles. Then, a General Reference Curve Measurement (GRCM) model is designed to transform these parameters to scalar values, which indicate the fitness for all parameters. Thirdly, rough solutions are found by searching individual targets for every parameter, and reinitialization only around these rough solutions is executed. Then, the Particle Swarm Optimization (PSO) algorithm is adopted to obtain the optimal parameters by minimizing the fitness of these new parameters given by the GRCM model. Finally, spectra for the compounds are estimated based on the optimal parameters and the HPLC-DAD data set. Through simulations and experiments, the following conclusions are drawn: (1) the GRCM-PSO method can separate the chromatogram peaks and spectra from the HPLC-DAD data set without knowing the number of compounds in advance, even when severe overlap and white noise exist; (2) the GRCM-PSO method is able to handle real HPLC-DAD data sets.
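
    The GRCM fitness itself is specific to the paper; the sketch below is a generic PSO minimizer of the kind such a fitness would be plugged into, with a simple sphere function standing in for the GRCM score:

```python
# Sketch: a minimal particle swarm optimizer. In GRCM-PSO the objective
# would be the GRCM fitness of the reference-curve parameters; here a
# simple sphere function stands in.
import numpy as np

def pso(objective, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, best_val = pso(lambda p: np.sum(p ** 2), dim=4)
print(best_val)  # should be close to 0
```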

  15. INTEGRATION OF THE OLD AND NEW LAKE SUIGETSU (JAPAN) TERRESTRIAL RADIOCARBON CALIBRATION DATA SETS

    NARCIS (Netherlands)

    Staff, Richard A.; Schlolaut, Gordon; Ramsey, Christopher Bronk; Brock, Fiona; Bryant, Charlotte L.; Kitagawa, Hiroyuki; van der Plicht, Johannes; Marshall, Michael H.; Brauer, Achim; Lamb, Henry F.; Payne, Rebecca L.; Tarasov, Pavel E.; Haraguchi, Tsuyoshi; Gotanda, Katsuya; Yonenobu, Hitoshi; Yokoyama, Yusuke; Nakagawa, Takeshi; Reimer, Paula J.

    2013-01-01

    The varved sediment profile of Lake Suigetsu, central Japan, offers an ideal opportunity from which to derive a terrestrial record of atmospheric radiocarbon across the entire range of the C-14 dating method. Previous work by Kitagawa and van der Plicht (1998a,b, 2000) provided such a data set; howe

  16. Development of the Nursing Minimum Data Set for the Netherlands (NMDSN) : identification of categories and items

    NARCIS (Netherlands)

    Goossen, WTF; Epping, PJMM; Van den Heuvel, WJA; Feuth, T; Frederiks, CMA; Hasman, A

    Rationale: Currently, there is no systematic collection of nursing care data in the Netherlands, while pressure is growing from the profession, policy-makers and society to justify the contribution of nursing and its costs. A nursing minimum data set can provide data to demonstrate nursing's

  17. Rarity in large data sets: Singletons, model values and the location of the species abundance distribution

    NARCIS (Netherlands)

    Straatsma, G.; Egli, S.

    2012-01-01

    Species abundance data in 12 large data sets, holding 10 × 10³ to 125 × 10⁶ individuals in 350 to 10 × 10³ samples, were studied. Samples and subsets, for instance the summarized data of samples over years, and whole sets were analysed. Two methods of the binning of data, assigning abundance values

  18. Hyperview-fast, economical access to large data sets: a system for the archiving and distribution of hyperspectral data sets and derived products

    Science.gov (United States)

    Lurie, Joan B.

    1996-12-01

    TRW, under a Small Satellite Technology Initiative (SSTI) contract, is building the Lewis satellite. The principal sensor on Lewis is a hyperspectral imaging spectrometer. Part of the SSTI mission is to establish the commercial and educational utility of this data and the hyperspectral data already being acquired on airborne platforms. Essential requirements are rapid availability (after data acquisition) and easy accessibility to a catalog of images and imagery products. Each image is approximately 256 by 512 pixels with 384 bands of data acquired at each pixel. For some applications, some users will want the entire data sets; in other cases partial data sets (e.g. three band images) will be all that a user can handle or need for a given application. In order to make the most effective use of this new imagery and justify the cost of collecting it, we must find ways to make the information it contains more readily accessible to an ever broadening community of potential users. Tools are needed to store, access, and communicate the data more efficiently, to place it in context, and to derive both qualitative and quantitative information from it. A variety of information products which address the specific needs of particular user communities will be derived from the imagery. The data is unique in its ability to provide high spatial and spectral resolution simultaneously, and shows great promise in both military and civilian applications. A data management and analysis system has been built at TRW. This development has been prompted by the business opportunities, by the series of instruments built here and by the availability of data from other instruments. The products of the processing system have been shown to prospective customers in the U.S. and abroad. The system has been used to process data produced by TRW sensors and other instruments. This paper provides an overview of the TRW hyperspectral collection, data handling and exploitation capability.

  19. Comparison of climate data sets for the analysis of biological time series

    Science.gov (United States)

    Conversi, A.; Magaldi, M.

    2003-04-01

    In the last decade, the link between plankton and climate variability has been recognized through several studies in the Atlantic and Pacific Oceans. In the Mediterranean Sea such studies have begun more recently. An important question is which climate data sets and variables should be utilized for this analysis. In the case of the Mediterranean Sea, although connections to the North Atlantic Oscillation and to the monsoon regimes have been found, no specific Mediterranean dominant climate modes have yet been identified, thus several climatic variables can be used as proxies for these studies. In this work, within the program SINAPSI (Seasonal, INterannual and decAdal variability of the atmosPhere, oceanS and related marIne ecosystems), we have approached the problem of the choice of data sets suitable for the analysis of the climate-plankton relationship, reviewing the available climatic data. We have compared the three most complete climate data sets in the Mediterranean: ECMWF, NCEP (data-assimilation data sets), and COADS (observed data, using the Gridded Data 1° × 1°). We have selected variables (SST, sea level pressure, wind stress, cloud cover) which are: a) either proxies of circulation changes or possibly related to changes in plankton productivity, and b) common to at least two of the three data sets. We have then compared these variables utilizing different scales: basin, regional and local. The local and regional areas have been chosen around the location of long term (greater than 10 years) planktonic time series in the Italian seas: off Naples (Tyrrhenian Sea), Trieste, and Senigallia (Adriatic Sea). This preliminary analysis suggests that the choice of data sets for the study of biological variability cannot be univocal at this time, but must be made taking into account different issues, such as the temporal coverage of the different data sets, the temporal frequency of the data, the spatial coverage of the data, the quality of prediction of the

  20. History and evaluation of national-scale geochemical data sets for the United States

    Science.gov (United States)

    Smith, David B.; Smith, Steven M.; Horton, John D.

    2013-01-01

    Six national-scale, or near national-scale, geochemical data sets for soils or stream sediments exist for the United States. The earliest of these, here termed the ‘Shacklette’ data set, was generated by a U.S. Geological Survey (USGS) project conducted from 1961 to 1975. This project used soil collected from a depth of about 20 cm as the sampling medium at 1323 sites throughout the conterminous U.S. The National Uranium Resource Evaluation Hydrogeochemical and Stream Sediment Reconnaissance (NURE-HSSR) Program of the U.S. Department of Energy was conducted from 1975 to 1984 and collected either stream sediments, lake sediments, or soils at more than 378,000 sites in both the conterminous U.S. and Alaska. The sampled area represented about 65% of the nation. The Natural Resources Conservation Service (NRCS), from 1978 to 1982, collected samples from multiple soil horizons at sites within the major crop-growing regions of the conterminous U.S. This data set contains analyses of more than 3000 samples. The National Geochemical Survey, a USGS project conducted from 1997 to 2009, used a subset of the NURE-HSSR archival samples as its starting point and then collected primarily stream sediments, with occasional soils, in the parts of the U.S. not covered by the NURE-HSSR Program. This data set contains chemical analyses for more than 70,000 samples. The USGS, in collaboration with the Mexican Geological Survey and the Geological Survey of Canada, initiated soil sampling for the North American Soil Geochemical Landscapes Project in 2007. Sampling of three horizons or depths at more than 4800 sites in the U.S. was completed in 2010, and chemical analyses are currently ongoing. The NRCS initiated a project in the 1990s to analyze the various soil horizons from selected pedons throughout the U.S. This data set currently contains data from more than 1400 sites. This paper (1) discusses each data set in terms of its purpose, sample collection protocols, and analytical

  1. History and evaluation of national-scale geochemical data sets for the United States

    Directory of Open Access Journals (Sweden)

    David B. Smith

    2013-03-01

    Six national-scale, or near national-scale, geochemical data sets for soils or stream sediments exist for the United States. The earliest of these, here termed the ‘Shacklette’ data set, was generated by a U.S. Geological Survey (USGS) project conducted from 1961 to 1975. This project used soil collected from a depth of about 20 cm as the sampling medium at 1323 sites throughout the conterminous U.S. The National Uranium Resource Evaluation Hydrogeochemical and Stream Sediment Reconnaissance (NURE-HSSR) Program of the U.S. Department of Energy was conducted from 1975 to 1984 and collected either stream sediments, lake sediments, or soils at more than 378,000 sites in both the conterminous U.S. and Alaska. The sampled area represented about 65% of the nation. The Natural Resources Conservation Service (NRCS), from 1978 to 1982, collected samples from multiple soil horizons at sites within the major crop-growing regions of the conterminous U.S. This data set contains analyses of more than 3000 samples. The National Geochemical Survey, a USGS project conducted from 1997 to 2009, used a subset of the NURE-HSSR archival samples as its starting point and then collected primarily stream sediments, with occasional soils, in the parts of the U.S. not covered by the NURE-HSSR Program. This data set contains chemical analyses for more than 70,000 samples. The USGS, in collaboration with the Mexican Geological Survey and the Geological Survey of Canada, initiated soil sampling for the North American Soil Geochemical Landscapes Project in 2007. Sampling of three horizons or depths at more than 4800 sites in the U.S. was completed in 2010, and chemical analyses are currently ongoing. The NRCS initiated a project in the 1990s to analyze the various soil horizons from selected pedons throughout the U.S. This data set currently contains data from more than 1400 sites. This paper (1) discusses each data set in terms of its purpose, sample collection protocols

  2. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets

    Science.gov (United States)

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the
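
    A sketch of technique (1) above, benchmarking real predictors against synthetic noise variables with a random forest; the data are simulated, not the Chernobyl or tank-leak sets, and the variable names are illustrative:

```python
# Sketch: add synthetic noise variables as importance benchmarks
# for a random forest (simulated data, not the radioecological sets).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 300
radiation = rng.lognormal(0, 1, n)
ph = rng.normal(6.5, 0.5, n)
abundance = 10 - 1.5 * np.log(radiation) + rng.normal(0, 1, n)

X = np.column_stack([
    radiation,
    ph,                          # real but irrelevant in this toy setup
    rng.normal(size=n),          # synthetic noise benchmark 1
    rng.permutation(radiation),  # noise with radiation's distribution
])
names = ["radiation", "pH", "noise_gauss", "noise_perm"]

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, abundance)
for name, imp in sorted(zip(names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:12s} {imp:.3f}")
# Predictors whose importance does not clearly beat the noise
# benchmarks should not be trusted.
```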

  3. Multisource data set integration and characterization of uranium mineralization for the Montrose Quadrangle, Colorado

    Energy Technology Data Exchange (ETDEWEB)

    Bolivar, S.L.; Balog, S.H.; Campbell, K.; Fugelso, L.E.; Weaver, T.A.; Wecksung, G.W.

    1981-04-01

    Several data-classification schemes were developed by the Los Alamos National Laboratory to detect potential uranium mineralization in the Montrose 1° × 2° quadrangle, Colorado. A first step was to develop and refine the techniques necessary to digitize, integrate, and register various large geological, geochemical, and geophysical data sets, including Landsat 2 imagery, for the Montrose quadrangle, Colorado, using a grid resolution of 1 km. All data sets for the Montrose quadrangle were registered to the Universal Transverse Mercator projection. The data sets include hydrogeochemical and stream sediment analyses for 23 elements, uranium-to-thorium ratios, airborne geophysical survey data, the locations of 90 uranium occurrences, a geologic map and Landsat 2 (bands 4 through 7) imagery. Geochemical samples were collected from 3965 locations in the 19,200 km² quadrangle; aerial data were collected on flight lines flown with 3 to 5 km spacings. These data sets were smoothed by universal kriging and interpolated to a 179 x 119 rectangular grid. A mylar transparency of the geologic map was prepared and digitized. Locations for the known uranium occurrences were also digitized. The Landsat 2 imagery was digitally manipulated and rubber-sheet transformed to quadrangle boundaries and bands 4 through 7 were resampled to both a 1-km and 100-m resolution. All possible combinations of three, for all data sets, were examined for general geologic correlations by utilizing a color microfilm output. Subsets of data were further examined for selected test areas. Two classification schemes for uranium mineralization, based on selected test areas in both the Cochetopa and Marshall Pass uranium districts, are presented. Areas favorable for uranium mineralization, based on these schemes, were identified and are discussed.
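
    The Los Alamos workflow used universal kriging; as a rough stand-in for the step of interpolating scattered geochemical samples onto a regular grid, here is a scipy griddata sketch (kriging proper would use a dedicated geostatistics package), with hypothetical sample locations and values:

```python
# Sketch: interpolate scattered geochemical samples onto a regular
# grid (scipy griddata as a simple stand-in for universal kriging).
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(4)
# Hypothetical sample locations (km) and uranium concentrations (ppm).
xy = rng.uniform(0, 100, (500, 2))
u_ppm = 2 + 0.05 * xy[:, 0] + rng.lognormal(0, 0.5, 500)

# 1-km resolution target grid, as in the Montrose processing.
gx, gy = np.meshgrid(np.arange(0, 100), np.arange(0, 100))
grid = griddata(xy, u_ppm, (gx, gy), method="linear")
print(grid.shape, np.nanmean(grid))
```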

  4. A consistent data set of Antarctic ice sheet topography, cavity geometry, and global bathymetry

    Directory of Open Access Journals (Sweden)

    R. Timmermann

    2010-12-01

    Sub-ice shelf circulation and freezing/melting rates in ocean general circulation models depend critically on an accurate and consistent representation of cavity geometry. Existing global or pan-Antarctic topography data sets have turned out to contain various inconsistencies and inaccuracies. The goal of this work is to compile independent regional surveys and maps into a global data set. We use the S-2004 global 1-min bathymetry as the backbone and add an improved version of the BEDMAP topography (ALBMAP) bedrock topography for an area that roughly coincides with the Antarctic continental shelf. The position of the merging line is individually chosen in different sectors in order to capture the best of both data sets. High-resolution gridded data for ice shelf topography and cavity geometry of the Amery, Fimbul, Filchner-Ronne, Larsen C and George VI Ice Shelves, and for Pine Island Glacier are carefully merged into the ambient ice and ocean topographies. Multibeam survey data for bathymetry in the former Larsen B cavity and the southeastern Bellingshausen Sea have been obtained from the data centers of the Alfred Wegener Institute (AWI), British Antarctic Survey (BAS) and Lamont-Doherty Earth Observatory (LDEO), gridded, and blended into the existing bathymetry map. The resulting global 1-min Refined Topography data set (RTopo-1) contains self-consistent maps for upper and lower ice surface heights, bedrock topography, and surface type (open ocean, grounded ice, floating ice, bare land surface). The data set is available in NetCDF format from the PANGAEA database at doi:10.1594/pangaea.741917.

  5. Comparison of ligand- and structure-based virtual screening on the DUD data set.

    Science.gov (United States)

    von Korff, Modest; Freyss, Joel; Sander, Thomas

    2009-02-01

    Several in-house developed descriptors and our in-house docking tool ActDock were compared by virtual screening on the directory of useful decoys (DUD) data set. The results were compared with the chemical fingerprint descriptor from ChemAxon and with the docking results of the original DUD publication. The DUD is the first published data set providing active molecules, decoys, and references for crystal structures of ligand-target complexes. The DUD was designed for the purpose of evaluating docking programs. It contains 2950 active compounds against a total of 40 target proteins. Furthermore, for every ligand the data set contains 36 structurally dissimilar decoy compounds with similar physicochemical properties. We extracted the ligands from the target proteins to extend the applicability of the data set to ligand-based virtual screening. Of the 40 target proteins, 37 contained ligands that we used as query molecules for virtual screening evaluation. With this data set a large comparison was done between four different chemical fingerprints, a topological pharmacophore descriptor, the Flexophore descriptor, and ActDock. The Actelion docking tool relies on an MM2 force field and a pharmacophore point interaction statistic for scoring; the details are described in this publication. In terms of enrichment rates the chemical fingerprint descriptors performed better than the Flexophore and the docking tool. After removing molecules chemically similar to the query molecules, the Flexophore descriptor outperformed the chemical descriptors and the topological pharmacophore descriptors. With the similarity matrix calculations used in this study, it was shown that the Flexophore is well suited to finding new chemical entities via "scaffold hopping". The Flexophore descriptor can be explored with a Java applet at http://www.cheminformatics.ch in the submenu Tools-->Flexophore. Its usage is free of charge and does not require registration.
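
    A sketch of the enrichment bookkeeping such comparisons rest on: given virtual-screening scores for actives and decoys, compute a ROC AUC and an enrichment factor. The scores are simulated; in the study they would come from the descriptors or from ActDock:

```python
# Sketch: ROC AUC and enrichment factor at 1% for a virtual screen.
# Scores are simulated stand-ins for descriptor/docking scores.
import numpy as np

rng = np.random.default_rng(5)
scores = np.concatenate([rng.normal(1.0, 1, 50),     # 50 actives
                         rng.normal(0.0, 1, 1800)])  # 36x decoys, as in DUD
labels = np.concatenate([np.ones(50), np.zeros(1800)])

order = np.argsort(-scores)                 # best-scored first
ranked = labels[order]

# ROC AUC via the rank form of the Mann-Whitney U statistic.
ranks = np.empty_like(order, dtype=float)
ranks[order] = np.arange(1, len(scores) + 1)
n_act, n_dec = int(labels.sum()), int((1 - labels).sum())
auc = ((len(scores) + 1 - ranks)[labels == 1].sum()
       - n_act * (n_act + 1) / 2) / (n_act * n_dec)

top = int(0.01 * len(scores))               # top 1% of the ranked list
ef1 = ranked[:top].mean() / labels.mean()
print(f"AUC={auc:.3f}  EF1%={ef1:.1f}")
```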

  6. Reliable pre-eclampsia pathways based on multiple independent microarray data sets.

    Science.gov (United States)

    Kawasaki, Kaoru; Kondoh, Eiji; Chigusa, Yoshitsugu; Ujita, Mari; Murakami, Ryusuke; Mogami, Haruta; Brown, J B; Okuno, Yasushi; Konishi, Ikuo

    2015-02-01

    Pre-eclampsia is a multifactorial disorder characterized by heterogeneous clinical manifestations. Gene expression profiling of preeclamptic placentas has provided different and even opposite results, partly due to data compromised by various experimental artefacts. Here we aimed to identify reliable pre-eclampsia-specific pathways using multiple independent microarray data sets. Gene expression data of control and preeclamptic placentas were obtained from Gene Expression Omnibus. Single-sample gene-set enrichment analysis was performed to generate gene-set activation scores of 9707 pathways obtained from the Molecular Signatures Database. Candidate pathways were identified by t-test-based screening using the data sets GSE10588, GSE14722 and GSE25906. Additionally, recursive feature elimination was applied to arrive at a further reduced set of pathways. To assess the validity of the pre-eclampsia pathways, a statistically validated protocol was executed using five data sets, including two independent validation data sets, GSE30186 and GSE44711. Quantitative real-time PCR was performed for genes in a panel of potential pre-eclampsia pathways using placentas of 20 women with normal or severe preeclamptic singleton pregnancies (n = 10, respectively). A panel of ten pathways was found to discriminate women with pre-eclampsia from controls with high accuracy. Among these were pathways not previously associated with pre-eclampsia, such as the GABA receptor pathway, as well as pathways that have already been linked to pre-eclampsia, such as the glutathione and CDKN1C pathways. mRNA expression of GABRA3 (GABA receptor pathway), GCLC and GCLM (glutathione metabolic pathway), and CDKN1C was significantly reduced in the preeclamptic placentas. In conclusion, ten accurate and reliable pre-eclampsia pathways were identified based on multiple independent microarray data sets. A pathway-based classification may be a worthwhile approach to elucidate the pathogenesis of pre-eclampsia.
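
    A sketch of the screening logic described: compute a simple per-sample gene-set activation score (a mean z-score here, a simplified stand-in for single-sample GSEA) and t-test it between control and pre-eclamptic samples; the expression matrix and set membership are simulated:

```python
# Sketch: per-sample gene-set scores + t-test screening of pathways.
# A mean z-score stands in for single-sample GSEA; data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n_genes, n_ctrl, n_pe = 2000, 20, 20
expr = rng.normal(size=(n_genes, n_ctrl + n_pe))
is_pe = np.array([False] * n_ctrl + [True] * n_pe)

# One toy pathway of 30 genes, down-regulated in the PE samples.
pathway = rng.choice(n_genes, 30, replace=False)
expr[np.ix_(pathway, np.where(is_pe)[0])] -= 0.8

z = stats.zscore(expr, axis=1)          # z-score each gene across samples
set_score = z[pathway].mean(axis=0)     # per-sample activation score

t, p = stats.ttest_ind(set_score[~is_pe], set_score[is_pe])
print(f"t={t:.2f}  p={p:.2e}")          # a small p flags the pathway
```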

  7. Bioactivities of chicken essence.

    Science.gov (United States)

    Li, Y F; He, R R; Tsoi, B; Kurihara, H

    2012-04-01

    The special flavor and health effects of chicken essence are widely accepted. Scientific research is substantiating its reputation as a tonic food in traditional health preservation. Chicken essence has been found to possess many bioactivities, including relief of stress and fatigue, amelioration of anxiety, promotion of metabolism and post-partum lactation, improvement of hyperglycemia and hypertension, enhancement of immunity, and so on. These activities of chicken essence are suggested to be related to its active components, including proteins, dipeptides (such as carnosine and anserine), polypeptides, minerals, trace elements, and multiple amino acids. Underlying mechanisms responsible for the bioactivities of chicken essence are mainly related to anti-stress, anti-oxidant, and neural regulation effects. However, the mechanisms are complicated and may be mediated via the combined actions of many active components, rather than the action of one or two components alone.

  8. Eggcited about Chickens

    Science.gov (United States)

    Jones, Carolyn; Brown, Paul

    2012-01-01

    In this article, the authors describe St Peter's Primary School's and Honiton Primary School's experiences of keeping chickens. The authors also describe the benefits they bring and the reactions of the children. (Contains 5 figures.)

  9. The Chicken Problem.

    Science.gov (United States)

    Reeves, Charles A.

    2000-01-01

    Uses the chicken problem for sixth-grade students to scratch the surface of systems of equations using intuitive approaches. Provides students' responses to the problem and suggests similar problems for extensions. (ASK)

  10. Empirical Evaluation of genetic clustering methods using multilocus genotypes from 20 chicken breeds

    NARCIS (Netherlands)

    Rosenberg, N.A.; Burke, T.; Elo, K.; Feldman, M.W.; Friedlin, P.J.; Groenen, M.A.M.; Hillel, J.; Maki-Tanila, A.; Tixier-Boichard, M.; Vignal, A.; Wimmers, K.

    2001-01-01

    We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus

  11. Central European high-resolution gridded daily data sets (HYRAS): Mean temperature and relative humidity

    Directory of Open Access Journals (Sweden)

    Claudia Frick

    High-resolution (5 × 5 km²) gridded daily data sets of surface air temperature (DWD/BfG-HYRAS-TAS) and relative humidity (DWD/BfG-HYRAS-HURS) are presented in this study. The data sets cover Germany and the bordering river catchments and span 1951 to 2006. Their data bases consist of daily station observations from Austria, Belgium, the Czech Republic, France, Germany, Luxembourg, the Netherlands and Switzerland. The interpolation of the measurement data to the regular grid is performed using a method based upon Optimal Interpolation. A first climatological analysis for Germany and Central European river catchments of first and second order is performed. For the Rhine river catchment, a summer mean temperature of 16.1 °C and relative humidity of 74 % are found. In contrast, the mean temperature of the heat summer 2003 amounts to 19.9 °C, with a related relative humidity of 65 % in this river catchment. The extreme character of this summer is also remarkable in the presented climate indices, e.g., the increased number of summer hot days. The first validations of both data sets reveal a bias within the range of the provided data precisions. In addition, an elevation dependency of error scores is identified for temperature: error scores increase with station height because height differences between station and grid cell increase with elevation. A comparison of HYRAS-TAS to another gridded temperature data set reveals good agreement, with again fewer differences at lower altitudes. The presented DWD/BfG-HYRAS data sets have a high spatial and temporal resolution which is so far unique for Germany and the bordering river catchments. They have a high potential for detailed studies of smaller-scale structures in Central Europe and are already used as input for hydrological impact modelling, as a climatological reference, and for bias correction of regional climate models within the German research project KLIWAS.
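
    A toy 1-D version of the Optimal Interpolation step underlying the gridding, with Gaussian covariances and a constant background; the real HYRAS processing additionally handles background fields, elevation effects and quality control:

```python
# Sketch: Optimal Interpolation of station temperatures onto a grid
# (1-D toy; Gaussian covariance, constant background).
import numpy as np

def oi(grid_x, obs_x, obs_val, background, length=50.0, obs_err=0.5):
    def cov(a, b):  # background-error covariance, Gaussian shape
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))
    B_oo = cov(obs_x, obs_x) + obs_err ** 2 * np.eye(len(obs_x))
    B_go = cov(grid_x, obs_x)
    weights = np.linalg.solve(B_oo, obs_val - background)
    return background + B_go @ weights

grid_x = np.linspace(0, 300, 61)            # 5-km grid, as in HYRAS
obs_x = np.array([20.0, 90.0, 160.0, 240.0])
obs_t = np.array([14.2, 15.8, 13.1, 16.4])  # hypothetical station means
analysis = oi(grid_x, obs_x, obs_t, background=obs_t.mean())
print(analysis.round(2))
```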

  12. CLaSPS: A NEW METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM COMPLEX ASTRONOMICAL DATA SETS

    Energy Technology Data Exchange (ETDEWEB)

    D'Abrusco, R.; Fabbiano, G.; Laurino, O. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Djorgovski, G.; Donalek, C.; Longo, G. [Department of Astronomy, California Institute of Technology, MC 249-17 1200 East California Blvd, Pasadena, CA 91125 (United States)

    2012-08-20

    In this paper, we present the Clustering-Labels-Score Patterns Spotter (CLaSPS), a new methodology for the determination of correlations among astronomical observables in complex data sets, based on the application of distinct unsupervised clustering techniques. The novelty in CLaSPS is the criterion used for the selection of the optimal clusterings, based on a quantitative measure of the degree of correlation between the cluster memberships and the distribution of a set of observables, the labels, not employed for the clustering. CLaSPS has been primarily developed as a tool to tackle the challenging complexity of the multi-wavelength complex and massive astronomical data sets produced by the federation of the data from modern automated astronomical facilities. In this paper, we discuss the applications of CLaSPS to two simple astronomical data sets, both composed of extragalactic sources with photometric observations at different wavelengths from large area surveys. The first data set, CSC+, is composed of optical quasars spectroscopically selected in the Sloan Digital Sky Survey data, observed in the X-rays by Chandra and with multi-wavelength observations in the near-infrared, optical, and ultraviolet spectral intervals. One of the results of the application of CLaSPS to the CSC+ is the re-identification of a well-known correlation between the α_OX parameter and the near-ultraviolet color, in a subset of CSC+ sources with relatively small values of the near-ultraviolet colors. The other data set consists of a sample of blazars for which photometric observations in the optical, mid-, and near-infrared are available, complemented for a subset of the sources, by Fermi γ-ray data. The main results of the application of CLaSPS to such data sets have been the discovery of a strong correlation between the multi-wavelength color distribution of blazars and their optical spectral classification in BL Lac objects and flat-spectrum radio quasars, and a
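
    A sketch of the CLaSPS-style selection criterion: generate several clusterings and score each by how strongly cluster membership correlates with a held-out label, with normalized mutual information standing in for the paper's score and simulated "colours" standing in for the photometry:

```python
# Sketch: choose among clusterings by cluster/label correlation,
# in the spirit of CLaSPS (NMI stands in for the paper's score).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(7)
# Simulated photometric "colours" with two latent source classes.
classes = rng.integers(0, 2, 500)
X = rng.normal(size=(500, 4)) + classes[:, None] * np.array([1.5, 0, 1.0, 0])
label = classes  # held-out label, e.g. a spectral classification

best = max(
    (KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
     for k in range(2, 8)),
    key=lambda memb: normalized_mutual_info_score(label, memb),
)
print("best NMI:", normalized_mutual_info_score(label, best))
```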

  13. Pathogenicity of Shigella in chickens.

    Science.gov (United States)

    Shi, Run; Yang, Xia; Chen, Lu; Chang, Hong-tao; Liu, Hong-ying; Zhao, Jun; Wang, Xin-wei; Wang, Chuan-qing

    2014-01-01

    Shigellosis in chickens was first reported in 2004. This study aimed to determine the pathogenicity of Shigella in chickens and the possibility of cross-infection between humans and chickens. The pathogenicity of Shigella in chickens was examined via infection of three-day-old SPF chickens with Shigella strain ZD02, isolated from a human patient. The virulence and invasiveness were examined by infection of the chicken intestines and primary chicken intestinal epithelial cells. The results showed Shigella can cause death via intraperitoneal injection in SPF chickens, but only induces depression via crop injection. Immunohistochemistry and transmission electron microscopy revealed that Shigella can invade the intestinal epithelia. Immunohistochemistry of the primary chicken intestinal epithelial cells infected with Shigella showed the bacteria were internalized into the epithelial cells. Electron microscopy also confirmed that Shigella invaded primary chicken intestinal epithelia and was encapsulated by phagosome-like membranes. Our data demonstrate that Shigella can invade primary chicken intestinal epithelial cells in vitro and chicken intestinal mucosa in vivo, resulting in pathogenicity and even death. The findings suggest that Shigella isolates from humans or chickens share similar pathogenicity, as well as the possibility of human-poultry cross-infection, which is of public health significance.

  14. Enhancement in resistivity resolution based on the data sets amalgamation technique at Bukit Bunuh, Perak, Malaysia

    Science.gov (United States)

    Anderson Bery, Andy; Saad, Rosli; Hidayah, I. N. E.; Azwin, I. N.; Saidin, Mokhtar

    2015-01-01

    In this paper, we carried out a study whose main objective was to enhance the resolution of the electrical resistivity inversion model by introducing a data-set amalgamation technique in the data processing stage. Based on the model resistivity with topography results, the data-set amalgamation technique for pole-dipole and Wenner-Schlumberger arrays is successful in identifying the boundary, or interface, between the overburden and the weathered granite. Although the electrical resistivity method is well known, the proper selection of an array and appropriate inversion parameter settings, such as damping factors, are important in order to achieve the study objective and to image the target in the Earth's subsurface.

  15. Robust autonomous model learning from 2D and 3D data sets.

    Science.gov (United States)

    Langs, Georg; Donner, René; Peloschek, Philipp; Bischof, Horst

    2007-01-01

    In this paper we propose a weakly supervised learning algorithm for appearance models based on the minimum description length (MDL) principle. From a set of training images or volumes depicting examples of an anatomical structure, correspondences for a set of landmarks are established by group-wise registration. The approach does not require any annotation. In contrast to existing methods, no assumptions about the topology of the data are made, and the topology can change throughout the data set. Instead of a continuous representation of the volumes or images, only sparse finite sets of interest points are used to represent the examples during optimization. This enables the algorithm to efficiently use distinctive points, and to handle texture variations robustly. In contrast to standard elasticity-based deformation constraints, the MDL criterion accounts for systematic deformations typical of training sets stemming from medical image data. Experimental results are reported for five different 2D and 3D data sets.

  16. Global segmentation and curvature analysis of volumetric data sets using trivariate B-spline functions.

    Science.gov (United States)

    Soldea, Octavian; Elber, Gershon; Rivlin, Ehud

    2006-02-01

    This paper presents a method to globally segment volumetric images into regions that contain convex or concave (elliptic) iso-surfaces, planar or cylindrical (parabolic) iso-surfaces, and volumetric regions with saddle-like (hyperbolic) iso-surfaces, regardless of the value of the iso-surface level. The proposed scheme relies on a novel approach to globally compute, bound, and analyze the Gaussian and mean curvatures of an entire volumetric data set, using a trivariate B-spline volumetric representation. This scheme derives a new differential scalar field for a given volumetric scalar field, which could easily be adapted to other differential properties. Moreover, this scheme can set the basis for more precise and accurate segmentation of data sets targeting the identification of primitive parts. Since the proposed scheme employs piecewise continuous functions, it is precise and insensitive to aliasing.

  17. Representation and display of vector field topology in fluid flow data sets

    Science.gov (United States)

    Helman, James; Hesselink, Lambertus

    1989-01-01

    The visualization of physical processes in general and of vector fields in particular is discussed. An approach to visualizing flow topology that is based on the physics and mathematics underlying the physical phenomenon is presented. It involves determining critical points in the flow where the velocity vector vanishes. The critical points, connected by principal lines or planes, determine the topology of the flow. The complexity of the data is reduced without sacrificing the quantitative nature of the data set. By reducing the original vector field to a set of critical points and their connections, a representation of the topology of a two-dimensional vector field that is much smaller than the original data set but retains with full precision the information pertinent to the flow topology is obtained. This representation can be displayed as a set of points and tangent curves or as a graph. Analysis (including algorithms), display, interaction, and implementation aspects are discussed.
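
    A sketch of the core step: locate grid cells where both velocity components change sign, then classify each critical point by the eigenvalues of the velocity Jacobian (2-D, forward differences; the full method also traces the connecting curves):

```python
# Sketch: find and classify critical points of a sampled 2-D vector field.
import numpy as np

# Toy field with a saddle at the origin: (u, v) = (x, -y).
x, y = np.meshgrid(np.linspace(-1, 1, 40), np.linspace(-1, 1, 40))
u, v = x, -y

# Cells where u and v each change sign contain a candidate critical point.
def sign_change(f):
    return (f[:-1, :-1] * f[1:, 1:] < 0) | (f[:-1, 1:] * f[1:, :-1] < 0)

cells = np.argwhere(sign_change(u) & sign_change(v))
dx = x[0, 1] - x[0, 0]
for i, j in cells:
    # Jacobian from forward differences at the cell corner.
    J = np.array([[u[i, j + 1] - u[i, j], u[i + 1, j] - u[i, j]],
                  [v[i, j + 1] - v[i, j], v[i + 1, j] - v[i, j]]]) / dx
    eig = np.linalg.eigvals(J)
    kind = ("saddle" if np.all(np.isreal(eig)) and eig.real.prod() < 0
            else "focus/center" if np.any(np.iscomplex(eig))
            else "node")
    print(f"cell ({i},{j}): eigenvalues {eig.round(2)} -> {kind}")
```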

  18. New and Improved GLDAS Data Sets and Data Services at NASA GES DISC

    Science.gov (United States)

    Rui, Hualan; Beaudoing, Hiroko; Teng, William; Vollmer, Bruce; Rodell, Matthew; Lei, Guang-Dih

    2012-01-01

    The goal of a Land Data Assimilation System (LDAS) is to ingest satellite- and ground-based observational data products, using advanced land surface modeling and data assimilation techniques, in order to generate optimal fields of land surface states and fluxes and, thereby, facilitate hydrology and climate modeling, research, and forecasting. With the motivation of creating more climatologically consistent data sets, NASA GSFC's Hydrological Sciences Laboratory has generated more than 60 years (Jan. 1948 to Dec. 2008) of Global LDAS Version 2 (GLDAS-2) data, by using the Princeton Forcing Data Set and upgraded versions of Land Surface Models (LSMs). GLDAS data and data services are provided at the NASA GES DISC Hydrology Data and Information Services Center (HDISC), in collaboration with HSL and LDAS.

  19. Relative incapacitation contributions of pressure wave and wound channel in the Marshall and Sanow data set

    CERN Document Server

    Courtney, M; Courtney, Amy; Courtney, Michael

    2007-01-01

    The Marshall and Sanow data set is the largest and most comprehensive data set available quantifying handgun bullet effectiveness in humans. This article presents an empirical model for relative incapacitation probability in humans hit in the thoracic cavity by handgun bullets. The model is constructed by employing the hypothesis that the wound channel and ballistic pressure wave effects each have an associated independent probability of incapacitation. Combining models for these two independent probabilities using the elementary rules of probability and performing a least-squares fit to the Marshall and Sanow data provides an empirical model with only two adjustable parameters for modeling bullet effectiveness with a standard error of 5.6% and a correlation coefficient R = 0.939. This supports the hypothesis that wound channel and pressure wave effects are independent (within the experimental error), and it also allows assignment of the relative contribution of each effect for a given handgun load. This mode...
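
    A sketch of the described combination of independent probabilities, p = 1 - (1 - p_channel)(1 - p_wave), fitted by least squares; the saturating functional forms and all input data below are made-up placeholders, not the Marshall and Sanow data:

```python
# Sketch: fit p = 1 - (1 - p_channel)(1 - p_wave) by least squares.
# Inputs/outputs are synthetic placeholders, NOT Marshall & Sanow data.
import numpy as np
from scipy.optimize import curve_fit

def incapacitation(X, a, b):
    wound_volume, peak_pressure = X
    p_channel = 1 - np.exp(-a * wound_volume)   # assumed saturating forms
    p_wave = 1 - np.exp(-b * peak_pressure)
    return 1 - (1 - p_channel) * (1 - p_wave)

rng = np.random.default_rng(8)
vol = rng.uniform(10, 60, 40)        # cm^3, hypothetical
pres = rng.uniform(100, 900, 40)     # psi, hypothetical
p_obs = incapacitation((vol, pres), 0.02, 0.002) + rng.normal(0, 0.03, 40)

(a, b), _ = curve_fit(incapacitation, (vol, pres), p_obs, p0=(0.01, 0.001))
resid = p_obs - incapacitation((vol, pres), a, b)
print(f"a={a:.4f}  b={b:.4f}  stderr={resid.std():.3f}")
```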

  20. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementation for three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs.

  1. Conclusions from a NAIVE Bayes Operator Predicting the Medicare 2011 Transaction Data Set

    CERN Document Server

    Williams, Nick

    2014-01-01

    Introduction: The United States Federal Government operates one of the world's largest medical insurance programs, Medicare, to ensure payment for clinical services for the elderly, illegal aliens and those without the ability to pay for their care directly. This paper evaluates the Medicare 2011 Transaction Data Set, which details the transfer of funds from Medicare to private and public clinical care facilities for specific clinical services for the operational year 2011. Methods: Data mining was conducted to establish the relationships between reported and computed transaction values in the data set, to better understand the drivers of Medicare transactions at a programmatic level. Results: The models averaged 88 for model accuracy and 38 for Kappa during training. Some reported classes are highly independent of the available data, as their predictability remains stable regardless of redaction of supporting and contradictory evidence. DRG or procedure type appears to be unpredictable from the...
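
    A sketch of the evaluation loop described, on synthetic categorical transactions, using sklearn's CategoricalNB plus accuracy and Cohen's kappa; all field names are hypothetical and nothing is drawn from the actual Medicare data:

```python
# Sketch: naive Bayes classification scored by accuracy and Cohen's
# kappa, on synthetic categorical data (not the Medicare records).
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(9)
n = 5000
provider_type = rng.integers(0, 6, n)    # hypothetical coded fields
state = rng.integers(0, 50, n)
payment_band = rng.integers(0, 4, n)
drg = (provider_type + payment_band + rng.integers(0, 3, n)) % 8  # target

X = np.column_stack([provider_type, state, payment_band])
X_tr, X_te, y_tr, y_te = train_test_split(X, drg, random_state=0)

# min_categories guards against categories absent from the training split.
clf = CategoricalNB(min_categories=50).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.2f}  "
      f"kappa={cohen_kappa_score(y_te, pred):.2f}")
```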

  2. Data Access with DataSets in Visual Web Developer 2008/2010.

    Directory of Open Access Journals (Sweden)

    Jorge Alberto Rivera Guerra

    2011-01-01

    There are several ways to access a database stored on an SQL server using Visual Studio .NET 2008 or 2010 with DataSets, most of them based on complex programming. This article describes one of them: it suffices to add the ASP.NET App_Code folder, add a new item of type DataSet, attach the database table or tables, create the SQL queries, design the Webforms, and write the code that calls the queries for the user.

  3. Development of a daily gridded precipitation data set for the Middle East

    Directory of Open Access Journals (Sweden)

    A. Yatagai

    2008-03-01

    We show an algorithm to construct a rain-gauge-based analysis of daily precipitation for the Middle East. One of the key points of our algorithm is to construct an accurate distribution of climatology. One possible advantage of this product is to validate high-resolution climate models and/or to diagnose the impact of climate change on local hydrological resources. Many users are familiar with a monthly precipitation dataset (New et al., 1999) and a satellite-based daily precipitation dataset (Huffman et al., 2001), yet our data set, unlike theirs, clearly shows the effect of orography on daily precipitation and other extreme events, especially over the Fertile Crescent region. Currently the Middle East precipitation analysis product consists of a 25-year data set for 1979–2003 based on more than 1300 stations.

  4. Analyzing large data sets from XGC1 magnetic fusion simulations using apache spark

    Energy Technology Data Exchange (ETDEWEB)

    Churchill, R. Michael [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)

    2016-11-21

    Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. Here, an unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not have common coherent structures, but rather broad, ring-like structures in velocity space.
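
    A minimal sketch of this kind of workflow, reading particle data and clustering it with k-means in Spark's MLlib, is shown below. The file path, column names and choice of k are assumptions, not details of the PPPL setup.

      from pyspark.sql import SparkSession
      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.clustering import KMeans

      spark = SparkSession.builder.appName("xgc1-kmeans").getOrCreate()

      # Hypothetical table of distribution-function samples in velocity space.
      df = spark.read.parquet("xgc1_dist_func.parquet")
      features = VectorAssembler(inputCols=["v_parallel", "v_perp", "f"],
                                 outputCol="features").transform(df)

      model = KMeans(k=8, seed=42, featuresCol="features").fit(features)
      clustered = model.transform(features)   # adds a 'prediction' column
      clustered.groupBy("prediction").count().show()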

  5. Initial data sets and the topology of closed three-manifolds in general relativity

    Science.gov (United States)

    Carfora, M.

    1983-10-01

    The interaction between the matter content of a closed physical space associated with a generic gravitational configuration and the topology of the underlying closed three-manifold is discussed. Within the context of the conformal approach to the initial value problem, it is shown that the presence of enough matter and radiation favors the three-sphere topology or the worm-hole topology. It is argued that such topologies leave more room for possible gravitational initial data sets for the field equations.

  6. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets

    OpenAIRE

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used st...

  7. The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

    OpenAIRE

    Jia, Zhen; Zhou, Runlin; Zhu, Chunge; Wang, Lei; Gao, Wanling; Shi, Yingjie; Zhan, Jianfeng; Zhang, Lixin

    2013-01-01

    Now we live in an era of big data, and big data applications are becoming more and more pervasive. How to benchmark data center computer systems running big data applications (in short, big data systems) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications---an important class of big data applications---we find two major results through experiments: ...

  8. PCAN: Probabilistic correlation analysis of two non-normal data sets.

    Science.gov (United States)

    Zoh, Roger S; Mallick, Bani; Ivanov, Ivan; Baladandayuthapani, Veera; Manyam, Ganiraju; Chapkin, Robert S; Lampe, Johanna W; Carroll, Raymond J

    2016-12-01

    Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and micro RNA, in samples collected on the same individuals. The main interest with these genomic data sets lies in the identification of a subset of features that are active in explaining the dependence between platforms. To quantify the strength of the dependency between two variables, correlation is often preferred. However, expression data obtained from next-generation sequencing platforms are integer-valued with very low counts for some important features. In this case, the sample Pearson correlation is not a valid estimate of the true correlation matrix, because the sample correlation estimate between two features/variables with low counts will often be close to zero, even when the natural parameters of the Poisson distribution are, in actuality, highly correlated. We propose a model-based approach to correlation estimation between two non-normal data sets, via a method we call Probabilistic Correlations ANalysis, or PCAN. PCAN takes into consideration the distributional assumption about both data sets and suggests that correlations estimated at the model natural parameter level are more appropriate than correlations estimated directly on the observed data. We demonstrate through a simulation study that PCAN outperforms other standard approaches in estimating the true correlation between the natural parameters. We then apply PCAN to the joint analysis of a microRNA (miRNA) and a messenger RNA (mRNA) expression data set from a squamous cell lung cancer study, finding a large number of negative correlation pairs when compared to the standard approaches.
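
    The core observation, that Pearson correlation computed on low-count data understates the correlation of the underlying Poisson natural parameters, is easy to reproduce in a short simulation. The sketch below illustrates the motivation only; it is not the PCAN model.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 2000
      # Strongly correlated log-rates (the natural parameters) ...
      z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=n)
      log_rate = z - 2.0                # shifted down to force low counts
      x = rng.poisson(np.exp(log_rate[:, 0]))
      y = rng.poisson(np.exp(log_rate[:, 1]))

      print(np.corrcoef(z[:, 0], z[:, 1])[0, 1])  # ~0.9 at the parameter level
      print(np.corrcoef(x, y)[0, 1])              # attenuated on raw counts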

  9. Multiple data sets and modelling choices in a comparative LCA of disposable beverage cups.

    Science.gov (United States)

    van der Harst, Eugenie; Potting, José; Kroeze, Carolien

    2014-10-01

    This study used multiple data sets and modelling choices in an environmental life cycle assessment (LCA) to compare typical disposable beverage cups made from polystyrene (PS), polylactic acid (PLA; bioplastic) and paper lined with bioplastic (biopaper). Incineration and recycling were considered as waste processing options, and for the PLA and biopaper cup also composting and anaerobic digestion. Multiple data sets and modelling choices were systematically used to calculate average results and the spread in results for each disposable cup in eleven impact categories. The LCA results of all combinations of data sets and modelling choices consistently identify three processes that dominate the environmental impact: (1) production of the cup's basic material (PS, PLA, biopaper), (2) cup manufacturing, and (3) waste processing. The large spread in results for impact categories strongly overlaps among the cups, however, and therefore does not allow a preference for one type of cup material. Comparison of the individual waste treatment options suggests some cautious preferences. The average waste treatment results indicate that recycling is the preferred option for PLA cups, followed by anaerobic digestion and incineration. Recycling is slightly preferred over incineration for the biopaper cups. There is no preferred waste treatment option for the PS cups. Taking into account the spread in waste treatment results for all cups, however, none of these preferences for waste processing options can be justified. The only exception is composting, which is least preferred for both PLA and biopaper cups. Our study illustrates that using multiple data sets and modelling choices can lead to considerable spread in LCA results. This makes comparing products more complex, but the outcomes more robust.

  10. The International Spinal Cord Injury Pain Basic Data Set (version 2.0)

    DEFF Research Database (Denmark)

    Widerström-Noga, E; Biering-Sørensen, Fin; Bryce, T N

    2014-01-01

    OBJECTIVES: To revise the International Spinal Cord Injury Pain Basic Data Set (ISCIPBDS) based on new developments in the field and on suggestions from the spinal cord injury (SCI) and pain clinical and research community. SETTING: International. METHODS: The ISCIPBDS working group evaluated sug...... three pain interference questions concern perceived interference with activities, mood and sleep for overall pain rather than for individual pain problems and are scored on a 0 to 10 scale....

  11. Meta-Analysis of Pathway Enrichment: Combining Independent and Dependent Omics Data Sets

    OpenAIRE

    Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

    2014-01-01

    A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry-based metabolomics and proteomics, or DNA microarray- and RNA-seq-based transcriptomics. Especially in the case of non-targeted metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics techn...
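
    The abstract is truncated, but for independent data sets a standard way to combine per-pathway enrichment p-values is Fisher's or Stouffer's method, for example as below. The p-values are invented, and the choice of method is an assumption; dependent data sets need corrections not shown here.

      from scipy.stats import combine_pvalues

      # Enrichment p-values for one pathway from three independent omics layers.
      pvals = [0.04, 0.10, 0.02]

      stat_f, p_fisher = combine_pvalues(pvals, method="fisher")
      stat_s, p_stouffer = combine_pvalues(pvals, method="stouffer")
      print(p_fisher, p_stouffer)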

  12. A Decomposition Model for HPLC-DAD Data Set and Its Solution by Particle Swarm Optimization

    OpenAIRE

    Lizhi Cui; Zhihao Ling; Josiah Poon; Poon, Simon K.; Junbin Gao; Paul Kwan

    2014-01-01

    This paper proposes a separation method, based on the model of Generalized Reference Curve Measurement and the algorithm of Particle Swarm Optimization (GRCM-PSO), for the High Performance Liquid Chromatography with Diode Array Detection (HPLC-DAD) data set. Firstly, initial parameters are generated to construct reference curves for the chromatogram peaks of the compounds based on its physical principle. Then, a General Reference Curve Measurement (GRCM) model is designed to transform these p...
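
    The GRCM model itself is not reproduced in this record, but the PSO component follows the standard velocity and position updates. A generic minimal sketch, with an invented stand-in objective in place of the GRCM fitting error:

      import numpy as np

      def objective(x):                  # stand-in for the GRCM fitting error
          return np.sum((x - 3.0) ** 2, axis=1)

      rng = np.random.default_rng(1)
      n, dim, w, c1, c2 = 30, 2, 0.7, 1.5, 1.5
      pos = rng.uniform(-10, 10, (n, dim))
      vel = np.zeros((n, dim))
      pbest, pbest_val = pos.copy(), objective(pos)
      gbest = pbest[np.argmin(pbest_val)]

      for _ in range(100):
          r1, r2 = rng.random((n, dim)), rng.random((n, dim))
          vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
          pos += vel
          val = objective(pos)
          better = val < pbest_val
          pbest[better], pbest_val[better] = pos[better], val[better]
          gbest = pbest[np.argmin(pbest_val)]

      print(gbest)                       # converges toward [3, 3]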

  13. MEDIASTINAL HEMORRHAGE MANAGEMENT FOLLOWING CARDIAC SURGERY: IMPLEMENTATION OF THE PERIOPERATIVE NURSING DATA SET

    OpenAIRE

    Giakoumidakis, Konstantinos; Katzilieri, Christina

    2015-01-01

    Introduction: Standardized nursing terminologies (SNT) provide a common language among nurses, contributing to standardized and evidence-based nursing care plans. Aim: The development of a standardized nursing care plan for the effective management of postoperative mediastinal hemorrhage in cardiac surgery patients. Material and Method: The SNT Perioperative Nursing Data Set (PNDS), 3rd edition, was used to form the care plan, which consists of a coding system of nursing diagnoses...

  14. Exploring the SDSS Data Set with Linked Scatter Plots. I. EMP, CEMP, and CV Stars

    Science.gov (United States)

    Carbon, Duane F.; Henze, Christopher; Nelson, Bron C.

    2017-02-01

    We present the results of a search for extremely metal-poor (EMP), carbon-enhanced metal-poor (CEMP), and cataclysmic variable (CV) stars using a new exploration tool based on linked scatter plots (LSPs). Our approach is especially designed to work with very large spectrum data sets such as the SDSS, LAMOST, RAVE, and Gaia data sets, and it can be applied to stellar, galaxy, and quasar spectra. As a demonstration, we conduct our search using the SDSS DR10 data set. We first created a 3326-dimensional phase space containing nearly 2 billion measures of the strengths of over 1600 spectral features in 569,738 SDSS stars. These measures capture essentially all the stellar atomic and molecular species visible at the resolution of SDSS spectra. We show how LSPs can be used to quickly isolate and examine interesting portions of this phase space. To illustrate, we use LSPs coupled with cuts in selected portions of phase space to extract EMP stars, CEMP stars, and CV stars. We present identifications for 59 previously unrecognized candidate EMP stars and 11 previously unrecognized candidate CEMP stars. We also call attention to 2 candidate He II emission CV stars found by the LSP approach that have not yet been discussed in the literature.

  15. Digital data sets that describe aquifer characteristics of the Enid isolated terrace aquifer in northwestern Oklahoma

    Science.gov (United States)

    Becker, C.J.; Runkle, D.L.; Rea, Alan

    1997-01-01

    ARC/INFO export and nonproprietary format files. The data sets in this report include digitized aquifer boundaries and maps of hydraulic conductivity, recharge, and ground-water level elevation contours for the Enid isolated terrace aquifer in northwestern Oklahoma. The Enid isolated terrace aquifer covers approximately 82 square miles and supplies water for irrigation, domestic, municipal, and industrial use for the City of Enid and western Garfield County. The Quaternary-age Enid isolated terrace aquifer is composed of terrace deposits that consist of discontinuous layers of clay, sandy clay, sand, and gravel. The aquifer is unconfined and is bounded by the underlying Permian-age Hennessey Group on the east and the Cedar Hills Sandstone Formation of the Permian-age El Reno Group on the west. The Cedar Hills Sandstone Formation fills a channel beneath the thickest section of the Enid isolated terrace aquifer in the midwestern part of the aquifer. All of the data sets were digitized and created from information and maps in a ground-water modeling thesis and report of the Enid isolated terrace aquifer. The maps digitized were published at a scale of 1:62,500. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of hydraulic conductivity and recharge used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.

  16. Comparison of the CMAM30 data set with ACE-FTS and OSIRIS: polar regions

    Directory of Open Access Journals (Sweden)

    D. Pendlebury

    2015-04-01

    CMAM30 is a 30-year data set extending from 1979 to 2010 that is generated using a version of the Canadian Middle Atmosphere Model (CMAM) in which the winds and temperatures are relaxed to the ERA-Interim reanalysis product from the European Centre for Medium-Range Weather Forecasts. The data set has dynamical fields that are very close to the reanalysis below 1 hPa and chemical tracers that are self-consistent with respect to the model winds and temperature. The chemical tracers are expected to be close to actual observations. The data set is here compared to two satellite records – the Atmospheric Chemistry Experiment Fourier Transform Spectrometer and the Odin Optical Spectrograph and InfraRed Imaging System – for the purpose of validating the temperature, ozone, water vapour and methane fields. Data from the Aura Microwave Limb Sounder are also used for validation of the chemical processing in the polar vortex. It is found that the CMAM30 temperature is biased warm by up to 5 K in the stratosphere, with a low bias in the mesosphere of ~5–15 K. Ozone is reasonable (±15%) except near the tropopause globally, and in the Southern Hemisphere winter polar vortex. Water vapour is consistently low by 10–20%, with correspondingly high methane of 10–20%, except in the Southern Hemisphere polar vortex. Discrepancies in this region are shown to stem from the treatment of polar stratospheric cloud formation in the model.

  17. Parallel group independent component analysis for massive fMRI data sets

    Science.gov (United States)

    Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H.; Pekar, James J.; Lindquist, Martin A.; Eloyan, Ani; Caffo, Brian S.

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.
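
    PGICA itself is a likelihood-based R implementation; as a rough, hedged illustration of the general two-stage idea (per-subject dimension reduction that can run in parallel, then ICA on the aggregated data), the following sketch uses random data and is not the authors' algorithm.

      import numpy as np
      from sklearn.decomposition import PCA, FastICA

      rng = np.random.default_rng(0)
      subjects = [rng.standard_normal((200, 5000)) for _ in range(10)]  # time x voxels

      # Stage 1: reduce each subject independently (embarrassingly parallel).
      reduced = [PCA(n_components=20).fit_transform(s.T).T for s in subjects]

      # Stage 2: concatenate and extract group-level spatial components.
      group = np.vstack(reduced)               # (10 * 20) x 5000
      ica = FastICA(n_components=15, random_state=0)
      maps = ica.fit_transform(group.T)        # voxels x components
      print(maps.shape)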

  18. A NOVEL, FULLY AUTOMATED PIPELINE FOR PERIOD ESTIMATION IN THE EROS 2 DATA SET

    Energy Technology Data Exchange (ETDEWEB)

    Protopapas, Pavlos [Institute for Applied Computational Science, Harvard University, Cambridge, MA 02138 (United States); Huijse, Pablo; Estévez, Pablo A. [Millennium Institute of Astrophysics (Chile); Zegers, Pablo [Universidad de los Andes, Facultad de Ingeniería y Ciencias Aplicadas, Monseñor Álvaro del Portillo 12455, Las Condes, Santiago (Chile); Príncipe, José C. [Computational Neuroengineering Laboratory, University of Florida, Gainesville, FL 32611 (United States); Marquette, Jean-Baptiste, E-mail: pavlos@seas.harvard.edu [UPMC-CNRS, UMR7095, Institut d'Astrophysique de Paris, F-75014 Paris (France)

    2015-02-01

    We present a new method to discriminate periodic from nonperiodic irregularly sampled light curves. We introduce a periodic kernel and maximize a similarity measure derived from information theory to estimate the periods and a discriminator factor. We tested the method on a data set containing 100,000 synthetic periodic and nonperiodic light curves with various periods, amplitudes, and shapes generated using a multivariate generative model. We correctly identified periodic and nonperiodic light curves with a completeness of ∼90% and a precision of ∼95%, for light curves with a signal-to-noise ratio (S/N) larger than 0.5. We characterize the efficiency and reliability of the model using these synthetic light curves and apply the method on the EROS-2 data set. A crucial consideration is the speed at which the method can be executed. Using a hierarchical search and some simplification on the parameter search, we were able to analyze 32.8 million light curves in ∼18 hr on a cluster of GPGPUs. Using the sensitivity analysis on the synthetic data set, we infer that 0.42% of the sources in the LMC and 0.61% of the sources in the SMC show periodic behavior. The training set, catalogs, and source code are all available at http://timemachine.iic.harvard.edu.
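
    The information-theoretic kernel method itself is not reproduced in this record. As a generic illustration of period estimation on an irregularly sampled light curve, a Lomb-Scargle periodogram (a different, standard technique) recovers an injected period:

      import numpy as np
      from astropy.timeseries import LombScargle

      rng = np.random.default_rng(0)
      t = np.sort(rng.uniform(0, 100, 300))     # irregular sampling times (days)
      true_period = 2.7
      y = np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.3, 300)

      freq, power = LombScargle(t, y).autopower()
      print(1.0 / freq[np.argmax(power)])       # ~2.7 days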

  19. A full scale approximation of covariance functions for large spatial data sets

    KAUST Repository

    Sang, Huiyan

    2011-10-10

    Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such models typically require O(n^3) operations for a data set of size n. Various approximations of the covariance functions have been introduced to reduce the computational cost. However, most existing approximations cannot simultaneously capture both the large- and the small-scale spatial dependence. A new approximation scheme is developed to provide a high quality approximation to the covariance function at both the large and the small spatial scales. The new approximation is the summation of two parts: a reduced rank covariance and a compactly supported covariance obtained by tapering the covariance of the residual of the reduced rank approximation. Whereas the former part mainly captures the large-scale spatial variation, the latter part captures the small-scale, local variation that is unexplained by the former part. By combining the reduced rank representation and sparse matrix techniques, our approach allows for efficient computation for maximum likelihood estimation, spatial prediction and Bayesian inference. We illustrate the new approach with simulated and real data sets. © 2011 Royal Statistical Society.
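
    A hedged numerical sketch of the two-part approximation, a reduced-rank term built from a knot set plus a compactly supported taper applied to the residual covariance, is given below for a 1-D exponential covariance. The knot count, taper form and ranges are arbitrary choices for illustration.

      import numpy as np

      def exp_cov(x1, x2, sigma2=1.0, phi=0.3):
          d = np.abs(x1[:, None] - x2[None, :])
          return sigma2 * np.exp(-d / phi)

      def taper(x1, x2, gamma=0.2):             # simple compactly supported taper
          d = np.abs(x1[:, None] - x2[None, :])
          return np.clip(1.0 - d / gamma, 0.0, None) ** 2

      x = np.linspace(0, 1, 500)                # observation locations
      knots = np.linspace(0, 1, 30)             # reduced-rank knot set

      C_full = exp_cov(x, x)
      C_xk, C_kk = exp_cov(x, knots), exp_cov(knots, knots)
      C_lowrank = C_xk @ np.linalg.solve(C_kk, C_xk.T)     # large-scale part

      # Small-scale part: taper the residual so the matrix becomes sparse.
      C_fullscale = C_lowrank + (C_full - C_lowrank) * taper(x, x)
      print(np.abs(C_fullscale - C_full).max())            # approximation error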

  20. Three- and four-dimensional visualization of magnetic resonance imaging data sets in pediatric cardiology.

    Science.gov (United States)

    Vick, G W

    2000-01-01

    The purpose of medical imaging technology in pediatric cardiology is to provide clear representations of the underlying anatomy and physiology of the cardiovascular system--representations that are easily understood and that facilitate clinical decision making. However, standard projective and tomographic imaging methods often yield results that are intelligible only to imaging specialists. Three- and four-dimensional reconstructions from projective and tomographic data sets are an alternative form of image display. Often, these reconstructions are more readily comprehensible as representations of the reality apparent in the operating room or the pathology laboratory than are the original data sets. Furthermore, viewing of these reconstructions is much more time efficient than viewing hundreds of separate tomographic images. Magnetic resonance imaging inherently provides three-, four-, and even higher dimensional data, and magnetic resonance data sets are commonly used to generate volumetric reconstructions. This review will focus on the practical application of magnetic resonance imaging to yield three- and four-dimensional reconstructions of pediatric cardiovascular disorders.

  1. Widespread Contamination of Arabidopsis Embryo and Endosperm Transcriptome Data Sets

    Science.gov (United States)

    2017-01-01

    A major goal of global gene expression profiling in plant seeds has been to investigate the parental contributions to the transcriptomes of early embryos and endosperm. However, consistency between independent studies has been poor, leading to considerable debate. We have developed a statistical tool that reveals the presence of substantial RNA contamination from maternal tissues in nearly all published Arabidopsis thaliana endosperm and early embryo transcriptomes generated in these studies. We demonstrate that maternal RNA contamination explains the poor reproducibility of these transcriptomic data sets. Furthermore, we found that RNA contamination from maternal tissues has been repeatedly misinterpreted as epigenetic phenomena, which has resulted in inaccurate conclusions regarding the parental contributions to both the endosperm and early embryo transcriptomes. After accounting for maternal RNA contamination, no published genome-wide data set supports the concept of delayed paternal genome activation in plant embryos. Moreover, our analysis suggests that maternal and paternal genomic imprinting are equally rare events in Arabidopsis endosperm. Our publicly available software (https://github.com/Gregor-Mendel-Institute/tissue-enrichment-test) can help the community assess the level of contamination in transcriptome data sets generated from both seed and non-seed tissues.

  2. On application of constitutional descriptors for merging of quinoxaline data sets using linear statistical methods.

    Science.gov (United States)

    Ghosh, Payel; Vracko, Marjan; Chattopadhyay, Asis Kumar; Bagchi, Manish C

    2008-08-01

    The present paper attempts to unify two different quinoxaline data sets, with a wide range of substituents in the 2, 3, 7 and 8 positions and excellent antitubercular activities, with a view to developing robust and reliable structure-activity relationships. The merging was performed for these two sets of quinoxaline 1,4-di-N-oxide derivatives, comprising 29 and 18 compounds, respectively, on the basis of constitutional descriptors, which denote structural characteristics of the molecules. Principal component analysis was performed to see the distribution of the compounds from the two data sets in terms of the constitutional descriptors. The distribution of compounds in the score plot based on constitutional descriptors supports unification of the quinoxaline data sets, which is useful for model development. Outlier detection was performed by residual analysis of the partial least squares regression models. The superiority of the constitutional descriptors over other calculated molecular descriptors was established using the leave-one-out cross-validation technique associated with partial least squares regression analysis. Internal validation through the leave-many-out methodology was also performed with good results, assuring the stability of the models. The results obtained from linear partial least squares regression analysis lead to statistically significant and robust quantitative structure-activity relationship modeling.
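
    A hedged sketch of the described workflow, inspecting the merged descriptor space with PCA and then fitting a partial least squares regression validated by leave-one-out, is shown below with made-up descriptors and activities in place of the real constitutional descriptors.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import LeaveOneOut, cross_val_predict

      rng = np.random.default_rng(0)
      X = rng.standard_normal((47, 12))    # 29 + 18 compounds, 12 descriptors
      y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.3, 47)  # toy activity

      scores = PCA(n_components=2).fit_transform(X)  # score plot of merged sets

      pls = PLSRegression(n_components=3)
      y_loo = cross_val_predict(pls, X, y, cv=LeaveOneOut()).ravel()
      q2 = 1 - np.sum((y - y_loo) ** 2) / np.sum((y - y.mean()) ** 2)
      print(f"LOO Q^2 = {q2:.2f}")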

  3. The Global Fire Emissions Database (GFED3) Global Burned Area Data Set

    Science.gov (United States)

    Giglio, L.; van der Werf, G. R.; Randerson, J. T.; Collatz, G. J.; Kasibhatla, P.; Morton, D. C.; Defries, R. S.

    2008-12-01

    We discuss major enhancements to the burned area component of the Global Fire Emissions Database (GFED3) over previous versions (GFED1 and GFED2), which now provides global, monthly burned area estimates at 0.5-degree spatial resolution for the time period 1997-2008. Estimates are produced by calibrating Terra MODIS active fire data with 500-m MODIS burned area maps via geographically weighted regression. Cross-calibration with fire observations from the Tropical Rainfall Measuring Mission Visible and Infrared Scanner (VIRS) and the Along-Track Scanning Radiometer (ATSR) allows the data set to be extended further back in time. We then discuss the spatially-explicit uncertainty estimates accompanying our data set, and the use of these estimates within atmospheric and biogeochemical models. We compare our GFED3 burned area estimates with other recent global burned area data sets, including GFED2, L3JRC, and GLOBCARBON. We quantify areas and time periods in which the different products diverge, and conclude with explanations for some of the discrepancies.

  4. An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Childs, Hank R. [Univ. of California, Davis, CA (United States)

    2006-01-01

    Much of the previous work in the large data visualization area has solely focused on handling the scale of the data. This task is clearly a great challenge and necessary, but it is not sufficient. Applying standard visualization techniques to large scale data sets often creates complicated pictures where meaningful trends are lost. A second challenge, then, is to also provide algorithms that simplify what an analyst must understand, using either visual or quantitative means. This challenge can be summarized as improving the legibility or reducing the complexity of massive data sets. Fully meeting both of these challenges is the work of many, many PhD dissertations. In this dissertation, we describe some new techniques to address both the scale and legibility challenges, in hope of contributing to the larger solution. In addition to our assumption of simultaneously addressing both scale and legibility, we add an additional requirement that the solutions considered fit well within an interoperable framework for diverse algorithms, because a large suite of algorithms is often necessary to fully understand complex data sets. For scale, we present a general architecture for handling large data, as well as details of a contract-based system for integrating advanced optimizations into a data flow network design. We also describe techniques for volume rendering and performing comparisons at the extreme scale. For legibility, we present several techniques. Most noteworthy are equivalence class functions, a technique to drive visualizations using statistical methods, and line-scan based techniques for characterizing shape.

  5. Tiny videos: a large data set for nonparametric video retrieval and frame classification.

    Science.gov (United States)

    Karpenko, Alexandre; Aarabi, Parham

    2011-03-01

    In this paper, we present a large database of over 50,000 user-labeled videos collected from YouTube. We develop a compact representation called "tiny videos" that achieves high video compression rates while retaining the overall visual appearance of the video as it varies over time. We show that frame sampling using affinity propagation-an exemplar-based clustering algorithm-achieves the best trade-off between compression and video recall. We use this large collection of user-labeled videos in conjunction with simple data mining techniques to perform related video retrieval, as well as classification of images and video frames. The classification results achieved by tiny videos are compared with the tiny images framework [24] for a variety of recognition tasks. The tiny images data set consists of 80 million images collected from the Internet. These are the largest labeled research data sets of videos and images available to date. We show that tiny videos are better suited for classifying scenery and sports activities, while tiny images perform better at recognizing objects. Furthermore, we demonstrate that combining the tiny images and tiny videos data sets improves classification precision in a wider range of categories.

  6. Developing consistent Landsat data sets for large area applications: the MRLC 2001 protocol

    Science.gov (United States)

    Chander, G.; Huang, C.; Yang, L.; Homer, C.; Larson, C.

    2009-01-01

    One of the major efforts in large area land cover mapping over the last two decades was the completion of two U.S. National Land Cover Data sets (NLCD), developed with nominal 1992 and 2001 Landsat imagery under the auspices of the MultiResolution Land Characteristics (MRLC) Consortium. Following the successful generation of NLCD 1992, a second generation MRLC initiative was launched with two primary goals: (1) to develop a consistent Landsat imagery data set for the U.S. and (2) to develop a second generation National Land Cover Database (NLCD 2001). One of the key enhancements was the formulation of an image preprocessing protocol and implementation of a consistent image processing method. The core data set of the NLCD 2001 database consists of Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images. This letter details the procedures for processing the original ETM+ images and more recent scenes added to the database. NLCD 2001 products include Anderson Level II land cover classes, percent tree canopy, and percent urban imperviousness at 30-m resolution derived from Landsat imagery. The products are freely available for download to the general public from the MRLC Consortium Web site at http://www.mrlc.gov.

  7. The integrated water balance and soil data set of the Rollesbroich hydrological observatory

    Science.gov (United States)

    Qu, Wei; Bogena, Heye R.; Huisman, Johan A.; Schmidt, Marius; Kunkel, Ralf; Weuthen, Ansgar; Schiedung, Henning; Schilling, Bernd; Sorg, Jürgen; Vereecken, Harry

    2016-10-01

    The Rollesbroich headwater catchment located in western Germany is a densely instrumented hydrological observatory and part of the TERENO (Terrestrial Environmental Observatories) initiative. The measurements acquired in this observatory present a comprehensive data set that contains key hydrological fluxes in addition to important hydrological states and properties. Meteorological data (i.e., precipitation, air temperature, air humidity, radiation components, and wind speed) are continuously recorded and actual evapotranspiration is measured using the eddy covariance technique. Runoff is measured at the catchment outlet with a gauging station. In addition, spatiotemporal variations in soil water content and temperature are measured at high resolution with a wireless sensor network (SoilNet). Soil physical properties were determined using standard laboratory procedures from samples taken at a large number of locations in the catchment. This comprehensive data set can be used to validate remote sensing retrievals and hydrological models, to improve the understanding of spatial temporal dynamics of soil water content, to optimize data assimilation and inverse techniques for hydrological models, and to develop upscaling and downscaling procedures of soil water content information. The complete data set is freely available online (http://www.tereno.net, doi:10.5880/TERENO.2016.001, doi:10.5880/TERENO.2016.004, doi:10.5880/TERENO.2016.003) and additionally referenced by three persistent identifiers securing the long-term data and metadata availability.

  8. Application of Advanced Master Curve Approaches to the EURO Fracture Toughness Data Set

    Energy Technology Data Exchange (ETDEWEB)

    Lucon, E.; Scibetta, M.

    2007-01-15

    The so-called EURO data set is the largest set ever assembled, consisting of fracture toughness results obtained in the ductile-to-brittle transition region. It was the outcome of a large EU-sponsored project which involved ten European laboratories in the second half of the 1990s. Several post-project investigations have identified one of the blocks from which specimens were extracted (block SX9) as macroscopically inhomogeneous and significantly tougher than the remaining blocks. In this study, the variability of block SX9 has been investigated using the conventional Master Curve (MC) methodology and some recent MC extensions, namely the SINTAP lower tail, the single point estimation, the bi-modal Master Curve and the multi-modal Master Curve. The basic MC method is intended for macroscopically homogeneous ferritic steels only, and the alternative approaches have been developed for the investigation of inhomogeneous materials. Therefore, these methods can be used to study the behaviour of block SX9 within the EURO data set. It has been found that the bi-modal and multi-modal MC approaches are quite effective in detecting the anomaly represented by block SX9, but only when analyses are performed on data sets of comparable size.

  9. Moving Large Data Sets Over High-Performance Long Distance Networks

    Energy Technology Data Exchange (ETDEWEB)

    Hodson, Stephen W [ORNL]; Poole, Stephen W [ORNL]; Ruwart, Thomas [ORNL]; Settlemyer, Bradley W [ORNL]

    2011-04-01

    In this project we look at the performance characteristics of three tools used to move large data sets over dedicated long distance networking infrastructure. Although performance studies of wide area networks have been a frequent topic of interest, performance analyses have tended to focus on network latency characteristics and peak throughput using network traffic generators. In this study we instead perform an end-to-end long distance networking analysis that includes reading large data sets from a source file system and committing large data sets to a destination file system. An evaluation of end-to-end data movement is also an evaluation of the system configurations employed and the tools used to move the data. For this paper, we have built several storage platforms and connected them with a high performance long distance network configuration. We use these systems to analyze the capabilities of three data movement tools: BBcp, GridFTP, and XDD. Our studies demonstrate that existing data movement tools do not provide efficient performance levels or exercise the storage devices in their highest performance modes. We describe the device information required to achieve high levels of I/O performance and discuss how this data is applicable in use cases beyond data movement performance.

  10. Classifying Data Sets Using Support Vector Machines Based on Geometric Distance

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Support vector machines (SVMs) are not as favored for large-scale data mining as for pattern recognition and machine learning, because the training complexity of SVMs is highly dependent on the size of the data set. This paper presents a geometric distance-based SVM (GDB-SVM). It takes the distance between a point and the classifying hyperplane as the classification rule, and is designed on the basis of theoretical analysis and geometric intuition. The experimental code is derived from LibSVM, with Microsoft Visual C++ 6.0 as the development environment. Four of five predicted results of GDB-SVM are better than those of the one-against-all (OAA) method. Three of five predicted results of GDB-SVM are better than those of the one-against-one (OAO) method. Experiments on real data sets show that GDB-SVM is not only superior to the OAA and OAO methods, but also highly scalable for large data sets while generating high classification accuracy.
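
    The GDB-SVM code itself is not public in this record; as context for the OAA/OAO comparison, the two standard multiclass schemes can be contrasted in a few lines (the data set and kernel choices here are arbitrary):

      from sklearn import datasets
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC, LinearSVC

      X, y = datasets.load_iris(return_X_y=True)

      # One-against-one: SVC trains a binary SVM per class pair by default.
      oao = SVC(kernel="rbf", gamma="scale")
      # One-against-all: one binary SVM per class against the rest.
      oaa = LinearSVC(multi_class="ovr", max_iter=10000)

      print(cross_val_score(oao, X, y, cv=5).mean())
      print(cross_val_score(oaa, X, y, cv=5).mean())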

  11. Comparison of three airborne laser bathymetry data sets for monitoring the German Baltic Sea Coast

    Science.gov (United States)

    Song, Yujin; Niemeyer, Joachim; Ellmer, Wilfried; Soergel, Uwe; Heipke, Christian

    2015-10-01

    Airborne laser bathymetry (ALB) can be used for hydrographic surveying with relatively high resolution in shallow water. In this paper, we examine the applicability of this technique based on three flight campaigns. These were conducted between 2012 and 2014 close to the island of Poel in the German Baltic Sea. The first data set was acquired by a Riegl VQ-820-G sensor in November 2012. The second and third data sets were acquired by a Chiroptera sensor of Airborne Hydrography AB in September 2013 and May 2014, respectively. We examine the 3D points classified as seabed under different conditions during data acquisition, e.g. the turbidity level of the water and the flight altitude. The analysis comprises the point distribution, point density, and the area coverage in several depth levels. In addition, we determine the vertical accuracy of the 3D seabed points by computing differences to echo sounding data. Finally, the results of the three flight campaigns are compared to each other and analyzed with respect to the different conditions during data acquisition. For each campaign only small differences in elevation between the laser and the echo sounding data sets are observed. The ALB results satisfy the requirements of IHO Standards for Hydrographic Surveys (S-44) Order 1b for several depth intervals.

  12. Taming outliers in pulsar-timing data sets with hierarchical likelihoods and Hamiltonian sampling

    Science.gov (United States)

    Vallisneri, Michele; van Haasteren, Rutger

    2017-04-01

    Pulsar-timing data sets have been analysed with great success using probabilistic treatments based on Gaussian distributions, with applications ranging from studies of neutron-star structure to tests of general relativity and searches for nanosecond gravitational waves. As for other applications of Gaussian distributions, outliers in timing measurements pose a significant challenge to statistical inference, since they can bias the estimation of timing and noise parameters, and affect reported parameter uncertainties. We describe and demonstrate a practical end-to-end approach to perform Bayesian inference of timing and noise parameters robustly in the presence of outliers, and to identify these probabilistically. The method is fully consistent (i.e. outlier-ness probabilities vary in tune with the posterior distributions of the timing and noise parameters), and it relies on the efficient sampling of the hierarchical form of the pulsar-timing likelihood. Such sampling has recently become possible with a 'no-U-turn' Hamiltonian sampler coupled to a highly customized reparametrization of the likelihood; this code is described elsewhere, but it is already available online. We recommend our method as a standard step in the preparation of pulsar-timing-array data sets: even if statistical inference is not affected, follow-up studies of outlier candidates can reveal unseen problems in radio observations and timing measurements; furthermore, confidence in the results of gravitational-wave searches will only benefit from stringent statistical evidence that data sets are clean and outlier-free.
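
    The full method samples timing and noise parameters hierarchically; the core outlier idea, modelling each residual as a mixture of an expected-noise component and a broad outlier component and reading off per-point outlier probabilities, can be sketched as follows. The noise scales and mixture weight are assumptions, and real pulsar-timing likelihoods are far richer.

      import numpy as np

      rng = np.random.default_rng(0)
      res = rng.normal(0.0, 1.0, 200)                  # whitened timing residuals
      res[[10, 50, 120]] += rng.normal(0.0, 15.0, 3)   # injected outliers

      sigma, sigma_out, theta = 1.0, 15.0, 0.02        # assumed scales and weight

      def norm_pdf(x, s):
          return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2 * np.pi))

      # Per-point posterior probability of being an outlier.
      p_in = (1 - theta) * norm_pdf(res, sigma)
      p_out = theta * norm_pdf(res, sigma_out)
      posterior = p_out / (p_in + p_out)
      print(np.where(posterior > 0.5)[0])              # flags the injected outliers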

  13. Parallel group independent component analysis for massive fMRI data sets.

    Science.gov (United States)

    Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.

  14. Exploratory Analysis in Time-Varying Data Sets: a Healthcare Network Application.

    Science.gov (United States)

    Manukyan, Narine; Eppstein, Margaret J; Horbar, Jeffrey D; Leahy, Kathleen A; Kenny, Michael J; Mukherjee, Shreya; Rizzo, Donna M

    2013-07-01

    We introduce a new method for exploratory analysis of large data sets with time-varying features, where the aim is to automatically discover novel relationships between features (over some time period) that are predictive of any of a number of time-varying outcomes (over some other time period). Using a genetic algorithm, we co-evolve (i) a subset of predictive features, (ii) which attribute will be predicted, (iii) the time period over which to assess the predictive features, and (iv) the time period over which to assess the predicted attribute. After validating the method on 15 synthetic test problems, we used the approach for exploratory analysis of a large healthcare network data set. We discovered a strong association, with 100% sensitivity, between hospital participation in multi-institutional quality improvement collaboratives during or before 2002, and changes in the risk-adjusted rates of mortality and morbidity observed after a 1-2 year lag. The proposed approach is a potentially powerful and general tool for exploratory analysis of a wide range of time-series data sets.

  15. A learning method for the class imbalance problem with medical data sets.

    Science.gov (United States)

    Li, Der-Chiang; Liu, Chiao-Wen; Hu, Susan C

    2010-05-01

    In medical data sets, data are predominately composed of "normal" samples with only a small percentage of "abnormal" ones, leading to the so-called class imbalance problem. In class imbalance problems, inputting all the data into the classifier to build up the learning model usually leads to a learning bias toward the majority class. To deal with this, this paper uses a strategy which over-samples the minority class and under-samples the majority one to balance the data sets. For the majority class, this paper builds up the Gaussian-type fuzzy membership function and alpha-cut to reduce the data size; for the minority class, we use the mega-trend diffusion membership function to generate virtual samples for the class. Furthermore, after balancing the data size of the classes, this paper extends the data attribute dimension into a higher-dimensional space using classification-related information to enhance classification accuracy. Two medical data sets, Pima Indians' diabetes and the BUPA liver disorders, are employed to illustrate the approach presented in this paper. The results indicate that the proposed method has better classification performance than SVM, C4.5 decision tree and two other studies.
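
    A hedged sketch of the balancing idea follows; the paper's fuzzy-membership under-sampling and mega-trend-diffusion over-sampling are replaced here by a plain random under-sample and Gaussian-jitter virtual samples, so this only illustrates the over/under-sampling structure, not the actual membership functions.

      import numpy as np

      def balance(X, y, rng):
          X_min, X_maj = X[y == 1], X[y == 0]
          # Under-sample the majority down to twice the minority size
          # (assumes the majority class is at least that large).
          keep = rng.choice(len(X_maj), size=2 * len(X_min), replace=False)
          X_maj = X_maj[keep]
          # Over-sample the minority with jittered virtual samples.
          extra = X_min + rng.normal(0.0, 0.1 * X_min.std(axis=0),
                                     size=X_min.shape)
          Xb = np.vstack([X_maj, X_min, extra])
          yb = np.array([0] * len(X_maj) + [1] * (2 * len(X_min)))
          return Xb, yb

      rng = np.random.default_rng(0)
      X = rng.standard_normal((200, 4))
      y = (rng.random(200) < 0.1).astype(int)    # ~10% "abnormal" class
      Xb, yb = balance(X, y, rng)
      print(np.bincount(y), np.bincount(yb))     # imbalanced -> balanced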

  16. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    Science.gov (United States)

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy
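
    A hedged sketch of the "random boosting" modification as described, where each iteration fits a weak learner to the current residuals using only a random subset of markers and predictions accumulate with shrinkage, is given below. The learner, shrinkage value and subset size are assumptions; the toy data stand in for SNP genotypes and deregressed proofs.

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)
      n, p = 500, 2000                   # animals x SNP markers (toy scale)
      X = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes 0/1/2
      y = X[:, :10] @ rng.normal(0, 0.5, 10) + rng.normal(0, 1.0, n)

      shrinkage, n_iter, subset = 0.1, 200, 100
      pred, ensemble = np.zeros(n), []
      for _ in range(n_iter):
          cols = rng.choice(p, size=subset, replace=False)  # random markers
          tree = DecisionTreeRegressor(max_depth=2)
          tree.fit(X[:, cols], y - pred)   # fit the current residuals
          pred += shrinkage * tree.predict(X[:, cols])
          ensemble.append((cols, tree))

      print(np.corrcoef(y, pred)[0, 1])    # training-set fit only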

  17. Modelling tunnel jet emissions with LASAT: evaluation study with two Austrian data sets (Ehrentalerbergtunnel and Kaisermühlentunnel)

    Directory of Open Access Journals (Sweden)

    Marcus Hirtl

    2011-02-01

    Two comprehensive data sets are used to investigate the ability of the Lagrangian particle diffusion model LASAT to simulate the dispersion of plumes emitted from tunnel jets. The data sets differ in traffic volume, tunnel geometry and temporal resolution of the measurement data. In the framework of the measurement campaign at the Ehrentalerbergtunnel in Carinthia, seven trace gas experiments with SF6 were conducted in 2001. Short term averages (30 minutes) of concentrations were measured at 25 air quality stations in the vicinity of the tunnel portal during different meteorological conditions. In general, the dispersion of the plume depends on the meteorological conditions (wind, stability) and the modification of the flow by terrain and buildings in the vicinity of the portal. The influence of the exit velocity of the tunnel jet must also be considered, as well as the difference between the exhaust temperature and the ambient air temperature to account for buoyancy effects. Unlike the tunnel jet velocity, the temperature increment cannot be provided directly as an input parameter to LASAT, although it is an important parameter. With LASAT, the model user can adjust two empirical input parameters to the tunnel specifications. Relationships between these model parameters and the tunnel parameters are developed in this study. They are based on the Ehrentalerbergtunnel data set and provide reasonable input values for the model user. The simulations with LASAT show that the model is able to reproduce the location and the height of the observed peak concentrations very well. The second data set was generated from January to October 2001 at the Kaisermühlentunnel in Vienna. Measurements of NOx at four air quality stations near the portal are available. Because of uncertainties in the emission data caused by vehicle counts in only one direction, only long term averages of concentrations are compared for this data set. The functions between tunnel and

  18. Regionalisation of statistical model outputs creating gridded data sets for Germany

    Science.gov (United States)

    Höpp, Simona Andrea; Rauthe, Monika; Deutschländer, Thomas

    2016-04-01

    The goal of the German research program ReKliEs-De (regional climate projection ensembles for Germany, http://.reklies.hlug.de) is to distribute robust information about the range and the extremes of future climate for Germany and its neighbouring river catchment areas. This joint research project is supported by the German Federal Ministry of Education and Research (BMBF) and was initiated by the German Federal States. The project results are meant to support the development of adaptation strategies to mitigate the impacts of future climate change. The aim of our part of the project is to adapt and transfer the regionalisation methods of the gridded hydrological data set (HYRAS) from daily station data to the station-based statistical regional climate model output of WETTREG (a regionalisation method based on weather patterns). The WETTREG model output covers the period of 1951 to 2100 with a daily temporal resolution. For this, we generate a gridded data set of the WETTREG output for precipitation, air temperature and relative humidity with a spatial resolution of 12.5 km x 12.5 km, which is common for regional climate models. Thus, this regionalisation allows comparing statistical to dynamical climate model outputs. The HYRAS data set was developed by the German Meteorological Service within the German research program KLIWAS (www.kliwas.de) and consists of daily gridded data for Germany and its neighbouring river catchment areas. It has a spatial resolution of 5 km x 5 km for the entire domain for the hydro-meteorological elements precipitation, air temperature and relative humidity and covers the period of 1951 to 2006. After conservative remapping, the HYRAS data set is also suitable for validating climate models. The presentation consists of two parts presenting the current state of the adaptation of the HYRAS regionalisation methods to the statistical regional climate model WETTREG: first, an overview of the HYRAS data set and the regionalisation

  19. Ground Truth Observations of the Interior of a Rockglacier as Validation for Geophysical Monitoring Data Sets

    Science.gov (United States)

    Hilbich, C.; Roer, I.; Hauck, C.

    2007-12-01

    Monitoring the permafrost evolution in mountain regions is currently one of the important tasks in cryospheric studies as little data on past and present changes of the ground thermal regime and its material properties are available. In addition to recently established borehole temperature monitoring networks, techniques to determine and monitor the ground ice content have to be developed. A reliable quantification of ground ice is especially important for modelling the thermal evolution of frozen ground and for assessing the hazard potential due to thawing permafrost induced slope instability. Near surface geophysical methods are increasingly applied to detect and monitor ground ice occurrences in permafrost areas. Commonly, characteristic values of electrical resistivity and seismic velocity are used as indicators for the presence of frozen material. However, validation of the correct interpretation of the geophysical parameters can only be obtained through boreholes, and only regarding vertical temperature profiles. Ground truth of the internal structure and the ice content is usually not available. In this contribution we will present a unique data set from a recently excavated rockglacier near Zermatt/Valais in the Swiss Alps, where an approximately 5 m deep trench was cut across the rockglacier body for the construction of a ski track. Longitudinal electrical resistivity tomography (ERT) and refraction seismic tomography profiles were conducted prior to the excavation, yielding data sets for cross validation of commonly applied geophysical interpretation approaches in the context of ground ice detection. A recently developed 4-phase model was applied to calculate ice-, air- and unfrozen water contents from the geophysical data sets, which were compared to the ground truth data from the excavated trench. The obtained data sets will be discussed in the context of currently established geophysical monitoring networks in permafrost areas. In addition to the

  20. Genetic analysis of local Vietnamese chickens provides evidence of gene flow from wild to domestic populations

    Directory of Open Access Journals (Sweden)

    Chi C Vu

    2009-01-01

    Background: Previous studies suggested that multiple domestication events in South and South-East Asia (Yunnan and surrounding areas and India) have led to the genesis of modern domestic chickens. Ha Giang province is a northern Vietnamese region where local chickens, such as the H'mong breed, and wild junglefowl coexist. The assumption was made that hybridisation between wild junglefowl and Ha Giang chickens may have occurred and led to the high genetic diversity previously observed. The objectives of this study were (i) to clarify the genetic structure of the chicken population within the Ha Giang province and (ii) to give evidence of admixture with G. gallus. A large survey of the molecular polymorphism of 18 microsatellite markers was conducted on 1082 chickens from 30 communes of the Ha Giang province (HG chickens). This dataset was combined with a previous dataset of Asian breeds, commercial lines and samples of Red junglefowl from Thailand and Vietnam (Ha Noi). Measurements of genetic diversity were estimated both within and between populations, and a step-by-step Bayesian approach was performed on the global data set. Results: The highest value for expected heterozygosity (> 0.60) was found in HG chickens and in the wild junglefowl populations from Thailand. HG chickens exhibited the highest allelic richness (mean A = 2.9). No significant genetic subdivisions of the chicken population within the Ha Giang province were found. As compared to other breeds, HG chickens clustered with wild populations. Furthermore, the NeighborNet tree and the Bayesian clustering analysis showed that chickens from 4 communes were closely related to the wild ones and showed an admixture pattern. Conclusion: In the absence of any population structuring within the province, the H'mong chicken, identified by its black phenotype, shared a common gene pool with other chickens from the Ha Giang population. The large number of alleles shared exclusively

  1. Chicken NK cell receptors.

    Science.gov (United States)

    Straub, Christian; Neulen, Marie-Luise; Sperling, Beatrice; Windau, Katharina; Zechmann, Maria; Jansen, Christine A; Viertlboeck, Birgit C; Göbel, Thomas W

    2013-11-01

    Natural killer cells are innate immune cells that destroy virally infected or transformed cells. They recognize these altered cells by a plethora of diverse receptors and thereby differ from other lymphocytes that use clonally distributed antigen receptors. To date, several receptor families that play a role in either activating or inhibiting NK cells have been identified in mammals. In the chicken, NK cells have been functionally and morphologically defined, however, a conclusive analysis of receptors involved in NK cell mediated functions has not been available. This is partly due to the low frequencies of NK cells in blood or spleen that has hampered their intensive characterization. Here we will review recent progress regarding the diverse NK cell receptor families, with special emphasis on novel families identified in the chicken genome with potential as chicken NK cell receptors. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. On the fall 2010 Enhancements of the Global Precipitation Climatology Centre's Data Sets

    Science.gov (United States)

    Becker, A. W.; Schneider, U.; Meyer-Christoffer, A.; Ziese, M.; Finger, P.; Rudolf, B.

    2010-12-01

    Precipitation is now a top-listed parameter on the WMO GCOS list of 44 essential climate variables (ECVs). This is easily justified by its crucial role in sustaining any form of life on earth as the major source of fresh water, its major impact on weather, climate, climate change and the related issues of society's adaptation to the latter. Finally, its occurrence is highly variable in space and time, and thus bears the potential to trigger major flood- and drought-related disasters. Since its start in 1989, the Global Precipitation Climatology Centre (GPCC) has performed global analyses of monthly precipitation for the earth's land surface on the basis of in-situ measurements. The effort was inaugurated as part of the Global Precipitation Climatology Project of the WMO World Climate Research Program (WCRP). Meanwhile, the data set has continuously grown both in temporal coverage (the original start of the evaluation period was 1986) and in the extent and quality of the underlying data base. The number of stations in the related data base has approximately doubled in the past 8 years, passing the 40, 60 and 80k thresholds in 2002, 2006 and 2010. The core data source of the GPCC analyses are the station networks operated by the National Meteorological Services worldwide; data deliveries have been received from ca. 190 countries. The GPCC also integrates other global precipitation data collections (i.e. FAO, CRU and GHCN), as well as regional data sets. Currently the Africa data set from S. Nicholson (Univ. Tallahassee) is being integrated. As a result of these efforts the GPCC holds the worldwide largest and most comprehensive collection of precipitation data, which is continuously updated and extended. Due to the high spatial-temporal variability of precipitation, even its global analysis requires this high number of stations to provide a sufficient density of measurement data at almost any place on the globe. The acquired data sets are pre-checked, reformatted

  3. Kernel Density Estimation, Kernel Methods, and Fast Learning in Large Data Sets.

    Science.gov (United States)

    Wang, Shitong; Wang, Jun; Chung, Fu-lai

    2014-01-01

    Kernel methods such as standard support vector machine and support vector regression training take O(N³) time and O(N²) space in their naïve implementations, where N is the training set size. It is thus computationally infeasible to apply them to large data sets, and a replacement for the naive method of finding the quadratic programming (QP) solutions is highly desirable. By observing that many kernel methods can be linked to kernel density estimation (KDE), which can be efficiently implemented by approximation techniques, a new learning method called fast KDE (FastKDE) is proposed to scale up kernel methods. It is based on establishing a connection between KDE and the QP problems formulated for kernel methods using an entropy-based integrated-squared-error criterion. As a result, FastKDE approximation methods can be applied to solve these QP problems. In this paper, the latest advance in fast data reduction via KDE is exploited. With just a simple sampling strategy, the resulting FastKDE method can be used to scale up various kernel methods with a theoretical guarantee that their performance does not degrade significantly. It has a time complexity of O(m³), where m is the number of data points sampled from the training set. Experiments on different benchmarking data sets demonstrate that the proposed method has comparable performance with the state-of-the-art method and is effective for a wide range of kernel methods in achieving fast learning on large data sets.
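
    The core idea, that a small random sample can stand in for the full training set when a kernel density is all that is needed, can be illustrated in a few lines. The sketch below is not the authors' FastKDE algorithm; it only shows the sampling step that yields the O(m³) cost, on synthetic data.

      import numpy as np
      from scipy.stats import gaussian_kde

      rng = np.random.default_rng(0)
      N, m = 100_000, 1_000            # full training set size vs. sampled subset

      # Bimodal toy data standing in for a large training set
      data = np.concatenate([rng.normal(-2, 1, N // 2), rng.normal(3, 0.5, N // 2)])

      # Estimate the density from a small random subsample: the cost of any
      # downstream kernel computation now scales with m rather than N.
      kde = gaussian_kde(rng.choice(data, size=m, replace=False))

      grid = np.linspace(-6, 6, 400)
      print("dominant mode near", grid[np.argmax(kde(grid))])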

  4. A New National MODIS-Derived Phenology Data Set Every 16 Days, 2002 through 2006

    Energy Technology Data Exchange (ETDEWEB)

    HargroveJr., William Walter [USDA Forest Service; Spruce, Joe [NASA Stennis Space Center; Gasser, Gerry [NASA Stennis Space Center; Hoffman, Forrest M [ORNL; Lee, Danny C [USDA Forest Service

    2008-01-01

    A new national phenology data set has been developed, comprising a series of seamless 231 m national maps, every 16 days from 2001 through 2006. The data set was developed jointly by the Eastern Forest Environmental Threat Assessment Center (EFETAC) of the USDA Forest Service and contractors of the NASA Stennis Space Center. The data are available now for dissemination and use. The first half of the National Phenology Data Set is the cumulative area under the NDVI curve since Jan 1, which increases monotonically every 16 days until the end of the year. These cumulative data values 'latch' in the event of clouds or snow, remaining at the value recorded when the cell was last seen. The second half is a set of diagnostic parameters fit to the annual NDVI function. The spring minimum, the 20% rise, the 80% rise, the leaf-on maximum, the 80% fall, the 20% fall, and the trailing fall minimum are determined for each map cell. For each parameter, we produce both a national map of the NDVI value and a map of the day-of-year when that NDVI value was reached. Length of growing season, as the difference between the spring and fall 20% DOYs, and the date of the middle of the growing season can be mapped as well. The new data set has permitted the development of a set of national phenological ecoregions, and has also proven useful for mapping Gypsy Moth defoliation, simultaneously delineating the aftermath of three Gulf Coast hurricanes, and quantifying suburban/ex-urban development surrounding metro Atlanta.
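
    The 'latch' behaviour described above is simple to state in code. The sketch below is a toy illustration of the accumulation rule for one pixel, assuming NaN marks a cloud- or snow-contaminated composite; it is not the production processing chain.

      import numpy as np

      def cumulative_ndvi(series):
          """Cumulative area under the NDVI curve for one pixel, one value per
          16-day composite; NaN marks clouds/snow, and the running total then
          'latches' at the last valid value instead of resetting."""
          total, out = 0.0, []
          for v in series:
              if not np.isnan(v):
                  total += max(v, 0.0)     # accumulate only valid composites
              out.append(total)            # unchanged (latched) when v is NaN
          return np.array(out)

      year = [0.2, 0.3, np.nan, 0.5, 0.6, np.nan, np.nan, 0.4]  # toy composites
      print(cumulative_ndvi(year))         # monotonically non-decreasing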

  5. Iron in galaxy groups and clusters: confronting galaxy evolution models with a newly homogenized data set

    Science.gov (United States)

    Yates, Robert M.; Thomas, Peter A.; Henriques, Bruno M. B.

    2017-01-01

    We present an analysis of the iron abundance in the hot gas surrounding galaxy groups and clusters. To do this, we first compile and homogenize a large data set of 79 low-redshift (z̃ = 0.03) systems (159 individual measurements) from the literature. Our analysis accounts for differences in aperture size, solar abundance, and cosmology, and scales all measurements using customized radial profiles for the temperature (T), gas density (ρ_gas), and iron abundance (Z_Fe). We then compare this data set to groups and clusters in the L-GALAXIES galaxy evolution model. Our homogenized data set reveals a tight T-Z_Fe relation for clusters, with a scatter in Z_Fe of only 0.10 dex and a slight negative gradient. After examining potential measurement biases, we conclude that some of this negative gradient has a physical origin. Our model suggests greater accretion of hydrogen in the hottest systems, via stripping from infalling satellites, as a cause. In groups, L-GALAXIES over-estimates Z_Fe, indicating that metal-rich gas removal (via e.g. AGN feedback) is required. L-GALAXIES is consistent with the observed Z_Fe in the intracluster medium (ICM) of the hottest clusters at z = 0, and shows a similar rate of ICM enrichment as that observed from at least z ~ 1.3 to the present day. This is achieved without needing to modify any of the galactic chemical evolution (GCE) model parameters. However, the Z_Fe in intermediate-T clusters could be under-estimated in our model. We caution that modifications to the GCE modelling to correct this disrupt the agreement with observations of galaxies' stellar components.

  6. Monitoring of environmental change in Dzungar basin by the analysis of multi temporal satellite data sets

    Science.gov (United States)

    Nakayama, Y.; Yanagi, T.; Nishimura, J.

    Over the past 40-50 years, rapid environmental changes have been observed in the arid and semi-arid regions of the inland areas of each continent. These changes are especially pronounced at the closed lakes, and in their vicinity, of the Asian continental interior. This study aimed to investigate the environmental change and its causes in the Dzungar basin of central Asia through the analysis of multi-temporal satellite data sets. Multi-temporal and multi-stage satellite data sets were first created by using high spatial resolution satellite data such as LANDSAT/MSS, TM, Terra/ASTER, and JERS-1/OPS, and wide-area observation satellite data such as NOAA/AVHRR and Terra/MODIS. Next, the fluctuations in the water area of the lakes over the past roughly 50 years were investigated in detail by analyzing the data sets, along with changes in the irrigated agricultural lands along the inflow rivers and in the snow and glaciers covering the mountainous districts. Finally, the hydrological changes in the study area and their causes were examined by comparing the analysis results with meteorological data and auxiliary sources. The results of this study are summarized as follows. Most of the closed lakes in the Dzungar basin shrank rapidly over the past 50 years or so, but have shown a remarkable expansion of their water area since 2001. According to the analysis of changes in the irrigated agricultural lands and the snow and glacier extents, the influence of human activities, such as the development of irrigated land, was larger than that of natural fluctuations associated with global warming as a cause of the changes in the closed lakes.

  7. Three-dimensional atlas of lymph node topography based on the visible human data set.

    Science.gov (United States)

    Qatarneh, Sharif M; Kiricuta, Ion-Christian; Brahme, Anders; Tiede, Ulf; Lind, Bengt K

    2006-05-01

    Comprehensive atlases of lymph node topography are necessary tools to provide a detailed description of the lymphatic distribution in relation to other organs and structures. Despite the recent developments of atlases and guidelines focusing on definitions of lymphatic regions, a comprehensive and detailed description of the three-dimensional (3D) nodal distribution is lacking. This article describes a new 3D atlas of lymph node topography based on the digital images of the Visible Human Male Anatomical (VHMA) data set. About 1,200 lymph nodes were localized in the data set and their distribution was compared with data from current cross-sectional lymphatic atlases. The identified nodes were delineated and then labeled with different colors that corresponded to their anatomical locations. A series of 2D illustrations, showing discrete locations, description, and distribution of major lymph nodes, was compiled to form a cross-sectional atlas. The resultant contours of all localized nodes in the VHMA data set were superimposed to develop a volumetric model. A 3D reconstruction was generated for the lymph nodes and surrounding structures. The volumetric lymph node topography was also integrated into the existing VOXEL-MAN digital atlas to obtain an interactive and photo-realistic visualization of the lymph nodes showing their proximity to blood vessels and surrounding organs. The lymph node topography forms part of our whole body atlas database, which includes organs, definitions, and parameters that are related to radiation therapy. The lymph node topography atlas could be utilized for visualization and exploration of the 3D lymphatic distribution to assist in defining the target volume for treatment based on the lymphatic spread surrounding the primary tumor.

  8. Fifteen years of dual polarimetric observations of tropical convection: The CPOL data set.

    Science.gov (United States)

    Collis, Scott; Protat, Alain; Jackson, Robert; Helmus, Jonathan; Giangrande, Scott; Louf, Valentin; Lang, Timothy; May, Peter; Glasson, Ken; Atkinson, Brad; Whimpey, Michael; Keenan, Tom

    2017-04-01

    The use of polarization diversity to measure properties of hydrometeors is not new: it was first discussed by Seliga and Bringi from an engineering perspective in 1976, and by Hendry et al. (also in 1976) from a measurement perspective shortly thereafter. In the forty years that have passed since these accomplishments, several key data sets have guided the development of retrieval science and the use of polarimetry in understanding the nature of precipitation. One such data set has been collected using the C-Band POLarimetric radar (Keenan et al., 1998), which gathered 15 years of observations of break/build-up and monsoon season phenomena while sited 23 km from Darwin, Australia. This presentation will report on the progress of a collaboration aimed at producing a quality-controlled set of polarimetric measurements and microphysical retrievals for this 15-year data set. Techniques such as calibration offset, specific differential phase and attenuation retrieval, and comparison with disdrometer measurements (via scattering calculations on collected drop size distributions) will be covered, including a contrast of several different open source approaches. Seliga, T.A., Bringi, V.N., 1976. Potential Use of Radar Differential Reflectivity Measurements at Orthogonal Polarizations for Measuring Precipitation. J. Appl. Meteor. 15, 69-76. doi:10.1175/1520-0450(1976)0152.0.CO;2 Hendry, A., McCormick, G.C., 1976. Radar observations of the alignment of precipitation particles by electrostatic fields in thunderstorms. Journal of Geophysical Research 81, 5353-5357. doi:10.1029/JC081i030p05353 Keenan, T., Glasson, K., Cummings, F., Bird, T.S., Keeler, J., Lutz, J., 1998. The BMRC/NCAR C-Band Polarimetric (C-POL) Radar System. Journal of Atmospheric and Oceanic Technology 15, 871-886. doi:10.1175/1520-0426(1998)0152.0.CO;2
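
    One of the retrievals mentioned, specific differential phase (KDP), is commonly estimated as half the range derivative of a filtered differential phase (PhiDP) profile. The sketch below is a minimal, generic version of that calculation on synthetic data, not the quality-controlled processing used for the CPOL archive.

      import numpy as np

      def kdp_from_phidp(phidp_deg, range_km, window=5):
          """Estimate specific differential phase KDP (deg/km) as half the
          range derivative of a smoothed differential phase profile (deg)."""
          kernel = np.ones(window) / window
          smooth = np.convolve(phidp_deg, kernel, mode="same")   # crude noise filter
          return 0.5 * np.gradient(smooth, range_km)             # KDP = 0.5 * dPhiDP/dr

      r = np.arange(0.0, 50.0, 0.25)                             # gates every 250 m
      phidp = 2.0 * r + np.random.default_rng(1).normal(0, 1, r.size)  # ~1 deg/km truth
      print(kdp_from_phidp(phidp, r)[100:103])                   # values near 1.0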

  9. International Spinal Cord Injury Core Data Set (version 2.0)-including standardization of reporting.

    Science.gov (United States)

    Biering-Sørensen, F; DeVivo, M J; Charlifue, S; Chen, Y; New, P W; Noonan, V; Post, M W M; Vogel, L

    2017-08-01

    The study design includes expert opinion, feedback, revisions and final consensus. The objective of the study was to present the new knowledge obtained since the International Spinal Cord Injury (SCI) Core Data Set (Version 1.0) published in 2006, and describe the adjustments made in Version 2.0, including standardization of data reporting. International. Comments received from the SCI community were discussed in a working group (WG); suggestions from the WG were reviewed and revisions were made. All suggested revisions were considered, and a final version was circulated for final approval. The International SCI Core Data Set (Version 2.0) consists of 25 variables. Changes made to this version include the deletion of one variable 'Total Days Hospitalized' and addition of two variables 'Date of Rehabilitation Admission' and 'Date of Death.' The variable 'Injury Etiology' was extended with six non-traumatic categories, and corresponding 'Date of Injury' for non-traumatic cases, was defined as the date of first physician visit for symptoms related to spinal cord dysfunction. A category reflecting transgender was added. A response category was added to the variable on utilization of ventilatory assistance to document the use of continuous positive airway pressure for sleep apnea. Other clarifications were made to the text. The reporting of the pediatric SCI population was updated as age groups 0-5, 6-12, 13-14, 15-17 and 18-21. Collection of the core data set should be a basic requirement of all studies of SCI to facilitate accurate descriptions of patient populations and comparison of results across published studies from around the world.

  10. Reconstruction of incomplete satellite SST data sets based on EOF method

    Institute of Scientific and Technical Information of China (English)

    DING Youzhuan; WEI Zhihui; MAO Zhihua; WANG Xiaofei; PAN Delu

    2009-01-01

    For satellite remote sensing data obtained by visible and infrared band inversion, cloud coverage in the sky over the ocean often results in large-scale missing data in the inversion products, and thin clouds that are difficult to detect can cause the inversion products to contain abnormal data. Alvera et al. (2005) proposed a method for the reconstruction of missing data based on an Empirical Orthogonal Functions (EOF) decomposition, but their method could not process images with extreme cloud coverage (more than 95%), required a long time for reconstruction, and was strongly affected by abnormal data in the images. Therefore, this paper tries to improve on that result. It reconstructs missing data by applying the EOF decomposition twice. First, the anomalous times are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets, excluding the abnormal data, are analyzed using EOF decomposition, and the temporal modes then undergo a filtering process to enhance the ability to reconstruct images with little or no data. Finally, this method was applied to a large data set, i.e. 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, and the total reconstruction root mean square error (RMSE) was 0.82 °C. This shows that the improved EOF reconstruction method is robust for reconstructing missing and unreliable satellite data.
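
    The essence of EOF-based gap filling can be sketched as an iterated truncated SVD: guess the missing values, reconstruct the field from a few leading modes, replace only the missing entries, and repeat. The toy code below illustrates that idea on a synthetic field; it is a simplification, not the two-pass procedure of the paper.

      import numpy as np

      def eof_fill(field, n_modes=3, n_iter=50):
          """Fill NaN gaps in a space x time matrix by iterating a truncated
          SVD reconstruction (the core of EOF-based gap filling)."""
          mask = np.isnan(field)
          filled = np.where(mask, np.nanmean(field), field)   # first guess
          for _ in range(n_iter):
              u, s, vt = np.linalg.svd(filled, full_matrices=False)
              recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
              filled[mask] = recon[mask]                      # update gaps only
          return filled

      rng = np.random.default_rng(0)
      sst = np.add.outer(np.sin(np.linspace(0, 3, 40)), np.cos(np.linspace(0, 6, 60)))
      sst[rng.random(sst.shape) < 0.3] = np.nan               # simulated cloud gaps
      print(float(np.abs(eof_fill(sst)).max()))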

  11. Structure of the tropical lower stratosphere as revealed by three reanalysis data sets

    Energy Technology Data Exchange (ETDEWEB)

    Pawson, S. [Free Univ. of Berlin (Germany). Institute for Meteorology; Fiorino, M. [Lawrence Livermore National Lab., CA (United States)

    1996-05-01

    While the skill of climate simulation models has advanced over the last decade, mainly through improvements in modeling, further progress will depend on the availability and the quality of comprehensive validation data sets covering long time periods. A new source of such validation data is atmospheric "reanalysis", where a fixed, state-of-the-art global atmospheric model/data assimilation system is run through archived and recovered observations to produce a consistent set of atmospheric analyses. Although reanalysis will be free of non-physical variability caused by changes in the models and/or the assimilation procedure, it is necessary to assess its quality. A region for stringent testing of the quality of reanalysis is the tropical lower stratosphere. This portion of the atmosphere is sparse in observations but displays the prominent quasi-biennial oscillation (QBO) and an annual cycle, neither of which is fully understood, but which are likely coupled dynamically. We first consider the performance of three reanalyses, from NCEP/NCAR, NASA and ECMWF, against rawinsonde data in depicting the QBO and then examine the structure of the tropical lower stratosphere in NCEP and ECMWF data sets in detail. While the annual cycle and the QBO in wind and temperature are quite successfully represented, the mean meridional circulations in NCEP and ECMWF data sets contain unusual features which may be due to the assimilation process rather than being physically based. Further, the models capture the long-term temperature fluctuations associated with volcanic eruptions, even though the physical mechanisms are not included, thus implying that the model does not mask prominent stratospheric signals in the observational data. We conclude that reanalysis offers a unique opportunity to better understand the dynamics of QBO and can be applied to climate model validation.

  12. Security Optimization for Distributed Applications Oriented on Very Large Data Sets

    Directory of Open Access Journals (Sweden)

    Mihai DOINEA

    2010-01-01

    The paper presents the main characteristics of applications that work with very large data sets, and the related security issues. The first section addresses the optimization process and how it is approached when dealing with security. The second section describes the concept of very large data set management, while in the third section the related risks are identified and classified. Finally, a security optimization schema is presented, together with a cost-efficiency analysis of its feasibility. Conclusions are drawn and future approaches are identified.

  13. Purging putative siblings from population genetic data sets: a cautionary view.

    Science.gov (United States)

    Waples, Robin S; Anderson, Eric C

    2017-03-01

    Interest has surged recently in removing siblings from population genetic data sets before conducting downstream analyses. However, even if the pedigree is inferred correctly, this has the potential to do more harm than good. We used computer simulations and empirical samples of coho salmon to evaluate strategies for adjusting samples to account for family structure. We compared performance in full samples and sibling-reduced samples of estimators of allele frequency (P̂), population differentiation (F̂_ST) and effective population size (N̂_e).

  14. Evaluation of Remote Sensing Data Sets for Improving Prediction of Biodiversity in South America

    Science.gov (United States)

    McDonald, K. C.; Carnaval, A.; Waltari, E.; Podest, E.; Schröder, R.; Norman, J.; Nkansah-Dwamena, E.; Vorosmarty, C. J.

    2011-12-01

    For the last 40 years, the fields of evolutionary biogeography and conservation biology have witnessed substantial improvement in, and usage of, correlative species distribution models (also known as ecological niche models) in studies of biodiversity patterns and their underlying processes. Thanks to a suite of new algorithms and the integration of maximum entropy concepts into the biological sciences, field scientists are now able to better predict species ranges from environmental surrogates. To date, biologists have been relying on a single major set of global environmental grids for the purpose of both delimiting species environmental envelopes and projecting envelopes through space and time. The data set, known as the WorldClim database, relies on measurements of elevation, precipitation, and mean, maximum and minimum temperature collected from weather stations across the world. These data were used to derive worldwide grids, at 1 km resolution, through interpolation of average monthly climate data from stations. Nineteen bioclimatic grids have been derived from the air temperature and precipitation values and have been used extensively in predictive studies. Although these WorldClim grids have been successfully applied to a suite of ecological and evolutionary research questions, their performance can be suboptimal. This is especially true in topographically complex areas, where interpolation methods fail to capture true variation in local climate, in biological systems impacted by environmental phenomena occurring at finer temporal or spatial scales, and in regions with few weather stations. Products derived from remote sensing data offer a unique insight into land surface processes and biophysical drivers not provided by interpolated or model-based data sets. Regionally contiguous and temporally consistent remote sensing measurements of key environmental fields respond directly to variables potentially affecting biome physiology. Such consistent synoptic data

  15. Envision: An interactive system for the management and visualization of large geophysical data sets

    Science.gov (United States)

    Searight, K. R.; Wojtowicz, D. P.; Walsh, J. E.; Pathi, S.; Bowman, K. P.; Wilhelmson, R. B.

    1995-01-01

    Envision is a software project at the University of Illinois and Texas A&M, funded by NASA's Applied Information Systems Research Project. It provides researchers in the geophysical sciences convenient ways to manage, browse, and visualize large observed or model data sets. Envision integrates data management, analysis, and visualization of geophysical data in an interactive environment. It employs commonly used standards in data formats, operating systems, networking, and graphics. It also attempts, wherever possible, to integrate with existing scientific visualization and analysis software. Envision has an easy-to-use graphical interface, distributed process components, and an extensible design. It is a public domain package, freely available to the scientific community.

  16. Use of transformed data sets in examination of relationship between growth of trees and weather parameters

    Directory of Open Access Journals (Sweden)

    Márton Edelényi

    2011-11-01

    Full Text Available Analysing and stability improving methods were studied for the examination of relationships between tree growth and meteorological factors according to our requirements. In order to explore more complex relations from primary data sets secondary data series had to be systematically transformed and a uniform analysis process was developed for their investigation. The structure of the Systematic Transformation Analysing Method (STAM has three main components. The first module derives input data from the original series without any essential changes. The transformation unit produces secondary data series using a moving window technique. The thirds component performs the examinations. STAM also allows the application in several other research fields.

  17. Political dreams, practical boundaries: the case of the Nursing Minimum Data Set, 1983-1990.

    Science.gov (United States)

    Hobbs, Jennifer

    2011-01-01

    The initial development of the Nursing Minimum Data Set (NMDS) was analyzed based on archival material from Harriet Werley and Norma Lang, two nurses involved with the project, and American Nurses Association materials. The process of identifying information to be included in the NMDS was contentious. Individual nurses argued on behalf of particular data because of a strong belief in how nursing practice (through information collection) should be structured. Little attention was paid to existing practice conditions that would ultimately determine whether the NMDS would be used.

  18. Comparison of pathway analysis approaches using lung cancer GWAS data sets.

    Directory of Open Access Journals (Sweden)

    Gordon Fehringer

    Pathway analysis has been proposed as a complement to single-SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had software available for performing the analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations; GenGen, a version of Gene Set Enrichment Analysis (GSEA), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic; and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR ≤ 0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but the results suggest that mSUMSTAT has advantages over the other approaches and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.
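
    As background for the over-representation style of testing that EASE modifies, the sketch below computes a plain one-sided hypergeometric p-value for a single gene set. The numbers are invented, and EASE's actual statistic differs (it is a conservative variant of Fisher's exact test).

      from scipy.stats import hypergeom

      def pathway_pvalue(n_genes, n_sig, set_size, sig_in_set):
          """One-sided over-representation p-value: chance of at least
          `sig_in_set` significant genes in a set of `set_size`, given
          `n_sig` significant genes among `n_genes` tested overall."""
          return hypergeom.sf(sig_in_set - 1, n_genes, n_sig, set_size)

      # Toy numbers: 18,000 genes, 500 associated, a 40-gene set with 6 hits
      print(pathway_pvalue(18_000, 500, 40, 6))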

  19. Measurement of the ZZ production cross section using the full CDF II data set

    Science.gov (United States)

    Aaltonen, T.; Amerio, S.; Amidei, D.; Anastassov, A.; Annovi, A.; Antos, J.; Apollinari, G.; Appel, J. A.; Arisawa, T.; Artikov, A.; Asaadi, J.; Ashmanskas, W.; Auerbach, B.; Aurisano, A.; Azfar, F.; Badgett, W.; Bae, T.; Barbaro-Galtieri, A.; Barnes, V. E.; Barnett, B. A.; Barria, P.; Bartos, P.; Bauce, M.; Bedeschi, F.; Behari, S.; Bellettini, G.; Bellinger, J.; Benjamin, D.; Beretvas, A.; Bhatti, A.; Bland, K. R.; Blumenfeld, B.; Bocci, A.; Bodek, A.; Bortoletto, D.; Boudreau, J.; Boveia, A.; Brigliadori, L.; Bromberg, C.; Brucken, E.; Budagov, J.; Budd, H. S.; Burkett, K.; Busetto, G.; Bussey, P.; Butti, P.; Buzatu, A.; Calamba, A.; Camarda, S.; Campanelli, M.; Canelli, F.; Carls, B.; Carlsmith, D.; Carosi, R.; Carrillo, S.; Casal, B.; Casarsa, M.; Castro, A.; Catastini, P.; Cauz, D.; Cavaliere, V.; Cavalli-Sforza, M.; Cerri, A.; Cerrito, L.; Chen, Y. C.; Chertok, M.; Chiarelli, G.; Chlachidze, G.; Cho, K.; Chokheli, D.; Clark, A.; Clarke, C.; Convery, M. E.; Conway, J.; Corbo, M.; Cordelli, M.; Cox, C. A.; Cox, D. J.; Cremonesi, M.; Cruz, D.; Cuevas, J.; Culbertson, R.; d'Ascenzo, N.; Datta, M.; de Barbaro, P.; Demortier, L.; Deninno, M.; D'Errico, M.; Devoto, F.; Di Canto, A.; Di Ruzza, B.; Dittmann, J. R.; Donati, S.; D'Onofrio, M.; Dorigo, M.; Driutti, A.; Ebina, K.; Edgar, R.; Elagin, A.; Erbacher, R.; Errede, S.; Esham, B.; Farrington, S.; Fernández Ramos, J. P.; Field, R.; Flanagan, G.; Forrest, R.; Franklin, M.; Freeman, J. C.; Frisch, H.; Funakoshi, Y.; Galloni, C.; Garfinkel, A. F.; Garosi, P.; Gerberich, H.; Gerchtein, E.; Giagu, S.; Giakoumopoulou, V.; Gibson, K.; Ginsburg, C. M.; Giokaris, N.; Giromini, P.; Giurgiu, G.; Glagolev, V.; Glenzinski, D.; Gold, M.; Goldin, D.; Golossanov, A.; Gomez, G.; Gomez-Ceballos, G.; Goncharov, M.; González López, O.; Gorelov, I.; Goshaw, A. T.; Goulianos, K.; Gramellini, E.; Grinstein, S.; Grosso-Pilcher, C.; Group, R. C.; Guimaraes da Costa, J.; Hahn, S. R.; Han, J. Y.; Happacher, F.; Hara, K.; Hare, M.; Harr, R. F.; Harrington-Taber, T.; Hatakeyama, K.; Hays, C.; Heinrich, J.; Herndon, M.; Hocker, A.; Hong, Z.; Hopkins, W.; Hou, S.; Hughes, R. E.; Husemann, U.; Hussein, M.; Huston, J.; Introzzi, G.; Iori, M.; Ivanov, A.; James, E.; Jang, D.; Jayatilaka, B.; Jeon, E. J.; Jindariani, S.; Jones, M.; Joo, K. K.; Jun, S. Y.; Junk, T. R.; Kambeitz, M.; Kamon, T.; Karchin, P. E.; Kasmi, A.; Kato, Y.; Ketchum, W.; Keung, J.; Kilminster, B.; Kim, D. H.; Kim, H. S.; Kim, J. E.; Kim, M. J.; Kim, S. H.; Kim, S. B.; Kim, Y. J.; Kim, Y. K.; Kimura, N.; Kirby, M.; Knoepfel, K.; Kondo, K.; Kong, D. J.; Konigsberg, J.; Kotwal, A. V.; Kreps, M.; Kroll, J.; Kruse, M.; Kuhr, T.; Kurata, M.; Laasanen, A. T.; Lammel, S.; Lancaster, M.; Lannon, K.; Latino, G.; Lee, H. S.; Lee, J. S.; Leo, S.; Leone, S.; Lewis, J. D.; Limosani, A.; Lipeles, E.; Lister, A.; Liu, H.; Liu, Q.; Liu, T.; Lockwitz, S.; Loginov, A.; Lucchesi, D.; Lucà, A.; Lueck, J.; Lujan, P.; Lukens, P.; Lungu, G.; Lys, J.; Lysak, R.; Madrak, R.; Maestro, P.; Malik, S.; Manca, G.; Manousakis-Katsikakis, A.; Marchese, L.; Margaroli, F.; Marino, P.; Martínez, M.; Matera, K.; Mattson, M. E.; Mazzacane, A.; Mazzanti, P.; McNulty, R.; Mehta, A.; Mehtala, P.; Mesropian, C.; Miao, T.; Mietlicki, D.; Mitra, A.; Miyake, H.; Moed, S.; Moggi, N.; Moon, C. S.; Moore, R.; Morello, M. J.; Mukherjee, A.; Muller, Th.; Murat, P.; Mussini, M.; Nachtman, J.; Nagai, Y.; Naganoma, J.; Nakano, I.; Napier, A.; Nett, J.; Neu, C.; Nigmanov, T.; Nodulman, L.; Noh, S. Y.; Norniella, O.; Oakes, L.; Oh, S. H.; Oh, Y. 
D.; Oksuzian, I.; Okusawa, T.; Orava, R.; Ortolan, L.; Pagliarone, C.; Palencia, E.; Palni, P.; Papadimitriou, V.; Parker, W.; Pauletta, G.; Paulini, M.; Paus, C.; Phillips, T. J.; Piacentino, G.; Pianori, E.; Pilot, J.; Pitts, K.; Plager, C.; Pondrom, L.; Poprocki, S.; Potamianos, K.; Pranko, A.; Prokoshin, F.; Ptohos, F.; Punzi, G.; Ranjan, N.; Redondo Fernández, I.; Renton, P.; Rescigno, M.; Rimondi, F.; Ristori, L.; Robson, A.; Rodriguez, T.; Rolli, S.; Ronzani, M.; Roser, R.; Rosner, J. L.; Ruffini, F.; Ruiz, A.; Russ, J.; Rusu, V.; Sakumoto, W. K.; Sakurai, Y.; Santi, L.; Sato, K.; Saveliev, V.; Savoy-Navarro, A.; Schlabach, P.; Schmidt, E. E.; Schwarz, T.; Scodellaro, L.; Scuri, F.; Seidel, S.; Seiya, Y.; Semenov, A.; Sforza, F.; Shalhout, S. Z.; Shears, T.; Shepard, P. F.; Shimojima, M.; Shochet, M.; Shreyber-Tecker, I.; Simonenko, A.; Sliwa, K.; Smith, J. R.; Snider, F. D.; Song, H.; Sorin, V.; St. Denis, R.; Stancari, M.; Stentz, D.; Strologas, J.; Sudo, Y.; Sukhanov, A.; Suslov, I.; Takemasa, K.; Takeuchi, Y.; Tang, J.; Tecchio, M.; Teng, P. K.; Thom, J.; Thomson, E.; Thukral, V.; Toback, D.; Tokar, S.; Tollefson, K.; Tomura, T.; Tonelli, D.; Torre, S.; Torretta, D.; Totaro, P.; Trovato, M.; Ukegawa, F.; Uozumi, S.; Velev, G.; Vellidis, C.; Vernieri, C.; Vidal, M.; Vilar, R.; Vizán, J.; Vogel, M.; Volpi, G.; Vázquez, F.; Wagner, P.; Wallny, R.; Wang, S. M.; Waters, D.; Wester, W. C.; Whiteson, D.; Wicklund, A. B.; Wilbur, S.; Williams, H. H.; Wilson, J. S.; Wilson, P.; Winer, B. L.; Wittich, P.; Wolbers, S.; Wolfe, H.; Wright, T.; Wu, X.; Wu, Z.; Yamamoto, K.; Yamato, D.; Yang, T.; Yang, U. K.; Yang, Y. C.; Yao, W.-M.; Yeh, G. P.; Yi, K.; Yoh, J.; Yorita, K.; Yoshida, T.; Yu, G. B.; Yu, I.; Zanetti, A. M.; Zeng, Y.; Zhou, C.; Zucchelli, S.; CDF Collaboration

    2014-06-01

    We present a measurement of the ZZ-boson pair-production cross section in 1.96 TeV center-of-mass energy pp̄ collisions. We reconstruct final states incorporating four charged leptons or two charged leptons and two neutrinos from the full data set collected by the Collider Detector experiment at the Fermilab Tevatron, corresponding to 9.7 fb-1 of integrated luminosity. Combining the results obtained from each final state, we measure a cross section of 1.04(+0.32)(-0.25) pb, in good agreement with the standard model prediction at next-to-leading order in the strong-interaction coupling.

  20. PrestoPronto: a code devoted to handling large data sets

    Science.gov (United States)

    Figueroa, S. J. A.; Prestipino, C.

    2016-05-01

    The software PrestoPronto consists of a full graphical user interface (GUI) program aimed at the analysis of large X-ray Absorption Spectroscopy data sets. Written in Python, it is free and open source. The code is able to read large data sets and apply calibration and alignment corrections, and it performs classical data analysis, from the extraction of the signal to EXAFS fitting. The package also includes GUI programs to perform Principal Component Analysis and Linear Combination Fits. The main benefit of this program is that it allows the user to quickly follow the evolution of time-resolved experiments coming from Quick-EXAFS (QEXAFS) and dispersive EXAFS beamlines.
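
    A linear combination fit of the kind mentioned expresses a measured spectrum as a non-negative weighted sum of reference spectra. The sketch below does this with a generic non-negative least-squares solver on synthetic curves; it is not PrestoPronto's implementation.

      import numpy as np
      from scipy.optimize import nnls

      # Two synthetic "reference spectra" and a mixture standing in for a
      # measured spectrum (70% species A, 30% species B)
      energy = np.linspace(0.0, 1.0, 200)
      ref_a = np.tanh(10 * (energy - 0.4))
      ref_b = np.tanh(10 * (energy - 0.6))
      measured = 0.7 * ref_a + 0.3 * ref_b

      # Non-negative least squares recovers the mixing weights
      weights, residual = nnls(np.column_stack([ref_a, ref_b]), measured)
      print(weights, residual)             # ~[0.7 0.3], residual ~0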

  1. Barcoding of aphids (Hemiptera, Aphididae and Adelgidae): proper usage of the global data set.

    Science.gov (United States)

    Rakauskas, Rimantas; Bašilova, Jekaterina

    2013-01-01

    The basics of DNA barcoding presuppose the creation and operation of an extensive library based on reliably identified specimens (with the possibility of validation). Therefore, information concerning the morphological identification of the individual samples used for DNA barcoding, for example the identification keys and descriptions used, must be clearly explained. In addition, the maximum available data set of sequences must be used. Access to currently private data appears to be of special interest, especially when such a possibility is provided by the database regulations, because it encourages research cooperation and saves both time and resources. The cryptic aphid species complexes Aphis oenotherae-holoenotherae and A. pomi-spiraecola are used to illustrate the above statements.

  2. The Role of the Virtual Astronomical Observatory in the Era of Massive Data Sets

    Science.gov (United States)

    Berriman, G. Bruce; Hanisch, Robert J.; Lazio, T. Joseph W.

    2012-01-01

    The Virtual Observatory (VO) is realizing global electronic integration of astronomy data. One of the long-term goals of the U.S. VO project, the Virtual Astronomical Observatory (VAO), is development of services and protocols that respond to the growing size and complexity of astronomy data sets. This paper describes how VAO staff are active in such development efforts, especially in innovative strategies and techniques that recognize the limited operating budgets likely available to astronomers even as demand increases. The project has a program of professional outreach whereby new services and protocols are evaluated.

  3. Measurement of the ZZ production cross section using the full CDF II data set

    CERN Document Server

    Aaltonen, Timo Antero; Amidei, Dante E; Anastassov, Anton Iankov; Annovi, Alberto; Antos, Jaroslav; Apollinari, Giorgio; Appel, Jeffrey A; Arisawa, Tetsuo; Artikov, Akram Muzafarovich; Asaadi, Jonathan A; Ashmanskas, William Joseph; Auerbach, Benjamin; Aurisano, Adam J; Azfar, Farrukh A; Badgett, William Farris; Bae, Taegil; Barbaro-Galtieri, Angela; Barnes, Virgil E; Barnett, Bruce Arnold; Barria, Patrizia; Bartos, Pavol; Bauce, Matteo; Bedeschi, Franco; Behari, Satyajit; Bellettini, Giorgio; Bellinger, James Nugent; Benjamin, Douglas P; Beretvas, Andrew F; Bhatti, Anwar Ahmad; Bland, Karen Renee; Blumenfeld, Barry J; Bocci, Andrea; Bodek, Arie; Bortoletto, Daniela; Boudreau, Joseph Francis; Boveia, Antonio; Brigliadori, Luca; Bromberg, Carl Michael; Brucken, Erik; Budagov, Ioulian A; Budd, Howard Scott; Burkett, Kevin Alan; Busetto, Giovanni; Bussey, Peter John; Butti, Pierfrancesco; Buzatu, Adrian; Calamba, Aristotle; Camarda, Stefano; Campanelli, Mario; Canelli, Florencia; Carls, Benjamin; Carlsmith, Duncan L; Carosi, Roberto; Carrillo Moreno, Salvador; Casal Larana, Bruno; Casarsa, Massimo; Castro, Andrea; Catastini, Pierluigi; Cauz, Diego; Cavaliere, Viviana; Cavalli-Sforza, Matteo; Cerri, Alessandro; Cerrito, Lucio; Chen, Yen-Chu; Chertok, Maxwell Benjamin; Chiarelli, Giorgio; Chlachidze, Gouram; Cho, Kihyeon; Chokheli, Davit; Clark, Allan Geoffrey; Clarke, Christopher Joseph; Convery, Mary Elizabeth; Conway, John Stephen; Corbo, Matteo; Cordelli, Marco; Cox, Charles Alexander; Cox, David Jeremy; Cremonesi, Matteo; Cruz Alonso, Daniel; Cuevas Maestro, Javier; Culbertson, Raymond Lloyd; D'Ascenzo, Nicola; Datta, Mousumi; de Barbaro, Pawel; Demortier, Luc M; Deninno, Maria Maddalena; D'Errico, Maria; Devoto, Francesco; Di Canto, Angelo; Di Ruzza, Benedetto; Dittmann, Jay Richard; Donati, Simone; D'Onofrio, Monica; Dorigo, Mirco; Driutti, Anna; Ebina, Koji; Edgar, Ryan Christopher; Elagin, Andrey L; Erbacher, Robin D; Errede, Steven Michael; Esham, Benjamin; Farrington, Sinead Marie; Fernández Ramos, Juan Pablo; Field, Richard D; Flanagan, Gene U; Forrest, Robert David; Franklin, Melissa EB; Freeman, John Christian; Frisch, Henry J; Funakoshi, Yujiro; Galloni, Camilla; Garfinkel, Arthur F; Garosi, Paola; Gerberich, Heather Kay; Gerchtein, Elena A; Giagu, Stefano; Giakoumopoulou, Viktoria Athina; Gibson, Karen Ruth; Ginsburg, Camille Marie; Giokaris, Nikos D; Giromini, Paolo; Giurgiu, Gavril A; Glagolev, Vladimir; Glenzinski, Douglas Andrew; Gold, Michael S; Goldin, Daniel; Golossanov, Alexander; Gomez, Gervasio; Gomez-Ceballos, Guillelmo; Goncharov, Maxim T; González López, Oscar; Gorelov, Igor V; Goshaw, Alfred T; Goulianos, Konstantin A; Gramellini, Elena; Grinstein, Sebastian; Grosso-Pilcher, Carla; Group, Robert Craig; Guimaraes da Costa, Joao; Hahn, Stephen R; Han, Ji-Yeon; Happacher, Fabio; Hara, Kazuhiko; Hare, Matthew Frederick; Harr, Robert Francis; Harrington-Taber, Timothy; Hatakeyama, Kenichi; Hays, Christopher Paul; Heinrich, Joel G; Herndon, Matthew Fairbanks; Hocker, James Andrew; Hong, Ziqing; Hopkins, Walter Howard; Hou, Suen Ray; Hughes, Richard Edward; Husemann, Ulrich; Hussein, Mohammad; Huston, Joey Walter; Introzzi, Gianluca; Iori, Maurizio; Ivanov, Andrew Gennadievich; James, Eric B; Jang, Dongwook; Jayatilaka, Bodhitha Anjalike; Jeon, Eun-Ju; Jindariani, Sergo Robert; Jones, Matthew T; Joo, Kyung Kwang; Jun, Soon Yung; Junk, Thomas R; Kambeitz, Manuel; Kamon, Teruki; Karchin, Paul Edmund; Kasmi, Azeddine; Kato, Yukihiro; Ketchum, Wesley Robert; Keung, Justin 
Kien; Kilminster, Benjamin John; Kim, DongHee; Kim, Hyunsoo; Kim, Jieun; Kim, Min Jeong; Kim, Shin-Hong; Kim, Soo Bong; Kim, Young-Jin; Kim, Young-Kee; Kimura, Naoki; Kirby, Michael H; Knoepfel, Kyle James; Kondo, Kunitaka; Kong, Dae Jung; Konigsberg, Jacobo; Kotwal, Ashutosh Vijay; Kreps, Michal; Kroll, IJoseph; Kruse, Mark Charles; Kuhr, Thomas; Kurata, Masakazu; Laasanen, Alvin Toivo; Lammel, Stephan; Lancaster, Mark; Lannon, Kevin Patrick; Latino, Giuseppe; Lee, Hyun Su; Lee, Jaison; Leo, Sabato; Leone, Sandra; Lewis, Jonathan D; Limosani, Antonio; Lipeles, Elliot David; Lister, Alison; Liu, Hao; Liu, Qiuguang; Liu, Tiehui Ted; Lockwitz, Sarah E; Loginov, Andrey Borisovich; Lucchesi, Donatella; Lucà, Alessandra; Lueck, Jan; Lujan, Paul Joseph; Lukens, Patrick Thomas; Lungu, Gheorghe; Lys, Jeremy E; Lysak, Roman; Madrak, Robyn Leigh; Maestro, Paolo; Malik, Sarah Alam; Manca, Giulia; Manousakis-Katsikakis, Arkadios; Marchese, Luigi; Margaroli, Fabrizio; Marino, Christopher Phillip; Martínez-Perez, Mario; Matera, Keith; Mattson, Mark Edward; Mazzacane, Anna; Mazzanti, Paolo; McNulty, Ronan; Mehta, Andrew; Mehtala, Petteri; Mesropian, Christina; Miao, Ting; Mietlicki, David John; Mitra, Ankush; Miyake, Hideki; Moed, Shulamit; Moggi, Niccolo; Moon, Chang-Seong; Moore, Ronald Scott; Morello, Michael Joseph; Mukherjee, Aseet; Muller, Thomas; Murat, Pavel A; Mussini, Manuel; Nachtman, Jane Marie; Nagai, Yoshikazu; Naganoma, Junji; Nakano, Itsuo; Napier, Austin; Nett, Jason Michael; Neu, Christopher Carl; Nigmanov, Turgun S; Nodulman, Lawrence J; Noh, Seoyoung; Norniella Francisco, Olga; Oakes, Louise Beth; Oh, Seog Hwan; Oh, Young-do; Oksuzian, Iuri Artur; Okusawa, Toru; Orava, Risto Olavi; Ortolan, Lorenzo; Pagliarone, Carmine Elvezio; Palencia, Jose Enrique; Palni, Prabhakar; Papadimitriou, Vaia; Parker, William Chesluk; Pauletta, Giovanni; Paulini, Manfred; Paus, Christoph Maria Ernst; Phillips, Thomas J; Piacentino, Giovanni M; Pianori, Elisabetta; Pilot, Justin Robert; Pitts, Kevin T; Plager, Charles; Pondrom, Lee G; Poprocki, Stephen; Potamianos, Karolos Jozef; Pranko, Aliaksandr Pavlovich; Prokoshin, Fedor; Ptohos, Fotios K; Punzi, Giovanni; Ranjan, Niharika; Redondo Fernández, Ignacio; Renton, Peter B; Rescigno, Marco; Rimondi, Franco; Ristori, Luciano; Robson, Aidan; Rodriguez, Tatiana Isabel; Rolli, Simona; Ronzani, Manfredi; Roser, Robert Martin; Rosner, Jonathan L; Ruffini, Fabrizio; Ruiz Jimeno, Alberto; Russ, James S; Rusu, Vadim Liviu; Sakumoto, Willis Kazuo; Sakurai, Yuki; Santi, Lorenzo; Sato, Koji; Saveliev, Valeri; Savoy-Navarro, Aurore; Schlabach, Philip; Schmidt, Eugene E; Schwarz, Thomas A; Scodellaro, Luca; Scuri, Fabrizio; Seidel, Sally C; Seiya, Yoshihiro; Semenov, Alexei; Sforza, Federico; Shalhout, Shalhout Zaki; Shears, Tara G; Shepard, Paul F; Shimojima, Makoto; Shochet, Melvyn J; Shreyber-Tecker, Irina; Simonenko, Alexander V; Sliwa, Krzysztof Jan; Smith, John Rodgers; Snider, Frederick Douglas; Song, Hao; Sorin, Maria Veronica; St Denis, Richard Dante; Stancari, Michelle Dawn; Stentz, Dale James; Strologas, John; Sudo, Yuji; Sukhanov, Alexander I; Suslov, Igor M; Takemasa, Ken-ichi; Takeuchi, Yuji; Tang, Jian; Tecchio, Monica; Teng, Ping-Kun; Thom, Julia; Thomson, Evelyn Jean; Thukral, Vaikunth; Toback, David A; Tokar, Stanislav; Tollefson, Kirsten Anne; Tomura, Tomonobu; Tonelli, Diego; Torre, Stefano; Torretta, Donatella; Totaro, Pierluigi; Trovato, Marco; Ukegawa, Fumihiko; Uozumi, Satoru; Velev, Gueorgui; Vellidis, Konstantinos; Vernieri, Caterina; Vidal 
Marono, Miguel; Vilar Cortabitarte, Rocio; Vizán Garcia, Jesus Manuel; Vogel, Marcelo; Volpi, Guido; Vázquez-Valencia, Elsa Fabiola; Wagner, Peter; Wallny, Rainer S; Wang, Song-Ming; Waters, David S; Wester, William Carl; Whiteson, Daniel O; Wicklund, Arthur Barry; Wilbur, Scott; Williams, Hugh H; Wilson, Jonathan Samuel; Wilson, Peter James; Winer, Brian L; Wittich, Peter; Wolbers, Stephen A; Wolfe, Homer; Wright, Thomas Roland; Wu, Xin; Wu, Zhenbin; Yamamoto, Kazuhiro; Yamato, Daisuke; Yang, Tingjun; Yang, Un-Ki; Yang, Yu Chul; Yao, Wei-Ming; Yeh, Gong Ping; Yi, Kai; Yoh, John; Yorita, Kohei; Yoshida, Takuo; Yu, Geum Bong; Yu, Intae; Zanetti, Anna Maria; Zeng, Yu; Zhou, Chen; Zucchelli, Stefano

    2014-06-03

    We present a measurement of the ZZ boson-pair production cross section in 1.96 TeV center-of-mass energy ppbar collisions. We reconstruct final states incorporating four charged leptons or two charged leptons and two neutrinos from the full data set collected by the Collider Detector experiment at the Fermilab Tevatron, corresponding to 9.7 fb-1 of integrated luminosity. Combining the results obtained from each final state, we measure a cross section of 1.04(+0.32)(-0.25) pb, in good agreement with the standard model prediction at next-to-leading order in the strong-interaction coupling.

  4. Pepper and Sesame Chicken

    Institute of Scientific and Technical Information of China (English)

    1994-01-01

    Ingredients: 250 grams of chicken breast, 50 grams of water chestnut, thick pieces of white bread or steamed bun. Supplementary Ingredients: Sesame, lard, MSG, salt, whites of three eggs, starch. Directions: Chop up the chicken breast into mash, cut the water chestnuts into small pieces and put them in a bowl. Mix in the supplementary ingredients. Spread the mixed mash onto the bread pieces and roll them in sesame. Heat 250 grams of oil. When hot, put in the pieces one by one. When the pieces turn

  5. Strategy for Developing Local Chicken

    Directory of Open Access Journals (Sweden)

    Sofjan Iskandar

    2006-12-01

    The chicken industry in Indonesia offers jobs for people in village areas. The balance in the development of the selected-breed and local chicken industries has to be anticipated, as there is a threat of reduced importation of grandparent stock of selected chickens due to global avian influenza. In the meantime, high appreciation of the local chicken has been shown by the existence of local chicken farms at business scale. For the local chicken business, the government has built programs, projects, and infrastructure, although these were scattered across several institutions and ended up with little significant impact on the people. Therefore, it is time for the government to put more effort into integrating the various resources, focusing on enhancing the local chicken industry.

  6. Soil quality assessment in rice production systems: establishing a minimum data set.

    Science.gov (United States)

    Rodrigues de Lima, Ana Cláudia; Hoogmoed, Willem; Brussaard, Lijbert

    2008-01-01

    Soil quality, as a measure of the soil's capacity to function, can be assessed by indicators based on physical, chemical, and biological properties. Here we report on the assessment of soil quality in 21 rice (Oryza sativa) fields under three rice production systems (semi-direct, pre-germinated, and conventional) on four soil textural classes in the Camaquã region of Rio Grande do Sul, Brazil. The objectives of our study were: (i) to identify soil quality indicators that discriminate both management systems and soil textural classes, (ii) to establish a minimum data set of soil quality indicators and (iii) to test whether this minimum data set is correlated with yield. Twenty-nine soil biological, chemical, and physical properties were evaluated to characterize regional soil quality. Soil quality assessment was based on factor and discriminant analysis. Bulk density, available water, and micronutrients (Cu, Zn, and Mn) were the most powerful soil properties in distinguishing among different soil textural classes. Organic matter, earthworms, micronutrients (Cu and Mn), and mean weight diameter were the most powerful soil properties in assessing differences in soil quality among the rice management systems. Manganese was the property most strongly correlated with yield (adjusted r² = 0.365, P = 0.001). The merits of sub-dividing samples according to texture and the linkage between soil quality indicators, soil functioning, plant performance, and soil management options are discussed in particular.
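
    The statistical selection step, choosing a minimum data set from many candidate indicators, can be approximated with a principal component analysis in which variables loading highly on the leading components are retained. The sketch below shows that idea on random stand-in data; it is an illustration of the general approach, not the authors' exact factor and discriminant procedure.

      import numpy as np

      def mds_candidates(X, names, n_factors=2):
          """Rank indicators by their absolute loading on the leading
          principal components; high loadings suggest minimum data set
          candidates."""
          Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize columns
          _, vecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
          loadings = np.abs(vecs[:, -n_factors:])         # top factors are last
          return sorted(zip(names, loadings.max(axis=1).round(2)),
                        key=lambda t: -t[1])

      rng = np.random.default_rng(3)
      X = rng.normal(size=(21, 5))                        # 21 fields, 5 properties
      print(mds_candidates(X, ["BD", "AWC", "Cu", "Zn", "OM"]))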

  8. A variant reference data set for the Africanized honeybee, Apis mellifera

    Science.gov (United States)

    Kadri, Samir M.; Harpur, Brock A.; Orsi, Ricardo O.; Zayed, Amro

    2016-01-01

    The Africanized honeybee (AHB) is a population of Apis mellifera found in the Americas. AHBs originated in 1956 in Rio Claro, Brazil, where imported African A. m. scutellata escaped and hybridized with local populations of European A. mellifera. Africanized populations can now be found from Northern Argentina to the Southern United States. AHBs, often referred to as ‘Killer Bees’, are a major concern to the beekeeping industry as well as a model for the evolutionary genetics of colony defence. We performed high coverage pooled-resequencing of 360 diploid workers from 30 Brazilian AHB colonies using Illumina Hi-Seq (150 bp PE). This yielded a high density SNP data set with an average read depth at each site of 20.25 reads. With 3,606,720 SNPs and 155,336 SNPs within 11,365 genes, this data set is the largest genomic resource available for AHBs and will enable high-resolution studies of the population dynamics, evolution, and genetics of this successful biological invader, in addition to facilitating the development of SNP-based tools for identifying AHBs. PMID:27824336

  9. On the geometry and topology of initial data sets with horizons

    CERN Document Server

    Andersson, Lars; Galloway, Gregory J; Pollack, Daniel

    2015-01-01

    We study the relationship between initial data sets with horizons and the existence of metrics of positive scalar curvature. We define a Cauchy Domain of Outer Communications (CDOC) to be an asymptotically flat initial data set $(M, g, K)$ such that the boundary $\partial M$ of $M$ is a collection of Marginally Outer (or Inner) Trapped Surfaces (MOTSs and/or MITSs) and such that $M \setminus \partial M$ contains no MOTSs or MITSs. This definition is meant to capture, on the level of the initial data sets, the well known notion of the domain of outer communications (DOC) as the region of spacetime outside of all the black holes (and white holes). Our main theorem establishes that in dimensions $3 \leq n \leq 7$, a CDOC which satisfies the dominant energy condition and has a strictly stable boundary has a positive scalar curvature metric which smoothly compactifies the asymptotically flat end and is a Riemannian product metric near the boundary where the cross sectional metric is conformal to a small perturbation of t...

  10. PMCR-Miner: parallel maximal confident association rules miner algorithm for microarray data set.

    Science.gov (United States)

    Zakaria, Wael; Kotb, Yasser; Ghaleb, Fayed F M

    2015-01-01

    The MCR-Miner algorithm aims to mine all maximal high-confidence association rules from the microarray up/down-expressed genes data set. This paper introduces two new algorithms: IMCR-Miner and PMCR-Miner. The IMCR-Miner algorithm is an extension of the MCR-Miner algorithm with some improvements. These improvements implement a novel way to store the samples of each gene as a list of unsigned integers in order to benefit from bitwise operations. In addition, the IMCR-Miner algorithm overcomes the drawbacks faced by the MCR-Miner algorithm by setting some restrictions to avoid repeated comparisons. The PMCR-Miner algorithm is a parallel version of the newly proposed IMCR-Miner algorithm. It is based on shared-memory systems and task parallelism, where no time is needed for sharing and combining data between processors. The experimental results on real microarray data sets show that the PMCR-Miner algorithm is more efficient and scalable than its counterparts.
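
    The data-layout trick described above, packing each gene's set of up/down-expressed samples into unsigned integers so that rule support reduces to a bitwise AND plus a bit count, can be sketched as follows (a minimal illustration with hypothetical names, not the authors' implementation):

    ```python
    # Pack each gene's up-regulated sample set into 64-bit words; the joint
    # support of two genes is then a bitwise AND followed by a bit count.
    def pack_samples(sample_ids, n_samples):
        """Pack a set of sample indices into a list of 64-bit words."""
        words = [0] * ((n_samples + 63) // 64)
        for s in sample_ids:
            words[s // 64] |= 1 << (s % 64)
        return words

    def joint_support(words_a, words_b):
        """Number of samples in which both genes are up-regulated."""
        return sum(bin(a & b).count("1") for a, b in zip(words_a, words_b))

    g1 = pack_samples({0, 3, 5, 64, 70}, 128)
    g2 = pack_samples({3, 5, 9, 70}, 128)
    print(joint_support(g1, g2))  # -> 3; confidence of g1 => g2 is 3/5
    ```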

  11. Ensemble single column modeling (ESCM) in the tropical western Pacific: Forcing data sets and uncertainty analysis

    Science.gov (United States)

    Hume, Timothy; Jakob, Christian

    2005-07-01

    Single column models (SCMs) are useful tools for the evaluation of parameterisations of radiative and moist processes used in general circulation models (GCMs). Most SCM studies to date have concentrated on regions where there is a sufficiently dense observational network to derive the required forcing data. This paper describes an ensemble single column modeling (ESCM) approach where the forcing data are derived from numerical weather prediction (NWP) analysis products. To highlight the benefits of the ESCM approach, four forcing data sets were derived for a two-year period at the Tropical Western Pacific ARM (Atmospheric Radiation Measurement Program) sites at Manus Island and Nauru. In the first section of the study, the NWP-derived forcing data are validated against a range of observations at the tropical sites. In the second section, the sensitivity of two different SCMs to uncertainties in the forcing data sets is analysed. It is shown that despite the inherent uncertainties in the NWP-derived forcing data, an ESCM approach is able to identify errors in the SCM physics. This suggests the ESCM approach is useful for testing parameterisations in relatively observation sparse regions, such as the TWP.

  12. A Bayesian outlier criterion to detect SNPs under selection in large data sets.

    Directory of Open Access Journals (Sweden)

    Mathieu Gautier

    Full Text Available BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop an efficient model-based approach to perform Bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a Bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values). Applications of this strategy are considered using two different statistical models. The first one considers populations evolving under pure genetic drift from a common ancestral population, while the second one relies on populations under migration-drift equilibrium. Robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided. CONCLUSIONS/SIGNIFICANCE: The procedure described turns out to be much faster than former Bayesian approaches and also reasonably efficient, especially at detecting loci under positive selection.
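
    The outlier logic can be illustrated with a toy Monte Carlo stand-in for the model-based PPP-value computation (the distribution below is a placeholder, not the paper's fitted neutral model):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Placeholder neutral replicates of a differentiation statistic (e.g. FST)
    # standing in for draws from a fitted neutral model.
    neutral_fst = rng.beta(a=2.0, b=18.0, size=100_000)

    def ppp_value(observed_fst):
        """Fraction of neutral replicates at least as extreme as observed."""
        return float(np.mean(neutral_fst >= observed_fst))

    for f in (0.05, 0.10, 0.42):
        print(f"FST={f:.2f}  PPP-value={ppp_value(f):.4f}")
    # Very small PPP-values flag candidate loci under positive selection.
    ```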

  13. COLLABORATIVE RESEARCH: Parallel Analysis Tools and New Visualization Techniques for Ultra-Large Climate Data Set

    Energy Technology Data Exchange (ETDEWEB)

    Middleton, Don [Co-PI]; Haley, Mary

    2014-12-10

    ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.

  14. Fresnel Volume Migration of the ISO89-3D data set

    Science.gov (United States)

    Hloušek, F.; Buske, S.

    2016-11-01

    This paper demonstrates the capabilities of Fresnel Volume Migration (FVM) for 3-D single-component seismic data in a crystalline environment. We show its application to the ISO89-3D data set, which was acquired in 1989 at the German continental deep drilling site (KTB) near Windischeschenbach (Southeast Germany). A key point in FVM is the derivation of the emergent angle for the recorded wavefield. This angle is used as the initial condition of the ray-tracing algorithm within FVM. In order to limit the migration operator to the physically relevant part of a reflector, it is restricted to the Fresnel volume around the backpropagated ray. We discuss different possibilities for an adequate choice of the aperture used for a local slant-stack algorithm, using semblance as a measure of coherency for different emergent angles. Furthermore, we reduce the number of receivers used for this procedure by means of a Voronoi diagram, leading to a more even distribution of receivers within the selected aperture. We demonstrate the performance of these methods for a simple 3-D synthetic example and show the results for the ISO89-3D data set. For the latter, our approach yields images of significantly better quality compared to previous investigations and allows for a detailed characterization of the subsurface. Even in migrated single shot gathers, structures are clearly visible due to the focusing achieved by FVM.
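
    The semblance coherency used to pick the emergent angle in the local slant stack has a standard definition, sketched here for a gather of already time-shifted traces (a generic formulation, not code from the study):

    ```python
    import numpy as np

    def semblance(traces):
        """traces: (n_receivers, n_samples) after slant-stack time shifts.
        Returns a value in [0, 1]; 1 means perfectly coherent traces."""
        n = traces.shape[0]
        stack_energy = np.sum(np.sum(traces, axis=0) ** 2)
        trace_energy = n * np.sum(traces ** 2)
        return stack_energy / trace_energy

    coherent = np.tile(np.sin(np.linspace(0, 6, 200)), (8, 1))
    noise = np.random.default_rng(1).normal(size=(8, 200))
    print(semblance(coherent), semblance(noise))  # ~1.0 vs ~1/8 for noise
    ```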

  15. Asteroseismic modeling of 16 Cyg A & B using the complete Kepler data set

    CERN Document Server

    Metcalfe, Travis S; Davies, Guy R

    2015-01-01

    Asteroseismology of bright stars with well-determined properties from parallax measurements and interferometry can yield precise stellar ages and meaningful constraints on the composition. We substantiate this claim with an updated asteroseismic analysis of the solar-analog binary system 16 Cyg A & B using the complete 30-month data sets from the Kepler space telescope. An analysis with the Asteroseismic Modeling Portal (AMP), using all of the available constraints to model each star independently, yields the same age ($t=7.0 \\pm 0.3$ Gyr) and composition ($Z=0.021 \\pm 0.002$, $Y_i=0.25 \\pm 0.01$) for both stars, as expected for a binary system. We quantify the accuracy of the derived stellar properties by conducting a similar analysis of a Kepler-like data set for the Sun, and we investigate how the reliability of asteroseismic inference changes when fewer observational constraints are available or when different fitting methods are employed. We find that our estimates of the initial helium mass fraction...

  16. Mesoscopic structures reveal the network between the layers of multiplex data sets

    Science.gov (United States)

    Iacovacci, Jacopo; Wu, Zhihao; Bianconi, Ginestra

    2015-10-01

    Multiplex networks describe a large variety of complex systems, whose elements (nodes) can be connected by different types of interactions forming different layers (networks) of the multiplex. Multiplex networks include social networks, transportation networks, or biological networks in the cell or in the brain. Extracting relevant information from these networks is of crucial importance for solving challenging inference problems and for characterizing the multiplex network's microscopic and mesoscopic structure. Here we propose an information theory method to extract the network between the layers of multiplex data sets, forming a "network of networks." We build an indicator function, based on the entropy of network ensembles, to characterize the mesoscopic similarities between the layers of a multiplex network, and we use clustering techniques to characterize the communities present in this network of networks. We apply the proposed method to study the Multiplex Collaboration Network formed by scientists collaborating on different subjects and publishing in the American Physical Society journals. The analysis of this data set reveals the interplay between the collaboration networks and the organization of knowledge in physics.

  17. Charon's radius and density from the combined data sets of the 2005 July 11 occultation

    CERN Document Server

    Person, M J; Elliot, J L; Gangestad, J; Gulbis, A A S; Pasachoff, J M; Souza, S P

    2006-01-01

    The 2005 July 11 C313.2 stellar occultation by Charon was observed by three separate research groups, including our own, at observatories throughout South America. Here, the published timings from the three data sets have been combined to more accurately determine the mean radius of Charon: 606.0 +/- 1.5 km. Our analysis indicates that a slight oblateness in the body (0.006 +/- 0.003) best matches the data, with a confidence level of 86%. The oblateness has a pole position angle of 71.4 deg +/- 10.4 deg and is consistent with Charon's pole position angle of 67 deg. Charon's mean radius corresponds to a bulk density of 1.63 +/- 0.07 g/cm3, which is significantly less than Pluto's (1.92 +/- 0.12 g/cm3). This density differential favors an impact formation scenario for the system in which at least one of the impactors was differentiated. Finally, unexplained differences between chord timings measured at Cerro Pachon and the rest of the data set could be indicative of a depression as deep as 7 km on Charon's limb...
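
    The quoted bulk density follows directly from the occultation radius and an adopted mass; a back-of-the-envelope check, assuming a Charon mass of about 1.52 x 10^21 kg (a contemporaneous external estimate; the paper's adopted value may differ slightly):

    ```python
    import math

    radius_m = 606.0e3   # mean radius from the combined occultation fit
    mass_kg = 1.52e21    # assumed Charon mass (not taken from this paper)
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m ** 3
    density_g_cm3 = mass_kg / volume_m3 / 1000.0
    print(f"{density_g_cm3:.2f} g/cm^3")  # ~1.63, matching the abstract
    ```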

  18. The All-Wavelength Extended Groth Strip International Survey(AEGIS) Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Davis, M.; Guhathakurta, P.; Konidaris, N.P.; Newman, J.A.; Ashby, M.L.N.; Biggs, A.D.; Barmby, P.; Bundy, K.; Chapman, S.C.; Coil,A.L.; Conselice, C.J.; Cooper, M.C.; Croton, D.J.; Eisenhardt, P.R.M.; Ellis, R.S.; Faber, S.M.; Fang, T.; Fazio, G.G.; Georgakakis, A.; Gerke,B.F.; Goss, W.M.; Gwyn, S.; Harker, J.; Hopkins, A.M.; Huang, J.-S.; Ivison, R.J.; Kassin, S.A.; Kirby, E.N.; Koekemoer, A.M.; Koo, D.C.; Laird, E.S.; Le Floc' h, E.; Lin, L.; Lotz, J.M.; Marshall, P.J.; Martin,D.C.; Metevier, A.J.; Moustakas, L.A.; Nandra, K.; Noeske, K.G.; Papovich, C.; Phillips, A.C.; Rich,R. M.; Rieke, G.H.; Rigopoulou, D.; Salim, S.; Schiminovich, D.; Simard, L.; Smail, I.; Small,T.A.; Weiner,B.J.; Willmer, C.N.A.; Willner, S.P.; Wilson, G.; Wright, E.L.; Yan, R.

    2006-10-13

    In this the first of a series of Letters, we present a description of the panchromatic data sets that have been acquired in the Extended Groth Strip region of the sky. Our survey, the All-wavelength Extended Groth Strip International Survey (AEGIS), is intended to study the physical properties and evolutionary processes of galaxies at z ~ 1. It includes the following deep, wide-field imaging data sets: Chandra/ACIS X-ray (0.5-10 keV), GALEX ultraviolet (1200-2500 Angstroms), CFHT/MegaCam Legacy Survey optical (3600-9000 Angstroms), CFHT/CFH12K optical (4500-9000 Angstroms), Hubble Space Telescope/ACS optical (4400-8500 Angstroms), Palomar/WIRC near-infrared (1.2-2.2 microns), Spitzer/IRAC mid-infrared (3.6-8.0 microns), Spitzer/MIPS far-infrared (24-70 microns), and VLA radio continuum (6-20 cm). In addition, this region of the sky has been targeted for extensive spectroscopy using the DEIMOS spectrograph on the Keck II 10 m telescope. Our survey is compared to other large multiwavelength surveys in terms of depth and sky coverage.

  19. The All-wavelength Extended Groth Strip International Survey (AEGIS) Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Davis, M.; Guhathakurta, P.; Konidaris, N.; Newman, J.A.; Ashby, M.L.N.; Biggs, A.D.; Barmby, P.; Bundy, K.; Chapman, S.; Coil, A.L.; Conselice, C.; Cooper, M.; Croton,; Eisenhardt, P.; Ellis, R.; Faber, S.; Fang, T.; Fazio, G.G.; Georgakakis, A.; Gerke, B.; Goss, W.M.; /UC, Berkeley, Astron. Dept. /Lick Observ. /LBL, Berkeley

    2006-07-21

    In this the first of a series of Letters, we present a description of the panchromatic data sets that have been acquired in the Extended Groth Strip region of the sky. Our survey, the All-wavelength Extended Groth Strip International Survey (AEGIS), is intended to study the physical properties and evolutionary processes of galaxies at z ~ 1. It includes the following deep, wide-field imaging data sets: Chandra/ACIS X-ray (0.5-10 keV), GALEX ultraviolet (1200-2500 Angstroms), CFHT/MegaCam Legacy Survey optical (3600-9000 Angstroms), CFHT/CFH12K optical (4500-9000 Angstroms), Hubble Space Telescope/ACS optical (4400-8500 Angstroms), Palomar/WIRC near-infrared (1.2-2.2 microns), Spitzer/IRAC mid-infrared (3.6-8.0 microns), Spitzer/MIPS far-infrared (24-70 microns), and VLA radio continuum (6-20 cm). In addition, this region of the sky has been targeted for extensive spectroscopy using the DEIMOS spectrograph on the Keck II 10 m telescope. Our survey is compared to other large multiwavelength surveys in terms of depth and sky coverage.

  20. A meteorological and snow observational data set from Snoqualmie Pass (921 m), Washington Cascades, USA

    Science.gov (United States)

    Wayand, Nicholas E.; Massmann, Adam; Butler, Colin; Keenan, Eric; Stimberis, John; Lundquist, Jessica D.

    2015-12-01

    We introduce a quality controlled observational atmospheric, snow, and soil data set from Snoqualmie Pass, Washington, USA, to enable testing of hydrometeorological and snow process representations within a rain-snow transitional climate where existing observations are sparse and limited. Continuous meteorological forcing (including air temperature, total precipitation, wind speed, specific humidity, air pressure, and short and longwave irradiance) is provided at hourly intervals for a 24 year historical period (water years 1989-2012) and at half-hourly intervals for a more recent period (water years 2013-2015), separated based on the availability of observations. The majority of missing data were filled with bias-corrected reanalysis model values (using NLDAS). Additional observations include 40 years of snow board new snow accumulation, multiple measurements of total snow depth, and manual snow pits, while more recent years include subdaily surface temperature, snowpack drainage, soil moisture and temperature profiles, and eddy covariance-derived turbulent heat flux. This data set is ideal for testing hypotheses about energy balance, soil, and snow processes in the rain-snow transition zone.

  1. A lake data set for the Tibetan Plateau from the 1960s, 2005, and 2014

    Science.gov (United States)

    Wan, Wei; Long, Di; Hong, Yang; Ma, Yingzhao; Yuan, Yuan; Xiao, Pengfeng; Duan, Hongtao; Han, Zhongying; Gu, Xingfa

    2016-01-01

    Long-term datasets of number and size of lakes over the Tibetan Plateau (TP) are among the most critical components for better understanding the interactions among the cryosphere, hydrosphere, and atmosphere at regional and global scales. Due to the harsh environment and the scarcity of data over the TP, data accumulation and sharing become more valuable for scientists worldwide to make new discoveries in this region. This paper, for the first time, presents a comprehensive and freely available data set of lakes’ status (name, location, shape, area, perimeter, etc.) over the TP region dating back to the 1960s, including three time series, i.e., the 1960s, 2005, and 2014, derived from ground survey (the 1960s) or high-spatial-resolution satellite images from the China-Brazil Earth Resources Satellite (CBERS) (2005) and China’s newly launched GaoFen-1 (GF-1, which means high-resolution images in Chinese) satellite (2014). The data set could provide scientists with useful information for revealing environmental changes and mechanisms over the TP region. PMID:27328160

  2. Novel approach to analysing large data sets of personal sun exposure measurements.

    Science.gov (United States)

    Blesić, Suzana M; Stratimirović, Đorđe I; Ajtić, Jelena V; Wright, Caradee Y; Allen, Martin W

    2016-11-01

    Personal sun exposure measurements provide important information to guide the development of sun awareness and disease prevention campaigns. We assess the scaling properties of personal ultraviolet radiation (pUVR) sun exposure measurements using the wavelet transform (WT) spectral analysis to process long-range, high-frequency personal recordings collected by electronic UVR dosimeters designed to measure erythemal UVR exposure. We analysed the sun exposure recordings of school children, farmers, marathon runners and outdoor workers in South Africa, and construction workers and work site supervisors in New Zealand. We found scaling behaviour in all the analysed pUVR data sets. We found that the observed scaling changes from uncorrelated to long-range correlated with increasing duration of sun exposure. Peaks in the WT spectra that we found suggest the existence of characteristic times in sun exposure behaviour that were to some extent universal across our data set. Our study also showed that WT measures enable group classification, as well as distinction between individual UVR exposures, otherwise unattainable by conventional statistical methods.
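
    A simplified stand-in for the wavelet-transform spectrum is the energy per decomposition level of the dosimeter series; the sketch below uses the third-party PyWavelets package and toy data, and the paper's actual estimator will differ in detail:

    ```python
    import numpy as np
    import pywt  # PyWavelets

    rng = np.random.default_rng(0)
    puvr = np.abs(rng.normal(size=4096)).cumsum() * 1e-3  # toy pUVR series

    coeffs = pywt.wavedec(puvr, "db4", level=8)  # [cA8, cD8, ..., cD1]
    for level, detail in zip(range(8, 0, -1), coeffs[1:]):
        energy = float(np.sum(detail ** 2))
        print(f"level {level} (scale ~2^{level} samples): energy {energy:.3g}")
    # The slope of log(energy) versus scale reflects the degree of
    # long-range correlation in the exposure record.
    ```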

  3. Dimensionality Reduction on Multi-Dimensional Transfer Functions for Multi-Channel Volume Data Sets

    Science.gov (United States)

    Kim, Han Suk; Schulze, Jürgen P.; Cone, Angela C.; Sosinsky, Gina E.; Martone, Maryann E.

    2011-01-01

    The design of transfer functions for volume rendering is a non-trivial task. This is particularly true for multi-channel data sets, where multiple data values exist for each voxel, which requires multi-dimensional transfer functions. In this paper, we propose a new method for multi-dimensional transfer function design. Our new method provides a framework to combine multiple computational approaches and pushes the boundary of gradient-based multi-dimensional transfer functions to multiple channels, while keeping the dimensionality of transfer functions at a manageable level, i.e., a maximum of three dimensions, which can be displayed visually in a straightforward way. Our approach utilizes channel intensity, gradient, curvature and texture properties of each voxel. Applying recently developed nonlinear dimensionality reduction algorithms reduces the high-dimensional data of the domain. In this paper, we use Isomap and Locally Linear Embedding as well as a traditional algorithm, Principal Component Analysis. Our results show that these dimensionality reduction algorithms significantly improve the transfer function design process without compromising visualization accuracy. We demonstrate the effectiveness of our new dimensionality reduction algorithms with two volumetric confocal microscopy data sets. PMID:21841914
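
    The reduction step maps per-voxel feature vectors to a low-dimensional embedding that can drive an (at most) three-dimensional transfer function; with scikit-learn the three algorithms named above can be swapped interchangeably (toy features below; the feature extraction itself is omitted):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import Isomap, LocallyLinearEmbedding

    rng = np.random.default_rng(0)
    # 2000 sampled voxels, 12 features (intensity/gradient/curvature/texture
    # per channel in the paper; random placeholders here).
    features = rng.normal(size=(2000, 12))

    for reducer in (PCA(n_components=3),
                    Isomap(n_components=3, n_neighbors=10),
                    LocallyLinearEmbedding(n_components=3, n_neighbors=10)):
        embedded = reducer.fit_transform(features)
        print(type(reducer).__name__, embedded.shape)  # (2000, 3) each
    ```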

  4. The All-wavelength Extended Groth Strip International Survey (AEGIS) Data Sets

    CERN Document Server

    Davis, M; Konidaris, N; Newman, J A; Ashby, M L N; Biggs, A D; Barmby, P; Bundy, K; Chapman, S; Coil, A L; Conselice, C J; Cooper, M; Croton, D; Eisenhardt, P; Ellis, R; Faber, S; Fang, T; Fazio, G G; Georgakakis, A; Gerke, B; Goss, W M; Gwyn, S; Harker, J; Hopkins, A; Huang, J S; Ivison, R J; Kassin, S A; Kirby, E; Koekemoer, A; Koo, D C; Laird, E; Le Floc'h, E; Lin, L; Lotz, J; Marshall, P J; Martin, D C; Metevier, A; Moustakas, L A; Nandra, K; Noeske, K F; Papovich, C; Phillips, A C; Rich, R M; Rieke, G H; Rigopoulou, D; Salim, S; Schiminovich, D; Simard, L; Smail, I; Small, T A; Weiner, B; Willmer, C N A; Willner, S P; Wilson, G; Wright, E; Yan, R

    2006-01-01

    In this the first of a series of Letters, we present a description of the panchromatic data sets that have been acquired in the Extended Groth Strip region of the sky. Our survey, the All-wavelength Extended Groth Strip International Survey (AEGIS), is intended to study the physical properties and evolutionary processes of galaxies at z ~ 1. It includes the following deep, wide-field imaging data sets: Chandra/ACIS X-ray (0.5 - 10 keV), GALEX ultraviolet (1200 - 2500 Angstrom), CFHT/MegaCam Legacy Survey optical (3600 - 9000 Angstroms), CFHT/CFH12K optical (4500 - 9000 Angstroms), Hubble Space Telescope/ACS optical (4400 - 8500 Angstroms), Palomar/WIRC near-infrared (1.2 - 2.2 microns), Spitzer/IRAC mid-infrared (3.6 - 8.0 microns), Spitzer/MIPS far-infrared (24 - 70 microns), and VLA radio continuum (6 - 20 cm). In addition, this region of the sky has been targeted for extensive spectroscopy using the DEIMOS spectrograph on the Keck II 10 m telescope. Our survey is compared to other large multiwavelength sur...

  7. Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-02-23

    Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach for the discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes including Class III (the Sirtuins family) and 14 HDAC isoforms, and comprise 631 inhibitors and 24609 unbiased decoys. The ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matched with the ligands and maximally unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with the DUD-E and DEKOIS 2.0 sets against the HDAC2 and HDAC8 targets and demonstrate that our MUBD-HDACs are unique in that they can be applied without bias to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the "2D bias" and "LBVS favorable" effects within benchmarking sets. In summary, MUBD-HDACs are the only comprehensive and maximally unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/ .

  8. Eu-Detect: An algorithm for detecting eukaryotic sequences in metagenomic data sets

    Indian Academy of Sciences (India)

    Monzoorul Haque Mohammed; Sudha Chadaram Dinakar; Dinakar Komanduri; Tarini Shankar Ghosh; Sharmila S Mande

    2011-09-01

    Physical partitioning techniques are routinely employed (during the sample preparation stage) for segregating the prokaryotic and eukaryotic fractions of metagenomic samples. In spite of these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported the presence of contaminating eukaryotic sequences in metagenomic data sets. Contaminating sequences originate not only from genomes of micro-eukaryotic species but also from genomes of (higher) eukaryotic host cells. The latter scenario usually occurs in the case of host-associated metagenomes. Identification and removal of contaminating sequences is important, since these sequences not only impact estimates of microbial diversity but also affect the accuracy of several downstream analyses. Currently, the computational techniques used for identifying contaminating eukaryotic sequences, being alignment based, are slow, inefficient, and require huge computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that on a desktop with modest hardware specifications, the Eu-Detect algorithm is able to rapidly segregate DNA sequence fragments of prokaryotic and eukaryotic origin, with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.
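
    Alignment-free classifiers of this kind are typically built on oligonucleotide composition; a minimal sketch of the feature-extraction stage (tetranucleotide frequencies; Eu-Detect's actual features and classifier may differ):

    ```python
    from itertools import product

    KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
    INDEX = {k: i for i, k in enumerate(KMERS)}

    def tetra_freqs(seq):
        """Return a 256-dim relative-frequency vector of 4-mers in seq."""
        counts = [0] * 256
        total = 0
        seq = seq.upper()
        for i in range(len(seq) - 3):
            idx = INDEX.get(seq[i:i + 4])
            if idx is not None:  # skips k-mers containing N, etc.
                counts[idx] += 1
                total += 1
        return [c / total for c in counts] if total else counts

    print(sum(tetra_freqs("ACGTACGTAGCTAGCTAGGATCCGAT")))  # -> 1.0
    ```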

  9. Fast computation of categorical richness on raster data sets and related problems

    DEFF Research Database (Denmark)

    de Berg, Mark; Tsirogiannis, Constantinos; Wilkinson, Bryan T.

    2015-01-01

    In many scientific fields, it is common to encounter raster data sets consisting of categorical data, such as soil type or land usage of a terrain. A problem that arises in the presence of such data is the following: given a raster G of n cells storing categorical data, compute for every cell c the number of distinct categories appearing in a window around c. We present an algorithm for square windows that runs in O(n) time and one for circular windows that runs in O((1+K/r)n) time, where K is the number of different categories appearing in G and r is the window radius. The algorithms are not only very efficient in theory, but also in practice: our experiments show that our algorithms can handle raster data sets of hundreds of millions of cells. The categorical richness problem is related to colored range counting, where the goal is to preprocess a colored point set such that we can efficiently count the number of colors appearing inside a query range. We also present a data structure for colored range counting in R^2.
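
    A straightforward O(Kn) baseline for square windows dilates, per category, a presence mask over the window and sums the masks (this is the naive scheme such algorithms improve upon, not the paper's algorithm):

    ```python
    import numpy as np
    from scipy.ndimage import maximum_filter

    def richness(grid, window=3):
        """Distinct categories in the window centred on each cell."""
        out = np.zeros(grid.shape, dtype=np.int32)
        for cat in np.unique(grid):
            present = (grid == cat).astype(np.int8)
            out += maximum_filter(present, size=window)  # 1 if cat in window
        return out

    g = np.array([[0, 0, 1],
                  [2, 0, 1],
                  [2, 2, 1]])
    print(richness(g))  # centre cell sees all three categories
    ```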

  10. NCBI Epigenomics: a new public resource for exploring epigenomic data sets.

    Science.gov (United States)

    Fingerman, Ian M; McDaniel, Lee; Zhang, Xuan; Ratzat, Walter; Hassan, Tarek; Jiang, Zhifang; Cohen, Robert F; Schuler, Gregory D

    2011-01-01

    The Epigenomics database at the National Center for Biotechnology Information (NCBI) is a new resource that has been created to serve as a comprehensive public resource for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). Epigenetics is the study of stable and heritable changes in gene expression that occur independently of the primary DNA sequence. Epigenetic mechanisms include post-translational modifications of histones, DNA methylation, chromatin conformation and non-coding RNAs. It has been observed that misregulation of epigenetic processes has been associated with human disease. We have constructed the new resource by selecting the subset of epigenetics-specific data from general-purpose archives, such as the Gene Expression Omnibus, and Sequence Read Archives, and then subjecting them to further review, annotation and reorganization. Raw data is processed and mapped to genomic coordinates to generate 'tracks' that are a visual representation of the data. These data tracks can be viewed using popular genome browsers or downloaded for local analysis. The Epigenomics resource also provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes. Currently, there are 69 studies, 337 samples and over 1100 data tracks from five well-studied species that are viewable and downloadable in Epigenomics.

  11. Small Big Data: Using multiple data-sets to explore unfolding social and economic change

    Directory of Open Access Journals (Sweden)

    Emily Gray

    2015-06-01

    Full Text Available Bold approaches to data collection and large-scale quantitative advances have long been a preoccupation for social science researchers. In this commentary we further the debate over the use of large-scale survey data and official statistics with ‘Big Data’ methodologists, and emphasise the ability of these resources to incorporate the essential social and cultural heredity that is intrinsic to the human sciences. In doing so, we introduce a series of new data-sets that integrate approximately 30 years of survey data on victimisation, fear of crime and disorder and social attitudes with indicators of socio-economic conditions and policy outcomes in Britain. The data-sets that we outline below do not conform to typical conceptions of ‘Big Data’. But, we would contend, they are ‘big’ in terms of the volume, variety and complexity of the data which has been collated (and to which additional data can be linked), and ‘big’ also in that they allow us to explore key questions pertaining to how social and economic policy change at the national level alters the attitudes and experiences of citizens. Importantly, they are also ‘small’ in the sense that the task of rendering the data usable, linking it and decoding it required both manual processing and tacit knowledge of the context of the data and the intentions of its creators.

  12. Intercomparison of Climate Data Sets as a Measure of Observational Uncertainty

    Energy Technology Data Exchange (ETDEWEB)

    Covey, C; Achuta Rao, K M; Fiorino, M; Gleckler, P J; Taylor, K E; Wehner, M F

    2002-02-22

    Uncertainties in climate observations are revealed when alternate observationally based data sets are compared. General circulation model-based "reanalyses" of meteorological observations will yield different results from different models, even if identical sets of raw unanalyzed data form their starting points. We have examined 25 longitude-latitude fields (including selected levels for three-dimensional quantities) encompassing atmospheric climate variables for which the PCMDI observational data base contains two or more high-quality sources. For the most part we compare ECMWF with NCEP reanalysis. In some cases, we compare in situ and/or satellite-derived data with reanalysis. To obtain an overview of the differences for all 25 fields, we use a graphical technique developed for climate model diagnosis: a "portrait diagram" displaying root-mean-square differences between the alternate data sources. With a few exceptions (arising from the requirement that RMS differences be normalized to accommodate different units of variables) the portrait diagrams indicate areas of agreement and disagreement that can be confirmed by examining traditional graphics such as zonal mean plots. In accord with conventional wisdom, the greatest agreement between alternate data sets--hence the smallest implied observational uncertainty--occurs for upper tropospheric zonal wind. We also find fairly good agreement between reanalysis and more direct measures of precipitation, suggesting that modern observational systems are resolving some long-standing problems with its measurement.
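
    The underlying computation is simple: a normalized RMS difference per field and data-set pair, displayed as a colored matrix. An illustrative sketch with synthetic fields (the PCMDI diagnostic's fields and normalization choices differ):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    fields = ["200 hPa zonal wind", "precipitation", "SLP", "2 m temperature"]
    pairs = ["ECMWF vs NCEP", "satellite vs reanalysis"]

    rms = np.empty((len(fields), len(pairs)))
    for i in range(len(fields)):
        for j in range(len(pairs)):
            a, b = rng.normal(size=500), rng.normal(size=500)
            # Normalize by the field's standard deviation so that
            # variables with different units share one color scale.
            rms[i, j] = np.sqrt(np.mean((a - b) ** 2)) / np.std(a)

    plt.pcolormesh(rms, cmap="viridis")
    plt.xticks(np.arange(len(pairs)) + 0.5, pairs, rotation=20)
    plt.yticks(np.arange(len(fields)) + 0.5, fields)
    plt.colorbar(label="normalized RMS difference")
    plt.title("Portrait diagram (illustrative)")
    plt.show()
    ```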

  13. Analysis of bibliometric indicators for individual scholars in a large data set

    CERN Document Server

    Radicchi, Filippo

    2013-01-01

    Citation numbers and other quantities derived from bibliographic databases are becoming standard tools for the assessment of the productivity and impact of research activities. Though widely used, their statistical properties have not yet been well established. This is especially true for bibliometric indicators aimed at the evaluation of individual scholars, because large-scale data sets are typically difficult to retrieve. Here, we take advantage of a recently introduced large bibliographic data set, Google Scholar Citations, which collects the entire publication record of individual scholars. We analyze the scientific profile of more than 30,000 researchers, and study the relation between the h-index, the number of publications, and the number of citations of individual scientists. While the number of publications of a scientist has a rather weak relation with his/her h-index, we find that the h-index of a scientist is strongly correlated with the number of citations that she/he has rece...
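
    For reference, the h-index itself is computed directly from a scholar's per-paper citation counts: the largest h such that at least h papers have at least h citations each.

    ```python
    def h_index(citations):
        """h-index of a list of per-paper citation counts."""
        h = 0
        for rank, c in enumerate(sorted(citations, reverse=True), start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # -> 4
    print(h_index([25, 8, 5, 3, 3]))  # -> 3
    ```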

  14. OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

    CERN Document Server

    Bassil, Youssef

    2012-01-01

    Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely published at the time; hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition, was conceived to translate paper-based books into digital e-books. Regrettably, OCR systems are still erroneous and inaccurate, as they produce misspellings in the recognized text, especially when the source document is of low printing quality. This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. The cornerstone of this proposed approach is the use of the Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. The Google data set incorporates a very large vocabulary and word statistics entirely reaped from the Internet, making it a reliable source to perform dictionary-based erro...
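
    The scoring idea is to prefer the candidate correction whose surrounding 5-gram is most frequent in the corpus. A minimal sketch with a tiny hypothetical in-memory counts table (the real Web 1T data ships as tab-separated n-gram/count files and would be queried incrementally, not loaded whole):

    ```python
    # Hypothetical stand-in for Google Web 1T 5-gram counts.
    FIVE_GRAM_COUNTS = {
        ("the", "quick", "brown", "fox", "jumps"): 120_000,
        ("the", "quick", "brown", "fox", "jumbs"): 0,
    }

    def best_candidate(context, candidates):
        """context: the 4 preceding tokens; pick the most frequent 5-gram."""
        def score(word):
            return FIVE_GRAM_COUNTS.get(tuple(context) + (word,), 0)
        return max(candidates, key=score)

    print(best_candidate(["the", "quick", "brown", "fox"],
                         ["jumps", "jumbs"]))  # -> 'jumps'
    ```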

  15. Biogeography and floral evolution of baobabs (Adansonia, Bombacaceae) as inferred from multiple data sets.

    Science.gov (United States)

    Baum, D A; Small, R L; Wendel, J F

    1998-06-01

    The phylogeny of baobab trees was analyzed using four data sets: chloroplast DNA restriction sites, sequences of the chloroplast rpl16 intron, sequences of the internal transcribed spacer (ITS) region of nuclear ribosomal DNA, and morphology. We sampled each of the eight species of Adansonia plus three outgroup taxa from tribe Adansonieae. These data were analyzed singly and in combination using parsimony. ITS and morphology provided the greatest resolution and were largely concordant. The two chloroplast data sets showed concordance with one another but showed significant conflict with ITS and morphology. A possible explanation for the conflict is genealogical discordance within the Malagasy Longitubae, perhaps due to introgression events. A maximum-likelihood analysis of branching times shows that the dispersal between Africa and Australia occurred well after the fragmentation of Gondwana and therefore involved overwater dispersal. The phylogeny does not permit unambiguous reconstruction of floral evolution but suggests the plausible hypothesis that hawkmoth pollination was ancestral in Adansonia and that there were two parallel switches to pollination by mammals in the genus.

  16. 4D inversion of time-lapse magnetotelluric data sets for monitoring geothermal reservoir

    Science.gov (United States)

    Nam, Myung Jin; Song, Yoonho; Jang, Hannuree; Kim, Bitnarae

    2017-06-01

    The productivity of a geothermal reservoir, which is a function of the pore space and fluid-flow paths of the reservoir, varies as the properties of the reservoir change during production. Because the variation in reservoir properties causes changes in electrical resistivity, time-lapse (TL) three-dimensional (3D) magnetotelluric (MT) methods can be applied to monitor the productivity of a geothermal reservoir, owing not only to their sensitivity to electrical resistivity but also to their large depth of penetration. For an accurate interpretation of TL MT data sets, a four-dimensional (4D) MT inversion algorithm has been developed to simultaneously invert all vintage data, considering time coupling between vintages. However, the changes in electrical resistivity of deep geothermal reservoirs are usually small, generating minimal variation in TL MT responses. Maximizing the sensitivity of the inversion to the changes in resistivity is critical to the success of 4D MT inversion. Thus, we further developed a focused 4D MT inversion method that considers not only the location of a reservoir but also the distribution of newly generated fractures during production. For evaluation, we tested our 4D inversion algorithms using synthetic TL MT data sets.
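
    A generic form of the time-coupled objective such a 4D inversion minimizes is (a schematic only, assuming Tikhonov-style spatial regularization; the paper's exact functional, weights, and focusing terms may differ):

    ```latex
    \Phi(\mathbf{m}_1,\dots,\mathbf{m}_T) =
      \sum_{k=1}^{T} \left\| \mathbf{W}_d \left( \mathbf{d}_k - F(\mathbf{m}_k) \right) \right\|^2
      + \lambda \sum_{k=1}^{T} \left\| \mathbf{W}_m \mathbf{m}_k \right\|^2
      + \alpha \sum_{k=2}^{T} \left\| \mathbf{m}_k - \mathbf{m}_{k-1} \right\|^2
    ```

    Here m_k is the resistivity model for vintage k, F the 3D MT forward operator, and the alpha-term couples vintages in time; "focusing" amounts to up-weighting sensitivity inside the reservoir volume and along expected fracture zones.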

  17. Diabetes research in children network: availability of protocol data sets.

    Science.gov (United States)

    Ruedy, Katrina J; Beck, Roy W; Xing, Dongyuan; Kollman, Craig

    2007-09-01

    The Diabetes Research in Children Network (DirecNet) was established in 2001 by the National Institute of Child Health and Human Development and the National Institute of Diabetes and Digestive and Kidney Diseases through special congressional funding for type 1 diabetes research. The network consists of five clinical centers, a coordinating center, and a central laboratory. Since its inception, DirecNet has conducted nine protocols, resulting in 28 published manuscripts with an additional 2 under review and 5 in development. The protocols have involved evaluation of technology available for the treatment of type 1 diabetes, including home glucose meters (OneTouch Ultra, FreeStyle, and BD Logic), continuous glucose monitoring systems (GW2B, CGMS, FreeStyle Navigator, and Guardian RT), and hemoglobin A1c (HbA1c) devices (DCA 2000 and A1cNow). In addition, the group has conducted several studies evaluating factors affecting hypoglycemia, including exercise and bedtime snack composition. The data sets that have resulted from these studies include data from the devices being evaluated, central laboratory glucose, HbA1c and hormone data, clinical center glucose and HbA1c data, accelerometer data, and pump data depending on the procedures involved with each protocol. These data sets are, or will be, available at no charge on the study group's public Web site. Several psychosocial questionnaires developed by DirecNet are also available.

  18. Gene regulatory network inference using fused LASSO on multiple data sets.

    Science.gov (United States)

    Omranian, Nooshin; Eloundou-Mbebi, Jeanne M O; Mueller-Roeber, Bernd; Nikoloski, Zoran

    2016-02-11

    Devising computational methods to accurately reconstruct gene regulatory networks given gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneous consideration of data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) expression levels of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes which exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed into a fused LASSO formulation for the proposed method. The comparative analysis on transcriptomics time-series data from prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, as well as a eukaryotic species, mouse, demonstrated that the proposed method has the advantages of the most recent approaches for regulatory network inference, while obtaining better performance and assigning higher scores to the true regulatory links. The study indicates that the combination of sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstructions.
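
    For a single target gene, the three constraints translate into a fused LASSO problem of roughly the following shape (a schematic; the published formulation may differ in how the fusion terms are indexed and weighted):

    ```latex
    \min_{\beta^{(1)},\dots,\beta^{(D)}}
      \sum_{d=1}^{D} \left\| y^{(d)} - X^{(d)} \beta^{(d)} \right\|_2^2
      + \lambda_1 \sum_{d=1}^{D} \left\| \beta^{(d)} \right\|_1
      + \lambda_2 \sum_{d < d'} \left\| \beta^{(d)} - \beta^{(d')} \right\|_1
    ```

    Here y^(d) is the target gene's expression in data set d and X^(d) the matrix of transcription-factor expression levels; the lambda_1 term enforces sparsity (constraint 1) and the lambda_2 term penalizes differences between the networks inferred from different data sets (constraint 2), with analogous fusion terms between similarly behaving genes capturing constraint 3.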

  1. A variant reference data set for the Africanized honeybee, Apis mellifera.

    Science.gov (United States)

    Kadri, Samir M; Harpur, Brock A; Orsi, Ricardo O; Zayed, Amro

    2016-11-08

    The Africanized honeybee (AHB) is a population of Apis mellifera found in the Americas. AHBs originated in 1956 in Rio Claro, Brazil, where imported African A. m. scutellata escaped and hybridized with local populations of European A. mellifera. Africanized populations can now be found from Northern Argentina to the Southern United States. AHBs, often referred to as 'Killer Bees', are a major concern to the beekeeping industry as well as a model for the evolutionary genetics of colony defence. We performed high coverage pooled-resequencing of 360 diploid workers from 30 Brazilian AHB colonies using Illumina Hi-Seq (150 bp PE). This yielded a high density SNP data set with an average read depth at each site of 20.25 reads. With 3,606,720 SNPs and 155,336 SNPs within 11,365 genes, this data set is the largest genomic resource available for AHBs and will enable high-resolution studies of the population dynamics, evolution, and genetics of this successful biological invader, in addition to facilitating the development of SNP-based tools for identifying AHBs.

  2. Three-Cup Chicken

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    Ingredients: 500 grams chicken legs, 100 grams (about one tea cup) rice wine, 50 grams (a small tea cup) sesame oil, 50 grams refined soy sauce, 25 grams white sugar, 10 grams oyster sauce, chopped scallions, ginger root, garlic, and some hot chili peppers

  3. Twin Flavor Chicken Wings

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    Ingredients: about 1000g chicken wings, 100g shredded rapeseed leaves, 100g black sesame seeds, 7g salt, 5g sugar, 3g MSG, 10g cooking wine, 5g cassia bark, 1000g cooking oil (actual consumption only 100 grams), one egg, an optional amount of scallion, ginger root, starch and

  4. Immunomodulating Lactobacilli in Chicken

    NARCIS (Netherlands)

    M.E. Koenen (Marjorie)

    2004-01-01

    The gastro-intestinal (GI) tract of a chicken starts with the beak, followed by the esophagus and crop, proventriculus (glandular stomach), gizzard (muscular stomach), duodenum, ileum, a pair of blind elongated caeca, colon and ending in the cloaca. The GI-tract

  5. Welfare of broiler chickens

    Directory of Open Access Journals (Sweden)

    Federico Sirri

    2010-01-01

    Full Text Available Broiler chickens have been selected for their rapid growth rate as well as for high carcass yields, with particular regard to the breast, and reared in intensive systems at high stocking density ranging from 30 to 40 kg live weight/m2. These conditions lead to a worsening of the welfare status of birds. In Europe a specific directive for the protection of broiler chickens has been recently approved whereas in Italy there is not yet any regulation. The EU directive lays down minimum rules for the protection of chickens kept for meat production and gives indications on management practices with particular focus on stocking density, light regimen and air quality, training and guidance for people dealing with chickens, as well as monitoring plans for holding and slaughterhouse. In this review the rearing factors influencing the welfare conditions of birds are described and detailed information on the effects of stocking density, light regimen, litter characteristic and air quality (ammonia, carbon dioxide, humidity, dust are provided. Moreover, the main health implications of poor welfare conditions of the birds, such as contact dermatitis, metabolic, skeletal and muscular disorders are considered. The behavioural repertoire, including scratching, dust bathing, ground pecking, wing flapping, locomotor activity, along with factors that might impair these aspects, are discussed. Lastly, farm animal welfare assessment through physiological and behavioural indicators is described with particular emphasis on the “Unitary Welfare Index,” a tool that considers a wide range of indicators, including productive traits, in order to audit and compare the welfare status of chickens kept in different farms.

  6. DUACS DT2014: the new multi-mission altimeter data set reprocessed over 20 years

    Science.gov (United States)

    Pujol, Marie-Isabelle; Faugère, Yannice; Taburet, Guillaume; Dupuy, Stéphanie; Pelloquin, Camille; Ablain, Michael; Picot, Nicolas

    2016-09-01

    The new DUACS DT2014 reprocessed products have been available since April 2014. Numerous innovative changes have been introduced at each step of an extensively revised data processing protocol. The use of a new 20-year altimeter reference period in place of the previous 7-year reference significantly changes the sea level anomaly (SLA) patterns and thus has a strong user impact. The use of up-to-date altimeter standards and geophysical corrections, reduced smoothing of the along-track data, and refined mapping parameters, including spatial and temporal correlation-scale refinement and measurement errors, all contribute to an improved high-quality DT2014 SLA data set. Although all of the DUACS products have been upgraded, this paper focuses on the enhancements to the gridded SLA products over the global ocean. As part of this exercise, 21 years of data have been homogenized, allowing us to retrieve accurate large-scale climate signals such as global and regional MSL trends, interannual signals, and better refined mesoscale features. An extensive assessment exercise has been carried out on this data set, which allows us to establish a consolidated error budget. The errors at mesoscale are about 1.4 cm2 in low-variability areas, increase to an average of 8.9 cm2 in coastal regions, and reach nearly 32.5 cm2 in high mesoscale activity areas. The DT2014 products, compared to the previous DT2010 version, retain signals for wavelengths lower than ~250 km, inducing SLA variance and mean EKE increases of, respectively, +5.1 and +15 %. Comparisons with independent measurements highlight the improved mesoscale representation within this new data set. The error reduction at the mesoscale reaches nearly 10 % of the error observed with DT2010. DT2014 also presents an improved coastal signal with a nearly 2 to 4 % mean error reduction. High-latitude areas are also more accurately represented in DT2014, with an improved consistency between spatial coverage and sea ice edge

  7. Field rotor measurements. Data sets prepared for analysis of stall hysteresis

    Energy Technology Data Exchange (ETDEWEB)

    Aagaard Madsen, H.; Thirstrup Petersen, J. [Risoe National Lab. (Denmark); Bruining, A. [Delft Univ. of Technology (Netherlands); Brand, A. [ECN (Netherlands); Graham, M. [Imperical College (United Kingdom)

    1998-05-01

    As part of the JOULE-3 project 'STALLVIB', an analysis and synthesis of the data from the field rotor experiments at ECN, Delft University, Imperial College, NREL and Risoe has been carried out. This has been done in order to see to what extent the data could be used for further development and validation of engineering dynamic stall models. A detailed investigation of the influence of the post-processing of the different data sets has been performed. Further, important statistical functions such as PSD spectra, coherence and transfer functions have been derived for the data sets, which can be used as a basis for evaluating the quality of the data relative to its intended application. The importance of applying an appropriate low-pass filter to remove high-frequency noise has been demonstrated when the relation between instantaneous values of e.g. the angle of attack α and the normal-force coefficient C_N is considered. In general, the complicated measurement of α and w on a rotor and the interpretation of these parameters, combined with the strongly three-dimensional, turbulent flow field around the rotating blade, make it difficult to derive systematic information about stall hysteresis from the different data sets. In particular, the measurement of α by determination of the stagnation point gives reasonable data below stall but fails in stall. On the other hand, measurements of α with a five-hole pitot tube can also be used in the stall region. Another main problem is the non-dimensionalization of the coefficients C_N and C_r. If the dynamic pressure used for the non-dimensionalization is not fully correlated with the aerodynamic pressure over the considered airfoil section, due to e.g. the influence of gravity on the pressure pipes, the hysteresis loops will be distorted. However, using the data with caution and applying a suitable post-processing as described by the different participants, it will probably be possible to obtain some
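
    The low-pass filtering step stressed above can be illustrated with a short, generic sketch (not the STALLVIB pipeline itself; the sample rate, cutoff frequency and signal shapes below are hypothetical):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, cutoff_hz, fs_hz, order=4):
    """Zero-phase Butterworth low-pass filter to suppress high-frequency noise."""
    b, a = butter(order, cutoff_hz / (0.5 * fs_hz))
    return filtfilt(b, a, signal)  # filtfilt runs forwards and backwards: no phase lag

# Hypothetical 25 Hz recordings of angle of attack (alpha) and normal-force
# coefficient (C_N) on a rotating blade section
fs = 25.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
alpha = 10 + 5 * np.sin(2 * np.pi * 0.5 * t) + rng.normal(0, 0.5, t.size)
c_n = 1.2 + 0.4 * np.sin(2 * np.pi * 0.5 * t + 0.3) + rng.normal(0, 0.05, t.size)

alpha_f = lowpass(alpha, cutoff_hz=2.0, fs_hz=fs)
c_n_f = lowpass(c_n, cutoff_hz=2.0, fs_hz=fs)
# Plotting c_n_f against alpha_f now traces a much cleaner hysteresis loop
# than the raw, noisy signals would.
```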

  8. Validating hierarchical verbal autopsy expert algorithms in a large data set with known causes of death.

    Science.gov (United States)

    Kalter, Henry D; Perin, Jamie; Black, Robert E

    2016-06-01

    Physician assessment historically has been the most common method of analyzing verbal autopsy (VA) data. Recently, the World Health Organization endorsed two automated methods, Tariff 2.0 and InterVA-4, which promise greater objectivity and lower cost. A disadvantage of the Tariff method is that it requires a training data set from a prior validation study, while InterVA relies on clinically specified conditional probabilities. We undertook to validate the hierarchical expert algorithm analysis of VA data, an automated, intuitive, deterministic method that does not require a training data set. Using Population Health Metrics Research Consortium study hospital source data, we compared the primary causes of 1629 neonatal and 1456 1-59-month-old child deaths from VA expert algorithms arranged in a hierarchy to their reference standard causes. The expert algorithms were held constant, while five prior neonatal hierarchies and one new "compromise" neonatal hierarchy, as well as three former child hierarchies, were tested. For each comparison, the reference standard data were resampled 1000 times within the range of cause-specific mortality fractions (CSMF) for one of three approximated community scenarios in the 2013 WHO global causes of death, plus one random mortality cause proportions scenario. We utilized CSMF accuracy to assess overall population-level validity, and the absolute difference between VA and reference standard CSMFs to examine particular causes. Chance-corrected concordance (CCC) and Cohen's kappa were used to evaluate individual-level cause assignment. Overall CSMF accuracy for the best-performing expert algorithm hierarchy was 0.80 (range 0.57-0.96) for neonatal deaths and 0.76 (0.50-0.97) for child deaths. Performance for particular causes of death varied, with fairly flat estimated CSMF over a range of reference values for several causes. Performance at the individual diagnosis level was also less favorable than that for overall CSMF (neonatal: best CCC = 0.23, range 0
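
    The population-level metric used here, CSMF accuracy, has a simple closed form (as defined by Murray et al. 2011): 1 − Σ|CSMF_pred − CSMF_true| / (2 (1 − min CSMF_true)). A minimal sketch with hypothetical three-cause data:

```python
import numpy as np

def csmf_accuracy(csmf_true, csmf_pred):
    """Population-level CSMF accuracy:
    1 - sum(|pred - true|) / (2 * (1 - min(true)))."""
    csmf_true = np.asarray(csmf_true, dtype=float)
    csmf_pred = np.asarray(csmf_pred, dtype=float)
    return 1.0 - np.abs(csmf_pred - csmf_true).sum() / (2.0 * (1.0 - csmf_true.min()))

# Hypothetical three-cause example
true = [0.50, 0.30, 0.20]   # reference-standard cause fractions
pred = [0.45, 0.35, 0.20]   # fractions assigned by an algorithm
print(csmf_accuracy(true, pred))  # 0.9375
```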

  9. Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.

    Science.gov (United States)

    Cole, J B; Newman, S; Foertter, F; Aguilar, I; Coffey, M

    2012-03-01

    Modern animal breeding data sets are large and getting larger, due in part to recent availability of high-density SNP arrays and cheap sequencing technology. High-performance computing methods for efficient data warehousing and analysis are under development. Financial and security considerations are important when using shared clusters. Sound software engineering practices are needed, and it is better to use existing solutions when possible. Storage requirements for genotypes are modest, although full-sequence data will require greater storage capacity. Storage requirements for intermediate and results files for genetic evaluations are much greater, particularly when multiple runs must be stored for research and validation studies. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, and there is increasing interest in new health and management traits. The collection of sufficient phenotypes to produce accurate evaluations may take many years, and high-reliability proofs for older bulls are needed to estimate marker effects. Data mining algorithms applied to large data sets may help identify unexpected relationships in the data, and improved visualization tools will provide insights. Genomic selection using large data requires a lot of computing power, particularly when large fractions of the population are genotyped. Theoretical improvements have made possible the inversion of large numerator relationship matrices, permitted the solving of large systems of equations, and produced fast algorithms for variance component estimation. Recent work shows that single-step approaches combining BLUP with a genomic relationship (G) matrix have similar computational requirements to traditional BLUP, and the limiting factor is the construction and inversion of G for many genotypes. A naïve algorithm for creating G for 14,000 individuals required almost 24 h to run, but custom libraries and parallel computing reduced that to
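
    For readers unfamiliar with the genomic relationship matrix G discussed above, a common construction is VanRaden's first method (the abstract does not say which formulation was benchmarked); a minimal sketch with toy genotypes:

```python
import numpy as np

def vanraden_g(genotypes):
    """Genomic relationship matrix G = ZZ' / (2 * sum(p * (1 - p))),
    where genotypes are coded 0/1/2 copies of the reference allele
    (individuals x markers) and Z is the allele-frequency-centred matrix."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0   # allele frequency per marker
    z = m - 2.0 * p            # centre each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

# Hypothetical toy data: 4 individuals, 6 SNP markers
geno = np.array([[0, 1, 2, 1, 0, 2],
                 [1, 1, 2, 0, 0, 2],
                 [2, 0, 1, 1, 1, 0],
                 [1, 2, 0, 2, 1, 1]])
G = vanraden_g(geno)
print(G.shape)  # (4, 4); for 14,000 genotyped animals G is 14,000 x 14,000
```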

  10. High-resolution daily gridded data sets of air temperature and wind speed for Europe

    Science.gov (United States)

    Brinckmann, Sven; Krähenmann, Stefan; Bissolli, Peter

    2016-10-01

    New high-resolution data sets for near-surface daily air temperature (minimum, maximum and mean) and daily mean wind speed for Europe (the CORDEX domain) are provided for the period 2001-2010 for the purpose of regional model validation in the framework of DecReg, a sub-project of the German MiKlip project, which aims to develop decadal climate predictions. The main input data sources are SYNOP observations, partly supplemented by station data from the ECA&D data set (http://www.ecad.eu). These data are quality tested to eliminate erroneous data. By spatial interpolation of these station observations, grid data at a resolution of 0.044° (≈ 5 km) on a rotated grid with virtual North Pole at 39.25° N, 162° W are derived. For temperature interpolation a modified version of the regression kriging method developed by Krähenmann et al. (2011) is used. First, predictor fields of altitude, continentality and zonal mean temperature are used for a regression applied to monthly station data. The residuals of the monthly regression and the deviations of the daily data from the monthly averages are then interpolated using simple kriging in a second and third step. For wind speed a new method based on the concept used for temperature was developed, involving predictor fields of exposure, roughness length, coastal distance and ERA-Interim reanalysis wind speed at 850 hPa. Interpolation uncertainty is estimated by means of the kriging variance and regression uncertainties. Furthermore, to assess the quality of the final daily grid data, cross validation is performed. Variance explained by the regression ranges from 70 to 90 % for monthly temperature and from 50 to 60 % for monthly wind speed. The resulting RMSE for the final daily grid data amounts to 1-2 K for the daily temperature parameters and 1-1.5 m s⁻¹ for daily mean wind speed, depending on season and parameter. The data sets presented in this article are published at doi:10.5676/DWD_CDC/DECREG0110v2.
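
    The regression-plus-residual-interpolation idea can be sketched schematically (this is not the DWD implementation; the predictors, kernel and data below are hypothetical, and an RBF Gaussian process stands in for the simple-kriging step):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical station data: predictors (altitude, continentality, zonal-mean
# temperature), station coordinates, and monthly temperature observations y
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, (50, 2))
predictors = rng.normal(size=(50, 3))
y = predictors @ np.array([-6.5e-3, 1.0, 0.8]) + rng.normal(0, 0.3, 50)

# Step 1: regression of the monthly values on the physiographic predictors
reg = LinearRegression().fit(predictors, y)
residuals = y - reg.predict(predictors)

# Step 2: interpolate the regression residuals in space (an RBF kernel plays
# the role of a Gaussian covariance model in simple kriging)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=0.05)
gp.fit(coords, residuals)

# Prediction at a grid point = regression part + interpolated residual
new_coord, new_pred = np.array([[5.0, 5.0]]), np.array([[0.1, 0.2, -0.3]])
estimate = reg.predict(new_pred) + gp.predict(new_coord)
```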

  11. 3D printing of normal and pathologic tricuspid valves from transthoracic 3D echocardiography data sets.

    Science.gov (United States)

    Muraru, Denisa; Veronesi, Federico; Maddalozzo, Anna; Dequal, Daniele; Frajhof, Leonardo; Rabischoffsky, Arnaldo; Iliceto, Sabino; Badano, Luigi P

    2017-07-01

    To explore the feasibility of using transthoracic 3D echocardiography (3DTTE) data to generate 3D patient-specific models of the tricuspid valve (TV). Multi-beat 3D data sets of the TV (32 vol/s) were acquired in five subjects with various TV morphologies from the apical approach and analysed offline with custom-made software. Coordinates representing the annulus and the leaflets were imported into MeshLab (Visual Computing Lab, ISTI-CNR) to develop solid models, which were converted to stereolithography file format and 3D printed. Measurements of the TV annulus antero-posterior (AP) and medio-lateral (ML) diameters, perimeter (P), and TV tenting height (H) and volume (V) obtained from the 3D echo data set were compared with those performed on the 3D models using a caliper, a syringe and a millimeter tape. AP (4.2 ± 0.2 cm vs. 4.2 ± 0 cm), ML (3.7 ± 0.2 cm vs. 3.6 ± 0.1 cm), P (12.6 ± 0.2 cm vs. 12.7 ± 0.1 cm), H (11.2 ± 2.1 mm vs. 10.8 ± 2.1 mm) and V (3.0 ± 0.6 ml vs. 2.8 ± 1.4 ml) were similar (P = NS for all) when measured on the 3D data set and the printed model. The two sets of measurements were highly correlated (r = 0.991). The mean absolute error (2D - 3D) for AP, ML, P and tenting H was 0.7 ± 0.3 mm, indicating high accuracy of the 3D model. Printing of the TV from 3DTTE data is feasible with highly conserved fidelity. This technique has the potential for rapid integration into clinical practice to assist with decision-making, surgical planning, and teaching.

  12. The Impacts of Different Meteorology Data Sets on Nitrogen Fate and Transport in the SWAT Watershed Model

    Science.gov (United States)

    In this study, we investigated how different meteorology data sets impact nitrogen fate and transport responses in the Soil and Water Assessment Tool (SWAT) model. We used two meteorology data sets: National Climatic Data Center (observed) and Mesoscale Model 5/Weather Research ...

  13. Variation in LCA results for disposable polystyrene beverage cups due to multiple data sets and modelling choices

    NARCIS (Netherlands)

    Harst, van der E.J.M.; Potting, J.

    2014-01-01

    Life Cycle Assessments (LCAs) of the same products often result in different, sometimes even contradictory outcomes. Reasons for these differences include using different data sets and deviating modelling choices. This paper purposely used different data sets and modelling choices to identify how th

  14. A European daily high-resolution gridded data set of surface temperature and precipitation for 1950-2006

    NARCIS (Netherlands)

    Haylock, M.R.; Hofstra, N.; Klein Tank, A.M.G.; Klok, E.J.; Jones, P.D.; New, M.

    2008-01-01

    We present a European land-only daily high-resolution gridded data set for precipitation and minimum, maximum, and mean surface temperature for the period 1950-2006. This data set improves on previous products in its spatial resolution and extent, time period, number of contributing stations, and

  15. A U-statistics-based approach for modeling Cronbach coefficient alpha within a longitudinal data setting.

    Science.gov (United States)

    Ma, Yan; Gonzalez Della Valle, Alejandro; Zhang, Hui; Tu, X M

    2010-03-15

    Cronbach coefficient alpha (CCA) is a classic measure of item internal consistency of an instrument and is used in a wide range of behavioral, biomedical, psychosocial, and health-care-related research. Methods are available for making inference about one CCA or multiple CCAs from correlated outcomes. However, none of the existing approaches effectively address missing data. As longitudinal study designs become increasingly popular and complex in modern-day clinical studies, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data (at the instrument level due to subject dropout) within a longitudinal data setting. The approach is illustrated with both clinical and simulated data.
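
    For reference, the classical complete-case computation of Cronbach's alpha is straightforward; the paper's contribution is precisely that this naive version breaks down under the longitudinal missing-data patterns described above. A minimal sketch with hypothetical questionnaire scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's coefficient alpha for a (subjects x items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Hypothetical instrument: 5 subjects answering a 4-item questionnaire,
# with no missing responses (the complete-case assumption)
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 4, 4],
                   [1, 2, 1, 2],
                   [3, 3, 3, 4]])
print(round(cronbach_alpha(scores), 3))
```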

  16. Using climate models to estimate the quality of global observational data sets.

    Science.gov (United States)

    Massonnet, François; Bellprat, Omar; Guemas, Virginie; Doblas-Reyes, Francisco J

    2016-10-28

    Observational estimates of the climate system are essential to monitoring and understanding ongoing climate change and to assessing the quality of climate models used to produce near- and long-term climate information. This study poses the dual and unconventional question: Can climate models be used to assess the quality of observational references? We show that this question not only rests on solid theoretical grounds but also offers insightful applications in practice. By comparing four observational products of sea surface temperature with a large multimodel climate forecast ensemble, we find compelling evidence that models systematically score better against the most recent, advanced, but also most independent product. These results call for generalized procedures of model-observation comparison and provide guidance for a more objective observational data set selection.

  17. Using hybrid associative classifier with translation (HACT) for studying imbalanced data sets

    Directory of Open Access Journals (Sweden)

    Laura Cleofas Sánchez

    2012-04-01

    Class imbalance may reduce classifier performance in several pattern recognition problems. This negative effect is most notable for patterns of the least represented (minority) class. One strategy for handling the problem consists of treating the classes separately (majority and minority classes) in order to balance the data sets (DS). This paper studied the high sensitivity to class imbalance shown by an associative model of classification, the hybrid associative classifier with translation (HACT); the impact of imbalanced DS on the associative model's performance was examined. The convenience of using sub-sampling methods for decreasing the negative effects of imbalance on associative memories was analysed. The proposal's feasibility was based on experimental results obtained from eleven real-world data sets.
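
    A minimal sketch of the random under-sampling idea discussed above (a generic illustration, not the authors' exact procedure):

```python
import numpy as np

def random_undersample(X, y, rng=None):
    """Balance a data set by randomly discarding majority-class samples
    until every class has as many samples as the minority class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        keep.append(idx if n == n_min else rng.choice(idx, n_min, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Hypothetical imbalanced data: 90 majority vs. 10 minority samples
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_undersample(X, y, rng=1)
print(np.bincount(yb))  # [10 10]
```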

  18. The combined EarthScope data set at the IRIS DMC

    Science.gov (United States)

    Trabant, C.; Sharer, G.; Benson, R.; Ahern, T.

    2007-12-01

    The IRIS Data Management Center (DMC) is the perpetual archive and access point for an ever-increasing variety of geophysical data in terms of volume, geographic distribution and scientific value. A particular highlight is the combined data set produced by the EarthScope project. The DMC archives data from each of the primary components: USArray, the Plate Boundary Observatory (PBO) and the San Andreas Fault Observatory at Depth (SAFOD). Growing at over 4.6 gigabytes per day, the USArray data set currently totals approximately 5 terabytes. Composed of four separate sub-components, the Permanent, Transportable, Flexible and Magnetotelluric Arrays, the USArray data set provides a multi-scale view of the western United States at present and of the conterminous United States when it is completed. The primary data from USArray are broadband and short-period seismic recordings and magnetotelluric measurements. Complementing the data from USArray are the short-period borehole seismic data and the borehole and laser strain data from PBO. The DMC also archives the high-resolution seismic data from instruments in the SAFOD main and pilot drill holes. The SAFOD seismic data are available in two forms: lower-rate monitoring channels sampled at 250 hertz and full-resolution channels varying between 1 and 4 kilohertz. Beyond data collection and archive management, the DMC performs value-added functions. All data arriving at the DMC as real-time data streams are processed by QUACK, an automated Quality Control (QC) system. All the measurements made by this system are stored in a database and made available to data contributors and users via a web interface, including customized report generation. In addition to the automated QC measurements, quality control is performed on USArray data at the DMC by a team of analysts. The primary functions of the analysts are to routinely report data quality assessment to the respective network operators and log serious, unfixable data

  19. The NOAA-9 Earth Radiation Budget Experiment Wide Field-of-View Data Set

    Science.gov (United States)

    Bush, Kathryn A.; Smith, G. Louis; Young, David F.

    1999-01-01

    The Earth Radiation Budget Experiment (ERBE) consisted of wide field-of-view (WFOV) radiometers and scanning radiometers for measuring outgoing longwave radiation and solar radiation reflected from the Earth. These instruments were carried by the dedicated Earth Radiation Budget Satellite (ERBS) and by the NOAA-9 and -10 operational spacecraft. The WFOV radiometers provided data from which instantaneous fluxes at the top of the atmosphere (TOA) are computed by use of a numerical filter algorithm. Monthly mean fluxes over a 5-degree equal angle grid are computed from the instantaneous TOA fluxes. The WFOV radiometers aboard the NOAA-9 spacecraft operated from February 1985 through December 1992, at which time a failure of the shortwave radiometer ended the usable data after nearly 8 years. This paper examines the monthly mean products from that data set.

  20. DISCOVERY OF LATENT STRUCTURES: EXPERIENCE WITH THE COIL CHALLENGE 2000 DATA SET

    Institute of Scientific and Technical Information of China (English)

    Nevin L. ZHANG; Yi WANG; Tao CHEN

    2008-01-01

    The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables due to the lack of efficient learning algorithms. Significant progress has been made since then on algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting to know that such a large and meaningful latent structure can be automatically inferred from data.

  1. Reconstruction of the primordial power spectrum of curvature perturbations using multiple data sets

    CERN Document Server

    Hunt, Paul

    2014-01-01

    Detailed knowledge of the primordial power spectrum (PPS) of curvature perturbations is essential both in order to elucidate the physical mechanism (`inflation') which generated it, and for estimating the parameters of the assumed cosmological model from CMB and LSS data. Hence it ought to be extracted from such data in a model-independent manner; however, this is difficult because relevant cosmological observables are given in general by a convolution of the PPS with some smoothing kernel. The deconvolution problem is ill-conditioned, so a regularisation scheme must be employed to control error propagation. We demonstrate that `Tikhonov regularisation' can robustly reconstruct the PPS from multiple cosmological data sets, a significant advantage being that both its uncertainty and resolution are precisely quantified. Using Monte Carlo simulations we investigate the performance of several regularisation parameter selection methods and find that generalised cross-validation and Mallows' C_p method give optimal r...
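
    A minimal sketch of Tikhonov regularisation applied to an ill-conditioned deconvolution like the one described (the kernel, noise level and problem size are hypothetical; the PPS application involves far more structure):

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve min ||A x - b||^2 + lam^2 ||x||^2 via the regularised
    normal equations (A'A + lam^2 I) x = A'b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam ** 2 * np.eye(n), A.T @ b)

# Hypothetical ill-conditioned problem: the observations b are a smoothed
# version of the unknown spectrum x_true, plus noise
rng = np.random.default_rng(2)
n = 100
x_true = 1.0 + 0.1 * np.sin(np.linspace(0, 6 * np.pi, n))
i, j = np.indices((n, n))
K = np.exp(-(((i - j) / 5.0) ** 2))   # broad Gaussian smoothing kernel
b = K @ x_true + rng.normal(0, 0.01, n)

x_naive = np.linalg.lstsq(K, b, rcond=None)[0]  # noise is wildly amplified
x_reg = tikhonov_solve(K, b, lam=0.1)           # stable reconstruction
print(np.linalg.norm(x_reg - x_true) < np.linalg.norm(x_naive - x_true))  # True
```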

  2. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNAs to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH and BGF. The predictions were compared at the nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
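
    The nucleotide-level measures commonly used in such evaluations, sensitivity Sn = TP/(TP+FN) and "specificity" Sp = TP/(TP+FP), can be computed directly from coding/non-coding masks (a generic sketch; the paper's exact measure definitions may differ):

```python
def nucleotide_level_scores(true_mask, pred_mask):
    """Nucleotide-level sensitivity and specificity as usually defined in
    gene-prediction benchmarks, from per-position coding flags."""
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    fn = sum(1 for t, p in zip(true_mask, pred_mask) if t and not p)
    fp = sum(1 for t, p in zip(true_mask, pred_mask) if not t and p)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

# Hypothetical 12-bp region: annotation vs. a gene-finder's prediction
annotated = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
print(nucleotide_level_scores(annotated, predicted))  # (0.833..., 0.833...)
```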

  3. Photometric Variability in the CSTAR Field: Results from the 2008 Data Set

    Science.gov (United States)

    Wang, Songhu; Zhang, Hui; Zhou, Xu; Zhou, Ji-Lin; Fu, Jian-Ning; Yang, Ming; Liu, Huigen; Xie, Jiwei; Wang, Lifan; Wang, Lingzhi; Wittenmyer, R. A.; Ashley, M. C. B.; Feng, Long-Long; Gong, Xuefei; Lawrence, J. S.; Liu, Qiang; Luong-Van, D. M.; Ma, Jun; Peng, Xiyan; Storey, J. W. V.; Wu, Zhenyu; Yan, Jun; Yang, Huigen; Yang, Ji; Yuan, Xiangyan; Zhang, Tianmeng; Zhang, Xiaojia; Zhu, Zhenxi; Zou, Hu

    2015-06-01

    The Chinese Small Telescope Array (CSTAR) is the first telescope facility built at Dome A, Antarctica. During the 2008 observing season, the installation provided long-baseline and high-cadence photometric observations in the i-band for 18,145 targets within the 20 deg² CSTAR field around the South Celestial Pole for the purpose of monitoring the astronomical observing quality of Dome A and detecting various types of photometric variability. Using sensitive and robust detection methods, we discover 274 potential variables from this data set, 83 of which are new discoveries. We characterize most of them, providing the periods, amplitudes, and classes of variability. The catalog of all these variables is presented along with the discussion of their statistical properties.
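
    The abstract does not name its detection methods, but period searches in light curves of this kind are commonly done with a Lomb-Scargle periodogram, which copes with uneven time sampling; a minimal sketch on synthetic data:

```python
import numpy as np
from astropy.timeseries import LombScargle

# Hypothetical unevenly sampled i-band light curve of one target
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 30, 500))   # observation times in days
period_true = 1.7                      # days
mag = (12.0 + 0.05 * np.sin(2 * np.pi * t / period_true)
       + rng.normal(0, 0.01, t.size))

# Periodogram over a frequency grid; the highest peak gives the best period
frequency, power = LombScargle(t, mag).autopower(minimum_frequency=0.05,
                                                 maximum_frequency=10.0)
best_period = 1.0 / frequency[np.argmax(power)]
print(f"Recovered period: {best_period:.2f} d")  # ~1.70 d
```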

  4. Using Browser Notebooks to Analyse Big Atmospheric Data-sets in the Cloud

    Science.gov (United States)

    Robinson, N.; Tomlinson, J.; Arribas, A.; Prudden, R.

    2016-12-01

    We present an account of our experience building an ecosystem for the analysis of big atmospheric data sets. Using modern technologies, we have developed a prototype platform which is scalable and capable of analysing very large atmospheric datasets. We tested different big-data ecosystems, such as Hadoop MapReduce, Spark and Dask, in order to find the one best suited for the analysis of multidimensional binary data such as NetCDF. We make extensive use of infrastructure-as-code and containerisation to provide a platform which is reusable, and which can scale to accommodate changes in demand. We make this platform readily accessible using browser-based notebooks. As a result, analysts with minimal technology experience can, in tens of lines of Python, make interactive data-visualisation web pages which analyse very large amounts of data using cutting-edge big-data technology.
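
    A minimal sketch of the kind of notebook workflow described, using xarray backed by dask to process a NetCDF file larger than memory (the file name, variable and chunking below are hypothetical):

```python
import xarray as xr

# Open a large NetCDF file lazily; dask splits it into 100-step time chunks
ds = xr.open_dataset("surface_temperature.nc", chunks={"time": 100})

# Build a lazy computation graph: climatological mean over the time axis
climatology = ds["temperature"].mean(dim="time")

# Nothing is read until .compute(); dask then processes chunk by chunk,
# in parallel, so the full array never has to fit in memory
result = climatology.compute()
print(result)
```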

  5. The NANOGrav Nine-year Data Set: Astrometric Measurements of 37 Millisecond Pulsars

    CERN Document Server

    Matthews, Allison M; Fonseca, Emmanuel; Arzoumanian, Zaven; Crowter, Kathryn; Demorest, Paul B; Dolch, Timothy; Ellis, Justin A; Ferdman, Robert D; Gonzalez, Marjorie E; Jones, Glenn; Jones, Megan L; Lam, Michael T; Levin, Lina; McLaughlin, Maura A; Pennucci, Timothy T; Ransom, Scott M; Stairs, Ingrid H; Stovall, Kevin; Swiggum, Joseph K; Zhu, Weiwei

    2015-01-01

    Using the nine-year radio-pulsar timing data set from the North American Nanohertz Observatory for Gravitational Waves (NANOGrav), collected at Arecibo Observatory and the Green Bank Telescope, we have measured the positions, proper motions, and parallaxes for 37 millisecond pulsars. We report eleven significant parallax measurements and distance measurements, and nineteen lower limits on distance. We compare these measurements to distances predicted by the NE2001 interstellar electron density model and find them to be in general agreement. We use measured orbital-decay rates and spin-down rates to confirm two of the parallax distances and to place distance upper limits on other sources; these distance limits agree with the parallax distances with one exception, PSR J1024-0719, which we discuss at length. Using our measurements in combination with other published measurements, we calculate the velocity dispersion of the millisecond pulsar population in Galactocentric coordinates. We find the radial, azimuthal...

  6. Adaptive fuzzy leader clustering of complex data sets in pattern recognition

    Science.gov (United States)

    Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda

    1992-01-01

    A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
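
    The fuzzy C-means updates that AFLC reuses for its prototype refinement can be written compactly; a minimal standalone sketch (plain batch FCM, not the full AFLC architecture):

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Batch fuzzy C-means: alternate the centroid and membership updates
    (the same system equations AFLC uses to relocate its prototypes)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))   # fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted means
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        inv = (d + 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)          # u_ik ~ d_ik^(-2/(m-1))
    return centroids, U

# Hypothetical 2D data with two well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```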

  7. Research on Heuristic Feature Extraction and Classification of EEG Signal Based on BCI Data Set

    Directory of Open Access Journals (Sweden)

    Lijuan Duan

    2013-01-01

    In this study, an EEG signal classification framework was proposed. The framework contained three feature extraction methods based on an optimization strategy. Firstly, we selected optimal electrodes based on single-electrode classification performance and combined the data of all optimal electrodes as the feature. Then, we assessed the contribution of each time span of the EEG signal for each electrode and joined the data of all optimal time spans for classification. In addition, we further selected useful information from the original data using a genetic algorithm. Finally, the performance was evaluated with Bayes and SVM classifiers on BCI Competition 2003 data set Ia; the genetic algorithm approach reached an accuracy of 91.81%. The experimental results show that our methods offer better performance for reliable classification of the EEG signal.

  8. Correction of Magnetic Optics and Beam Trajectory Using LOCO Based Algorithm with Expanded Experimental Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Romanov, A.; Edstrom, D.; Emanov, F. A.; Koop, I. A.; Perevedentsev, E. A.; Rogovsky, Yu. A.; Shwartz, D. B.; Valishev, A.

    2017-03-28

    Precise beam-based measurement and correction of magnetic optics is essential for the successful operation of accelerators. The LOCO algorithm is a proven and reliable tool, which in some situations can be improved by using a broader class of experimental data. The standard data sets for LOCO include the closed orbit responses to dipole corrector variation, dispersion, and betatron tunes. This paper discusses the benefits of augmenting the data with four additional classes of experimental data: the beam shape measured with beam profile monitors; responses of closed orbit bumps to focusing field variations; betatron tune responses to focusing field variations; and BPM-to-BPM betatron phase advances and beta functions at the BPMs obtained from turn-by-turn coordinates of a kicked beam. All of the described features were implemented in the Sixdsimulation software that was used to correct the optics of the VEPP-2000 collider, the VEPP-5 injector booster ring, and the FAST linac.

  9. Reliability of the International Spinal Cord Injury Bowel Function Basic and Extended Data Sets

    DEFF Research Database (Denmark)

    Juul, Therese; Bazzochi, G; Coggrave, M;

    2011-01-01

    Study design: This study was designed as an international validation study. Objective: The objective of this study was to assess the inter-rater reliability of the International Spinal Cord Injury Bowel Function Basic and Extended Data Sets. Setting: Three European spinal cord injury centers. First and second tests were separated by 14 days. Cohen's kappa was computed as a measure of agreement between raters. Results: Inter-rater reliability assessed by kappa statistics was very good (≥0.81) in 5 items, good (0.61–0.80) in 11 items, moderate (0.41–0.60) in 20 items, fair (0.21–0.40) in 11 and poor (<0.21)...
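
    Cohen's kappa itself is a one-liner with scikit-learn; a minimal sketch with hypothetical ratings from two raters (the kappa bands above then grade the resulting value):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of one data set item by two raters for 12 patients
rater_a = [2, 3, 1, 2, 2, 3, 1, 1, 2, 3, 2, 1]
rater_b = [2, 3, 1, 2, 1, 3, 1, 2, 2, 3, 2, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # agreement beyond chance between the raters
```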

  10. Preparing a new data set for earthquake damage detection in SAR imagery: the Christchurch example I

    Science.gov (United States)

    Kuny, S.; Hammer, Horst; Schulz, K.

    2014-10-01

    As the introductory first part of this paper, the data set of Christchurch, New Zealand, is outlined with regard to its purpose: the detection of earthquake damage. The aim is to produce simulated SAR images that are realistic enough to function successfully as pre-event images in a change detection effort. To this end, some modifications to the input 3D city model are introduced and discussed. These include the use of a GIS map, for realistic modelling of the radiometric variety, and the insertion of high vegetation into the model, so as to achieve realistic occlusion of building corners. A detailed description of the impact these modifications have on the simulation is given, and a comparison between the simulations and corresponding real data is drawn.

  11. Clustering procedures for the optimal selection of data sets from multiple crystals in macromolecular crystallography

    Energy Technology Data Exchange (ETDEWEB)

    Foadi, James [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Imperial College, London SW7 2AZ (United Kingdom); Aller, Pierre [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Alguel, Yilmaz; Cameron, Alex [Imperial College, London SW7 2AZ (United Kingdom); Axford, Danny; Owen, Robin L. [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Armour, Wes [Oxford e-Research Centre (OeRC), Keble Road, Oxford OX1 3QG (United Kingdom); Waterman, David G. [Research Complex at Harwell (RCaH), Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA (United Kingdom); Iwata, So [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Imperial College, London SW7 2AZ (United Kingdom); Evans, Gwyndaf, E-mail: gwyndaf.evans@diamond.ac.uk [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom)

    2013-08-01

    A systematic approach to the scaling and merging of data from multiple crystals in macromolecular crystallography is introduced and explained. The availability of intense microbeam macromolecular crystallography beamlines at third-generation synchrotron sources has enabled data collection and structure solution from microcrystals of <10 µm in size. The increased likelihood of severe radiation damage where microcrystals or particularly sensitive crystals are used forces crystallographers to acquire large numbers of data sets from many crystals of the same protein structure. The associated analysis and merging of multi-crystal data is currently a manual and time-consuming step. Here, a computer program, BLEND, that has been written to assist with and automate many of the steps in this process is described. It is demonstrated how BLEND has successfully been used in the structure solution of a novel membrane protein.

  12. Variable Quality Compression of Fluid Dynamical Data Sets Using a 3D DCT Technique

    Science.gov (United States)

    Loddoch, A.; Schmalzl, J.

    2005-12-01

    In this work we present a data compression scheme that is especially suited for the compression of data sets resulting from computational fluid dynamics (CFD). By adopting the concept of the JPEG compression standard and extending the approach of Schmalzl (Schmalzl, J. Using standard image compression algorithms to store data from computational fluid dynamics. Computers and Geosciences, 29, 1021-1031, 2003), we employ a three-dimensional discrete cosine transform of the data. The resulting frequency components are rearranged, quantized and finally stored using Huffman encoding and standard variable-length integer codes. The compression ratio and the introduced loss of accuracy can be adjusted by means of two compression parameters to give the desired compression profile. Using the proposed technique, compression ratios of more than 60:1 are possible with a mean error of the compressed data of less than 0.1%.
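
    The core of such a scheme, a 3D DCT followed by uniform quantisation and an inverse transform, can be sketched in a few lines (the quantisation step and block contents are hypothetical; the actual scheme adds coefficient reordering and Huffman/variable-length coding):

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, quant_step):
    """JPEG-style lossy round trip for one 3D data block: forward DCT,
    uniform quantisation (the lossy step), then inverse DCT."""
    coeffs = dctn(block, norm="ortho")
    quantised = np.round(coeffs / quant_step)           # most entries become 0
    return idctn(quantised * quant_step, norm="ortho")  # decompressed block

# Hypothetical 8x8x8 block from a smooth CFD field
x, y, z = np.meshgrid(*[np.linspace(0, 1, 8)] * 3, indexing="ij")
block = np.sin(np.pi * x) * np.cos(np.pi * y) * z

restored = compress_block(block, quant_step=0.01)
rel_error = np.abs(restored - block).mean() / np.abs(block).mean()
print(f"mean relative error: {rel_error:.4f}")  # small; coarser steps compress more
```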

  13. Reconstruction of the primordial power spectrum of curvature perturbations using multiple data sets

    DEFF Research Database (Denmark)

    Hunt, Paul; Sarkar, Subir

    2014-01-01

    Detailed knowledge of the primordial power spectrum of curvature perturbations is essential both in order to elucidate the physical mechanism (`inflation') which generated it, and for estimating the cosmological parameters from observations of the cosmic microwave background and large-scale structure. Hence it ought to be extracted from such data in a model-independent manner; however, this is difficult because relevant cosmological observables are given by a convolution of the primordial perturbations with some smoothing kernel which depends on both the assumed world model and the matter content of the universe. Moreover, the deconvolution problem is ill-conditioned, so a regularisation scheme must be employed to control error propagation. We demonstrate that `Tikhonov regularisation' can robustly reconstruct the primordial spectrum from multiple cosmological data sets, a significant...

  14. Noise reduction in multiple-echo data sets using singular value decomposition.

    Science.gov (United States)

    Bydder, Mark; Du, Jiang

    2006-09-01

    A method is described for denoising multiple-echo data sets using singular value decomposition (SVD). Images are acquired using a multiple gradient- or spin-echo sequence, and the variation of the signal with echo time (TE) in all pixels is subjected to SVD analysis to determine the components of the signal variation. The least significant components are associated with small singular values and tend to characterize the noise variation. Applying a "minimum variance" filter to the singular values suppresses the noise components in a way that optimally approximates the underlying noise-free images. The result is a reduction in noise in the individual TE images with minimal degradation of the spatial resolution and contrast. Phantom and in vivo results are presented.
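
    A minimal sketch of SVD-based denoising of a multi-echo stack, using hard truncation of the small singular values rather than the paper's minimum-variance filter (echo times, T2 values and noise level below are hypothetical):

```python
import numpy as np

def svd_denoise(stack, rank):
    """Denoise a multi-echo stack (n_echoes x n_pixels) by keeping only the
    dominant singular components of the signal-vs-TE variation."""
    U, s, Vt = np.linalg.svd(stack, full_matrices=False)
    s[rank:] = 0.0                 # suppress noise-dominated components
    return U @ np.diag(s) @ Vt

# Hypothetical mono-exponential T2 decay over 8 echo times, plus noise
rng = np.random.default_rng(6)
te = np.linspace(5, 40, 8)[:, None]           # echo times in ms
t2 = rng.uniform(20, 80, (1, 1024))           # per-pixel T2 values in ms
clean = np.exp(-te / t2)
noisy = clean + rng.normal(0, 0.05, clean.shape)

denoised = svd_denoise(noisy, rank=2)
print(np.abs(denoised - clean).mean() < np.abs(noisy - clean).mean())  # True
```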

  15. Photometric Variability in the CSTAR Field: Results From the 2008 Data Set

    CERN Document Server

    Wang, Songhu; Zhou, Xu; Zhou, Ji-Lin; Fu, Jian-Ning; Yang, Ming; Liu, Huigen; Xie, Jiwei; Wang, Lifan; Wang, Lingzhi; Wittenmyer, R A; Ashley, M C B; Feng, Long-Long; Gong, Xuefei; Lawrence, J S; Liu, Qiang; Luong-Van, D M; Ma, Jun; Peng, Xiyan; Storey, J W V; Wu, Zhenyu; Yan, Jun; Yang, Huigen; Yang, Ji; Yuan, Xiangyan; Zhang, Tianmeng; Zhang, Xiaojia; Zhu, Zhenxi; Zou, Hu

    2015-01-01

    The Chinese Small Telescope ARray (CSTAR) is the first telescope facility built at Dome A, Antarctica. During the 2008 observing season, the installation provided long-baseline and high-cadence photometric observations in the i-band for 18,145 targets within the 20 deg² CSTAR field around the South Celestial Pole for the purpose of monitoring the astronomical observing quality of Dome A and detecting various types of photometric variability. Using sensitive and robust detection methods, we discover 274 potential variables from this data set, 83 of which are new discoveries. We characterize most of them, providing the periods, amplitudes and classes of variability. The catalog of all these variables is presented along with the discussion of their statistical properties.

  16. Detection of Small-Scaled Features Using Landsat and Sentinel-2 Data Sets

    Science.gov (United States)

    Steensen, Torge; Muller, Sonke; Dresen, Boris; Buscher, Olaf

    2016-08-01

    As renewable energies advance, our attention has to turn to secondary features that can be utilised to enhance our independence from fossil fuels. In terms of biomass, this focus lies on small-scale features like vegetation units alongside roads or hedges between agricultural fields. Currently, there is no easily accessible inventory, if any at all, outlining the growth and re-growth patterns of such vegetation. Since these features are trimmed at least annually to allow the passing of traffic, the cuttings can, in principle, be harvested and converted into energy. This, however, requires a map outlining the vegetation growth and the potential energy amount at different locations, as well as adequate transport routes and potential processing plant locations. With the help of Landsat and Sentinel-2 data sets, we explore the possibilities of creating such a map. Additional data is provided in the form of regularly acquired airborne orthophotos and GIS-based infrastructure data.

  17. Undergraduate Research - Analyzing Data Sets: Global Positioning System (GPS) and Modeling the 1994 Northridge Earthquake

    Science.gov (United States)

    Simila, G.; Shubin, C.; Horn, W.

    2003-12-01

    Our undergraduate research program (2000-2003), funded by NASA, consisted of four short courses on the analysis of selected data sets from GPS, solar physics, orbital mechanics, and proteomics. During the program, approximately 80 students were recruited from science, math, engineering, and technology disciplines. This short course introduced students to GPS and earthquake data analysis with additional presentations by scientists from JPL. Additional lectures involved discussions of the wave equation, Fourier analysis, statistical techniques, and computer applications of Excel and Matlab. Each student modeled the observed GPS displacements produced by the 1994 Northridge earthquake and presented an oral report. An additional component of the program involved students as research assistants engaged in a variety of projects at CSUN and JPL. Each short course continued the following semester with weekly research lectures.

  18. A Distributed Architecture for Sharing Ecological Data Sets with Access and Usage Control Guarantees

    DEFF Research Database (Denmark)

    Bonnet, Philippe; Gonzalez, Javier; Granados, Joel Andres

    2014-01-01

    ... new insights, there are significant barriers to the realization of this vision. One of the key challenges is to allow scientists to share their data widely while retaining some form of control over who accesses this data (access control) and, more importantly, how it is used (usage control). Access and usage control is necessary to enforce existing open data policies. We have proposed the vision of trusted cells: a decentralized infrastructure, based on secure hardware running on devices equipped with trusted execution environments at the edges of the Internet. We originally described the utilization ... data sets with access and usage control guarantees. We rely on examples from terrestrial research and monitoring in the Arctic in the context of the INTERACT project.

  19. The NAFE'05/CoSMOS Data Set: Toward SMOS Soil Moisture Retrieval, Downscaling, and Assimilation

    DEFF Research Database (Denmark)

    Panciera, Rocco; Walker, Jeffrey P.; Kalma, Jetse D.

    2008-01-01

    The aim was to provide simulated Soil Moisture and Ocean Salinity (SMOS) observations using airborne L-band radiometers, supported by soil moisture and other relevant ground data, for the following: 1) the development of SMOS soil moisture retrieval algorithms; 2) developing approaches for downscaling the low-resolution data from SMOS; and 3) testing its assimilation into land surface models for root zone soil moisture retrieval. This paper describes the NAFE'05 and COSMOS airborne data sets together with the ground data collected in support of both aircraft campaigns. The airborne L-band acquisitions included 40 km x ... The L-band data were accompanied by airborne thermal infrared and optical measurements. The ground data consisted of continuous soil moisture profile measurements at 18 monitoring sites throughout the 40 km x 40 km study area and extensive spatial near-surface soil moisture measurements concurrent

  20. Extraction of 3D velocity and porosity fields from GeoPET data sets

    Energy Technology Data Exchange (ETDEWEB)

    Lippmann-Pipke, Johanna; Kulenkampff, Johannes [Helmholtz-Zentrum Dresden-Rossendorf e.V., Dresden (Germany). Reactive Transport; Eichelbaum, S. [Nemtics Visualization, Leipzig (Germany)

    2017-06-01

    Geoscientific process monitoring with positron emission tomography (GeoPET) has proven to be applicable for quantitative tomographic monitoring of transport processes in natural geological materials. We benchmarked GeoPET by inversely fitting a numerical finite element model to a diffusive transport experiment in Opalinus clay. The obtained effective diffusion coefficients, De,parallel and De,perpendicular, are well in line with data from the literature. More complex, heterogeneous migration and flow patterns, however, cannot be evaluated in the same way by inverse fitting with optimization tools. As an alternative, we have started developing an algorithm that allows the quantitative extraction of velocity and porosity fields, v_i=x,y,z(x,y,z) and n(x,y,z), from GeoPET time series, c_PET(x,y,z,t). These fields may serve as constituent data sets for reactive transport modelling.