WorldWideScience

Sample records for multiple completed datasets

  1. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  2. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the "large d, small n" characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers, and several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects; such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival; this model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach, which has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
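
    For readers unfamiliar with the penalty named above, the following numpy sketch shows the MCP function and the firm-thresholding update that group coordinate descent applies coordinate-wise; lam and gamma are illustrative tuning parameters, and this is not the authors' implementation.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty rho(t; lam, gamma), elementwise (gamma > 1)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def mcp_threshold(z, lam, gamma):
    """Firm-thresholding update used inside (group) coordinate descent,
    assuming standardized covariates; a sketch, not the paper's code."""
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)
```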

  3. The SAIL databank: linking multiple health and social care datasets

    Directory of Open Access Journals (Sweden)

    Ford David V

    2009-01-01

    Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods: Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records, to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. Results: The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were … Conclusion: With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  4. The SAIL databank: linking multiple health and social care datasets.

    Science.gov (United States)

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records, to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were … The SAIL databank represents a research-ready platform for record-linkage studies.
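
    MACRAL itself is SQL-based and not reproduced in the abstract, but the probabilistic record linkage (PRL) it is compared against follows a standard pattern: per-field agreement weights are summed into a match score and cut at a threshold. A hedged Python sketch with invented m/u probabilities:

```python
import math

# Hypothetical m/u probabilities per identifier field (NOT from the paper):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELDS = {"surname": (0.95, 0.01),
          "birth_date": (0.97, 0.003),
          "postcode": (0.90, 0.05)}

def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights (Fellegi-Sunter style)."""
    w = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a.get(field) == rec_b.get(field):
            w += math.log2(m / u)
        else:
            w += math.log2((1 - m) / (1 - u))
    return w

a = {"surname": "JONES", "birth_date": "1970-01-01", "postcode": "CF10"}
b = {"surname": "JONES", "birth_date": "1970-01-01", "postcode": "CF14"}
print(match_weight(a, b))  # classify as a match if above a tuned cutoff
```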

  5. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce development time and enhance functionality, a high-level language capable of parallel processing (Matlab) was used. Key considerations for the system are high-speed access to the large data volumes, persistence of those volumes, and precise process-time scheduling.

  6. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analysis of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource for analyzing such complex datasets using multivariate methods. It ranges from the necessary pre-treatment of data, including data transformations and distance calculations, to the application of both gold-standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis, along with detailed interpretation of results, for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools, with reproducible R code and chemical-ecological datasets, to practice and teach multivariate methods.
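
    The article ships reproducible R code; as a language-neutral illustration of the pre-treatment chain it describes (transformation, scaling, then an unsupervised overview), here is a short Python sketch on an invented peak-area matrix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=1.0, size=(30, 200))  # hypothetical peak areas

X_log = np.log1p(X)                                 # tame right-skewed intensities
X_auto = StandardScaler().fit_transform(X_log)      # unit-variance ("auto") scaling
scores = PCA(n_components=2).fit_transform(X_auto)  # unsupervised overview of samples
```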

  7. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Directory of Open Access Journals (Sweden)

    Borowski Krzysztof

    2008-06-01

    The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.

  8. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    Topographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of tens of centimeters that were not attributable to real-world change and therefore likely represent systematic sampling offsets. These offsets vary between the datasets but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times, so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.
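
    One common way to implement such a correction is to estimate the systematic vertical offset over ground assumed stable between surveys and subtract it; the numpy sketch below does this with the median. This is a generic illustration, not necessarily the exact procedure of the report.

```python
import numpy as np

def remove_survey_offset(dem_ref, dem_new, stable_mask):
    """Estimate a survey-wide vertical offset over ground assumed unchanged
    (e.g. roads) and remove it, so residual differences reflect real change."""
    offset = np.nanmedian(dem_new[stable_mask] - dem_ref[stable_mask])
    return dem_new - offset, offset
```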

  9. A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets

    Science.gov (United States)

    JafarGandomi, Arash; Binley, Andrew

    2013-09-01

    We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods; however, extending the approach to different sets of geophysical parameters and methods is straightforward. The different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid biasing the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth models. Accordingly, the ERT data, which are collected using a two-dimensional acquisition geometry, are recast as a set of equivalent vertical electric soundings. The different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together by introducing a joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of the spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is…
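
    The joint-likelihood idea above is easy to see in miniature. Below is a deliberately simplified Python sketch: a fixed-dimension Metropolis sampler (the paper's sampler is trans-dimensional) combining two Gaussian likelihoods for a single log-resistivity parameter; all observation and error values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: two methods observing the same log10-resistivity m.
obs = {"ert": 2.1, "emi": 1.9}      # synthetic observations
sig = {"ert": 0.2, "emi": 0.3}      # assumed data standard errors

def log_joint_likelihood(m):
    # Joint likelihood = product of per-method Gaussian likelihoods.
    return sum(-0.5 * ((obs[k] - m) / sig[k]) ** 2 for k in obs)

m, chain = 0.0, []
for _ in range(20000):
    prop = m + 0.1 * rng.standard_normal()   # random-walk proposal
    if np.log(rng.random()) < log_joint_likelihood(prop) - log_joint_likelihood(m):
        m = prop
    chain.append(m)

posterior_mean = np.mean(chain[5000:])       # discard burn-in
```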

  10. Surgical management of complete penile duplication accompanied by multiple anomalies.

    Science.gov (United States)

    Karaca, Irfan; Turk, Erdal; Ucan, A Basak; Yayla, Derya; Itirli, Gulcin; Ercal, Derya

    2014-09-01

    Diphallus (penile duplication) is very rare, seen once in every 5.5 million births. It can be isolated, but is usually accompanied by other congenital anomalies. Previous studies have reported many concurrent anomalies, such as bladder exstrophy, cloacal exstrophy, duplicated bladder, scrotal abnormalities, hypospadias, separated symphysis pubis, intestinal anomalies and imperforate anus; no penile duplication case accompanied by omphalocele has been reported. We present the surgical management of a patient with multiple anomalies, including complete penile duplication, hypogastric omphalocele and exstrophic rectal duplication.

  11. Compilation and analysis of multiple groundwater-quality datasets for Idaho

    Science.gov (United States)

    Hundt, Stephen A.; Hopkins, Candice B.

    2018-05-09

    Groundwater is an important source of drinking and irrigation water throughout Idaho, and groundwater quality is monitored by various Federal, State, and local agencies. The historical, multi-agency records of groundwater quality constitute a valuable dataset that has yet to be compiled or analyzed on a statewide level. The purpose of this study is to combine groundwater-quality data from multiple sources into a single database, to summarize this dataset, and to perform bulk analyses to reveal spatial and temporal patterns of water quality throughout Idaho. Data were retrieved from the Water Quality Portal (https://www.waterqualitydata.us/), the Idaho Department of Environmental Quality, and the Idaho Department of Water Resources. Analyses included counting the number of times a sample location had concentrations above Maximum Contaminant Levels (MCLs), performing trend tests, and calculating correlations between water-quality analytes. The water-quality database and the analysis results are available through USGS ScienceBase (https://doi.org/10.5066/F72V2FBG).
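
    A minimal pandas sketch of the two bulk analyses described above, counting MCL exceedances per site and correlating analytes; the table and the restriction to two analytes are illustrative only.

```python
import pandas as pd

# Hypothetical tidy table of analysis results (mg/L).
df = pd.DataFrame({
    "site":    ["A", "A", "B", "B"],
    "analyte": ["nitrate", "arsenic", "nitrate", "arsenic"],
    "value":   [12.0, 0.002, 4.0, 0.015],
})
MCL = {"nitrate": 10.0, "arsenic": 0.010}   # EPA MCLs in mg/L

df["exceeds"] = df["value"] > df["analyte"].map(MCL)
exceedance_counts = df.groupby(["site", "analyte"])["exceeds"].sum()

wide = df.pivot(index="site", columns="analyte", values="value")
correlations = wide.corr(method="spearman")  # analyte-to-analyte correlation
```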

  12. MACSIMS : multiple alignment of complete sequences information management system

    Directory of Open Access Journals (Sweden)

    Plewniak Frédéric

    2006-06-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified, and the retrieved data are evaluated and propagated from known to unknown sequences via these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60% compared with the features available in the public databases. An XML-format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.

  13. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic datasets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
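
    Of the QC metrics listed above, allelic R² is simple to compute once a masked set of true genotypes is available. A small Python sketch, with invented genotype and dosage vectors:

```python
import numpy as np

def allelic_r2(true_geno, dosage):
    """Squared Pearson correlation between true genotypes (0/1/2) and
    imputed allele dosages -- one QC metric named in the abstract."""
    return np.corrcoef(true_geno, dosage)[0, 1] ** 2

def minor_allele_frequency(genotypes):
    p = np.mean(genotypes) / 2.0
    return min(p, 1.0 - p)

g = np.array([0, 1, 2, 1, 0, 2, 1, 0])                   # hypothetical truth
d = np.array([0.1, 0.9, 1.8, 1.2, 0.0, 2.0, 0.8, 0.2])   # imputed dosages
print(allelic_r2(g, d), minor_allele_frequency(g))
```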

  14. Mapping of government land encroachment in Cameron Highlands using multiple remote sensing datasets

    International Nuclear Information System (INIS)

    Zin, M H M; Ahmad, B

    2014-01-01

    The cool, refreshing highland weather is one of the factors contributing to socio-economic growth in Cameron Highlands. This unique highland climate, surrounded by tropical rain forest, can be found in only a few places in Malaysia. It makes the area a famous tourist attraction and also provides a very suitable temperature for agriculture, making crops such as tea, vegetables, fruits and flowers among the biggest economic activities in Cameron Highlands. However, unauthorized agricultural activities are rampant: government land, mostly forest, has been encroached upon by farmers, who in many cases indiscriminately cut down trees and clear hill slopes. This study detects and assesses this encroachment using multiple remote sensing datasets. The datasets were used together with cadastral parcel data, in which survey lines describe property boundaries and land is subdivided into government and private lots. The maximum likelihood classification method was applied to the remote sensing imagery to classify land cover in the study area, and ground-truth data from field observations were used to assess the accuracy of the classification. The cadastral parcel data were overlaid on the classification map in order to detect encroachment areas. The results show a land cover change of 93.535 ha on government land in the study area between 2001 and 2010; nevertheless, almost no encroachment took place in the studied forest reserve area. The results of this study will be useful to the authorities in monitoring and managing the forest.

  15. Mapping of government land encroachment in Cameron Highlands using multiple remote sensing datasets

    Science.gov (United States)

    Zin, M. H. M.; Ahmad, B.

    2014-02-01

    The cool, refreshing highland weather is one of the factors contributing to socio-economic growth in Cameron Highlands. This unique highland climate, surrounded by tropical rain forest, can be found in only a few places in Malaysia. It makes the area a famous tourist attraction and also provides a very suitable temperature for agriculture, making crops such as tea, vegetables, fruits and flowers among the biggest economic activities in Cameron Highlands. However, unauthorized agricultural activities are rampant: government land, mostly forest, has been encroached upon by farmers, who in many cases indiscriminately cut down trees and clear hill slopes. This study detects and assesses this encroachment using multiple remote sensing datasets. The datasets were used together with cadastral parcel data, in which survey lines describe property boundaries and land is subdivided into government and private lots. The maximum likelihood classification method was applied to the remote sensing imagery to classify land cover in the study area, and ground-truth data from field observations were used to assess the accuracy of the classification. The cadastral parcel data were overlaid on the classification map in order to detect encroachment areas. The results show a land cover change of 93.535 ha on government land in the study area between 2001 and 2010; nevertheless, almost no encroachment took place in the studied forest reserve area. The results of this study will be useful to the authorities in monitoring and managing the forest.
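
    The classifier named above is standard enough to sketch. Below is a hedged Python outline of Gaussian maximum likelihood classification of multispectral pixels from per-class training samples; array shapes and class names are illustrative, not from the study.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ml_classify(pixels, training):
    """Gaussian maximum likelihood classification.
    pixels: (n, bands) array; training: dict mapping class name to an
    (m, bands) array of training samples for that class."""
    classes, loglik = list(training), []
    for c in classes:
        s = training[c]
        mvn = multivariate_normal(mean=s.mean(axis=0),
                                  cov=np.cov(s, rowvar=False),
                                  allow_singular=True)
        loglik.append(mvn.logpdf(pixels))     # per-pixel log-likelihood
    return np.array(classes)[np.argmax(np.column_stack(loglik), axis=1)]
```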

  16. An integrated pan-tropical biomass map using multiple reference datasets

    NARCIS (Netherlands)

    Avitabile, V.; Herold, M.; Heuvelink, G.B.M.; Lewis, S.L.; Phillips, O.L.; Asner, G.P.; Armston, J.; Asthon, P.; Banin, L.F.; Bayol, N.; Berry, N.; Boeckx, P.; Jong, De B.; Devries, B.; Girardin, C.; Kearsley, E.; Lindsell, J.A.; Lopez-gonzalez, G.; Lucas, R.; Malhi, Y.; Morel, A.; Mitchard, E.; Nagy, L.; Qie, L.; Quinones, M.; Ryan, C.M.; Slik, F.; Sunderland, T.; Vaglio Laurin, G.; Valentini, R.; Verbeeck, H.; Wijaya, A.; Willcock, S.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of…

  17. Observed climate variability over Chad using multiple observational and reanalysis datasets

    Science.gov (United States)

    Maharana, Pyarimohan; Abdel-Lathif, Ahmat Younous; Pattnayak, Kanhu Charan

    2018-03-01

    Chad is the largest of Africa's landlocked countries and one of the least-studied regions of the African continent. The major portion of Chad lies in the Sahel region, which is known for its rapid climate change. In this study, multiple observational datasets covering 1950 to 2014 are analyzed in order to examine the trends of precipitation and temperature, along with their variability, over Chad and to understand the possible impacts of climate change on the region. Trend analysis of the climatic fields was carried out using the Mann-Kendall test. Precipitation over Chad falls mostly in summer, brought by the West African Monsoon, with a maximum northward limit of 18° N; the Atlantic Ocean and the Mediterranean Sea are the major sources of moisture for the summer rainfall. Based on the rainfall time series, the study period is divided into wet (1950-1965), dry (1966-1990) and recovery (1991-2014) periods. Rainfall decreased drastically for almost three decades during the dry period, resulting in numerous drought years. Temperature increased at a rate of 0.15 °C/decade over the entire period of analysis. Seasonal rainfall and temperature play a major role in changes of land use/cover: the decrease of monsoon rainfall during the dry period reduced C4 grass cover drastically, and this reduction led to an increase of C3 grass cover. The slow revival of rainfall is still not sufficient to increase shrub cover, but it favors a gradual reduction of bare land over Chad.
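
    Since the trend analysis above relies on the Mann-Kendall test, here is a minimal self-contained Python implementation (without tie correction) that can be applied to an annual rainfall or temperature series; the example series is invented.

```python
import math

def mann_kendall(x):
    """Mann-Kendall trend test (no tie correction): returns the S statistic
    and the standard normal Z score; positive Z indicates an upward trend."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return s, (s - 1) / math.sqrt(var_s)
    if s < 0:
        return s, (s + 1) / math.sqrt(var_s)
    return s, 0.0

print(mann_kendall([4.1, 4.3, 3.9, 4.6, 4.8, 5.0]))  # hypothetical series
```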

  18. An integrated pan-tropical biomass map using multiple reference datasets

    OpenAIRE

    Avitabile, V.; Herold, M.; Heuvelink, G. B. M.; Lewis, S. L.; Phillips, O. L.; Asner, G. P.; Armston, J.; Ashton, P. S.; Banin, L.; Bayol, N.; Berry, N. J.; Boeckx, P.; de Jong, B. H. J.; DeVries, B.; Girardin, C. A. J.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging...

  19. Analysis of Naïve Bayes Algorithm for Email Spam Filtering across Multiple Datasets

    Science.gov (United States)

    Fitriah Rusland, Nurul; Wahid, Norfaradilla; Kasim, Shahreen; Hafit, Hanayanti

    2017-08-01

    E-mail spam continues to be a problem on the Internet. Spammed e-mail may contain many copies of the same message, commercial advertisements or other irrelevant posts such as pornographic content. In previous research, different filtering techniques have been used to detect these e-mails, such as Random Forest, Naïve Bayes, Support Vector Machine (SVM) and Neural Network classifiers. In this research, we test the Naïve Bayes algorithm for e-mail spam filtering on two datasets, the Spam Data and SPAMBASE datasets [8], and evaluate its performance. Performance on the datasets is evaluated in terms of accuracy, recall, precision and F-measure. Our research uses the WEKA tool for the evaluation of the Naïve Bayes algorithm on both datasets. The results show that the type of e-mail and the number of instances in the dataset influence the performance of Naïve Bayes.
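
    The paper runs this experiment in WEKA; an analogous sketch in Python with scikit-learn is below. It assumes the SPAMBASE data can be pulled from OpenML under the name "spambase" (an assumption about the mirror, not something stated in the paper) and reports the same four metrics. All SPAMBASE features are non-negative frequencies, so a multinomial Naïve Bayes is a reasonable stand-in.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

X, y = fetch_openml("spambase", version=1, return_X_y=True, as_frame=False)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

pred = MultinomialNB().fit(X_tr, y_tr).predict(X_te)
print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred, pos_label="1"))
print("recall   :", recall_score(y_te, pred, pos_label="1"))
print("F-measure:", f1_score(y_te, pred, pos_label="1"))
```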

  20. SAR image dataset of military ground targets with multiple poses for ATR

    Science.gov (United States)

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset [1]. The problem with MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden [2], suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high-resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles. The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped-frequency waveform. The targets in the database consist of four variants of two 1.7 m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz, achieving a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained, relative to the size of the model targets, is comparable to typical values obtained with airborne SAR systems. Single-polarized (Horizontal-Horizontal) images are generated using the backprojection algorithm [3]. A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized into suggested training and testing sets to facilitate a standard evaluation of SAR ATR algorithms.
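
    The stepped-frequency numbers quoted above fix the achievable range resolution at c/(2B) = 3 cm. The short Python sketch below illustrates that relationship and the standard IFFT route from a stepped-frequency echo to a 1-D range profile; the two point scatterers are invented, and full 2-D imagery would use backprojection as in the paper.

```python
import numpy as np

c = 3e8                                   # speed of light, m/s
freqs = np.linspace(13e9, 18e9, 4001)     # stepped-frequency grid from the paper
bandwidth = freqs[-1] - freqs[0]          # 5 GHz
print("range resolution:", c / (2 * bandwidth))   # = 0.03 m

# Synthetic echo from two point scatterers at 1.0 m and 1.3 m down-range.
ranges_m = [1.0, 1.3]
echo = sum(np.exp(-1j * 4 * np.pi * freqs * r / c) for r in ranges_m)

profile = np.abs(np.fft.ifft(echo))       # 1-D range profile via IFFT
df = freqs[1] - freqs[0]
range_axis = np.arange(freqs.size) * c / (2 * df * freqs.size)
```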

  21. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Directory of Open Access Journals (Sweden)

    Karacali Bilge

    2007-10-01

    Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information, using an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. The prediction error of profiles identified with supervised univariate feature selection algorithms was compared with that of profiles selected randomly from (a) all genes on the microarray platform and (b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results: Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets, on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform. Conclusion: Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine learning.

  22. An integrated pan-tropical biomass map using multiple reference datasets.

    Science.gov (United States)

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4° N-23.4° S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had an RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha⁻¹ vs. 21 and 28 Mg ha⁻¹ for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. © 2015 John Wiley & Sons Ltd.
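
    The "bias removal and weighted linear averaging" step can be sketched compactly. The Python below removes an externally estimated additive bias from each input map and combines them with inverse-variance weights; treating the weights as inverse variances is our assumption for illustration (the paper estimates biases and weights per stratum from reference data), and the bias values reuse the means quoted above as placeholders.

```python
import numpy as np

def fuse_agb(map_a, map_b, bias_a, bias_b, var_a, var_b):
    """Remove an (externally estimated) additive bias from each input AGB
    map, then combine with inverse-variance weights (illustrative only)."""
    a, b = map_a - bias_a, map_b - bias_b
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * a + w_b * b) / (w_a + w_b)

fused = fuse_agb(np.array([310.0, 25.0]), np.array([260.0, 40.0]),
                 bias_a=21.0, bias_b=28.0, var_a=900.0, var_b=1600.0)
```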

  23. Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias.

    Directory of Open Access Journals (Sweden)

    Thomas Pfeiffer

    2011-03-01

    Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that address similar questions. A major obstacle for meta-analyses arises from biases in reporting. In particular, it is speculated that findings which do not achieve formal statistical significance are less likely to be reported than statistically significant findings. Moreover, the patterns of bias can be complex and may also depend on the timing of the research results and their relationship with previously published work. In this paper, we present an approach that is specifically designed to analyze large-scale datasets on published results. Such datasets are currently emerging in diverse research fields, particularly in molecular medicine. We use our approach to investigate a dataset on Alzheimer's disease (AD) that covers 1167 results from case-control studies on 102 genetic markers. We observe that initial studies on a genetic marker tend to be substantially more biased than subsequent replications. The chances for initial, statistically non-significant results to be published are estimated to be about 44% (95% CI, 32% to 63%) relative to statistically significant results, while statistically non-significant replications have almost the same chance to be published as statistically significant replications (84%; 95% CI, 66% to 107%). Early replications tend to be biased against initial findings, an observation previously termed the Proteus phenomenon: the chances for non-significant studies going in the same direction as the initial result are estimated to be lower than the chances for non-significant studies opposing the initial result (73%; 95% CI, 55% to 96%). Such dynamic patterns in bias are difficult to capture by conventional methods, where typically simple publication bias is assumed to operate. Our approach captures and corrects for complex dynamic patterns of bias, and thereby helps generate conclusions from published results that are more robust…

  24. Investigating water budget dynamics in 18 river basins across the Tibetan Plateau through multiple datasets

    Science.gov (United States)

    Liu, Wenbin; Sun, Fubao; Li, Yanzhong; Zhang, Guoqing; Sang, Yan-Fang; Lim, Wee Ho; Liu, Jiahong; Wang, Hong; Bai, Peng

    2018-01-01

    The dynamics of basin-scale water budgets over the Tibetan Plateau (TP) are still not well understood, owing to the lack of in situ hydro-climatic observations. In this study, we investigate the seasonal cycles and trends of water budget components (e.g. precipitation P, evapotranspiration ET and runoff Q) in 18 TP river basins during the period 1982-2011 through the use of multi-source datasets (e.g. in situ observations, satellite retrievals, reanalysis outputs and land surface model simulations). A water-balance-based two-step procedure, which accounts for changes in basin-scale water storage on the annual scale, is adopted to calculate actual ET. The results indicated that precipitation (mainly snowfall from mid-autumn to the following spring), concentrated during June-October (with timing varying among the monsoon-affected basins), was the major contributor to runoff in the TP basins. P, ET and Q were found to increase marginally in most TP basins during the past 30 years, except in the upper Yellow River basin and some sub-basins of the Yalong River, which were mainly affected by the weakening East Asian monsoon. Moreover, the aridity index (PET/P) and runoff coefficient (Q/P) decreased slightly in most basins, consistent with the warming and moistening climate of the Tibetan Plateau. These results demonstrate the usefulness of integrating multi-source datasets for hydrological applications in data-sparse regions. More generally, such an approach might offer helpful insights into understanding the water and energy budgets and the sustainability of water resource management practices of data-sparse regions in a changing environment.
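
    The water-balance procedure boils down to simple arithmetic once basin-mean terms are in hand; a sketch with invented values in mm per year:

```python
# Annual basin water balance, all terms in mm per year (hypothetical values).
precip = 480.0            # P, basin-mean precipitation
runoff = 130.0            # Q, observed at the basin outlet
storage_change = 12.0     # dS, e.g. from satellite-observed water storage

et_no_storage = precip - runoff               # step 1: assume dS = 0
et_actual = precip - runoff - storage_change  # step 2: account for dS

runoff_coefficient = runoff / precip          # Q/P, as analyzed above
```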

  25. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong eLin

    2014-10-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce sparse constraints on groups of variables to overcome the 'small sample, but large variables' problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, had been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptibility genes for osteoporosis, as confirmed…

  26. Identification of dust storm source areas in West Asia using multiple environmental datasets.

    Science.gov (United States)

    Cao, Hui; Amiraslani, Farshad; Liu, Jian; Zhou, Na

    2015-01-01

    Sand and dust storms are common phenomena in arid and semi-arid areas. The West Asia region, especially the Tigris-Euphrates alluvial plain, has been recognized as one of the most important dust source areas in the world. In this paper, a method is applied to extract SDS (sand and dust storm) sources in the West Asia region using thematic maps, climate and geography data, the HYSPLIT model and satellite images. Of the 50 dust storms that occurred during 2000-2013 and were collected in the form of MODIS images, 27 events were used to demonstrate the trajectories simulated by the HYSPLIT model. In addition, a dataset of newly released Landsat images was used as the base map for the interpretation of SDS source regions. As a result, six main clusters were recognized as dust source areas, of which three clusters situated in the Tigris-Euphrates plain were identified as severe SDS sources (accounting for 70% of the dust storms in this research). Another cluster, in the Sistan plain, is also a potential source area. This approach also confirmed six main paths causing dust storms, driven by the climate system including the Siberian and Polar anticyclones, the monsoon from the Indian subcontinent and depressions from the north of Africa. The identification of SDS source areas and paths will improve our understanding of the mechanisms and impacts of dust storms on the socio-economy and environment of the region. Copyright © 2014 Elsevier B.V. All rights reserved.

  27. Search for MH370: New Geologic Insights Gained from Integrating Multiple Geophysical Datasets

    Science.gov (United States)

    McBee, J.; Gharib, J. J.; Ingle, S.

    2017-12-01

    During the search for the missing flight MH370, Fugro acquired the largest extent of high-resolution bathymetry in the southern Indian Ocean to date. These recently released multibeam echosounder (MBES) backscatter, bathymetry, water column, and sub-bottom profiler data reveal additional insights into the characteristics of the Indian Ocean seafloor and the geologic and oceanographic processes that shaped it. The mapping is at a sufficient resolution to examine relict spreading textures such as fracture zones, pseudofaults, failed propagating rifts, devals (deviations from axial linearity), etc. In this presentation, we will highlight several prominent regional seafloor features and illustrate insights gained by integrating MBES backscatter and water column data with bathymetric analyses. Backscatter data to the north of the southern flank of Broken Ridge illustrate the complexity with which sediment has been reworked downslope, where intricate patterns of low backscatter intensities are observed. Here, exposed rocks form prominent high-backscatter reflectors amid the surrounding low-backscatter sediments. The lateral extent of high-intensity backscatter reflectors south of the Diamantina reveals the expansiveness of exposed igneous rocks that resulted from seafloor spreading. Volcanic features, including off-axis volcanoes and leaky transforms, are also interpreted as high-backscatter anomalies in the tectonic spreading fabric to the north of the Geelvinck fracture zone, towards the southern extent of the dataset. These and other examples show that by integrating the entire suite of data collected by MBES systems, more detailed interpretations of the geologic processes that shaped the seafloor may be gained than by examination of bathymetry alone.

  28. Inferring Ice Thickness from a Glacier Dynamics Model and Multiple Surface Datasets.

    Science.gov (United States)

    Guan, Y.; Haran, M.; Pollard, D.

    2017-12-01

    The future behavior of the West Antarctic Ice Sheet (WAIS) may have a major impact on future climate. For instance, ice sheet melt may contribute significantly to global sea level rise. Understanding the current state of WAIS is therefore of great interest. WAIS is drained by fast-flowing glaciers which are major contributors to ice loss. Hence, understanding the stability and dynamics of glaciers is critical for predicting the future of the ice sheet. Glacier dynamics are driven by the interplay between the topography, temperature and basal conditions beneath the ice. A glacier dynamics model describes the interactions between these processes. We develop a hierarchical Bayesian model that integrates multiple ice sheet surface data sets with a glacier dynamics model. Our approach allows us to (1) infer important parameters describing the glacier dynamics, (2) learn about ice sheet thickness, and (3) account for errors in the observations and the model. Because we have relatively dense and accurate ice thickness data from the Thwaites Glacier in West Antarctica, we use these data to validate the proposed approach. The long-term goal of this work is to have a general model that may be used to study multiple glaciers in the Antarctic.

  29. Simultaneity, Sequentiality, and Speed: Organizational Messages about Multiple-Task Completion

    Science.gov (United States)

    Stephens, Keri K.; Cho, Jaehee K.; Ballard, Dawna I.

    2012-01-01

    Workplace norms for task completion increasingly value speed and the ability to accomplish multiple tasks at once. This study situates this popularized issue of multitasking within the context of chronemics scholarship by addressing related issues of simultaneity, sequentiality, and speed. Ultimately, we consider 2 multiple-task completion…

  30. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Yu-Wei [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Simmons, Blake A. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Singer, Steven W. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-29

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  31. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    Science.gov (United States)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study, the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata are presented. To develop the multi-source geospatial database, we used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data at different processing levels (digital numbers through geocorrected reflectance) were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset, consisting of airborne hyperspectral imagery collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
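
    As a toy illustration of the relational idea (not the authors' actual SQL Server/ArcGIS schema), the Python snippet below creates linked sensor, flight and image-product tables in SQLite; table and column names are invented.

```python
import sqlite3

con = sqlite3.connect("airborne_metadata.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS sensor (
    sensor_id      INTEGER PRIMARY KEY,
    name           TEXT,
    spectral_range TEXT);
CREATE TABLE IF NOT EXISTS flight (
    flight_id   INTEGER PRIMARY KEY,
    flight_date TEXT,
    site        TEXT,
    sensor_id   INTEGER REFERENCES sensor(sensor_id));
CREATE TABLE IF NOT EXISTS image_product (
    product_id       INTEGER PRIMARY KEY,
    flight_id        INTEGER REFERENCES flight(flight_id),
    processing_level TEXT,   -- digital numbers ... geocorrected reflectance
    file_path        TEXT);
""")
con.commit()
```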

  32. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

    Science.gov (United States)

    Guo, Wei-Li; Huang, De-Shuang

    2017-08-22

    Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset into a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches, by assessing the performance of imputation methods against observed ChIP-seq TF binding profiles. Besides, motif analysis shows that TFBSImpute performs better than the baselines in capturing binding motifs enriched in observed data, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.

  33. Monitoring the Lavina di Roncovetro (RE, Italy) landslide by integrating traditional monitoring systems and multiple high-resolution topographic datasets

    Science.gov (United States)

    Fornaciai, Alessandro; Favalli, Massimiliano; Gigli, Giovanni; Nannipieri, Luca; Mucchi, Lorenzo; Intieri, Emanuele; Agostini, Andrea; Pizziolo, Marco; Bertolini, Giovanni; Trippi, Federico; Casagli, Nicola; Schina, Rosa; Carnevale, Ennio

    2016-04-01

    … Roncovetro landslide were generated at different times. The 3D models are then georeferenced and the digital elevation models (DEMs) created. By comparing the obtained DEMs, changes in the investigated area were detected, and the sediment volumes, as well as the 3D displacements of the most active parts of the landslide, were quantified. In this work, we test the performance of SfM (structure from motion) techniques applied to an active landslide by comparing them with traditional monitoring systems, highlighting the strengths and weaknesses of both methods. In addition, we show the preliminary results obtained by integrating the traditional monitoring systems and the multiple high-resolution topographic datasets over a period of more than one year, used to investigate the spatial and temporal evolution of the upper sector of the Roncovetro landslide.
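
    The DEM-differencing step described above reduces to a grid subtraction plus a cell-area multiplication; a minimal numpy sketch (grids assumed co-registered on the same cell size):

```python
import numpy as np

def dem_change(dem_old, dem_new, cell_size):
    """Elevation change and net volume change between two co-registered
    DEMs on the same grid (e.g. successive SfM surveys); cell_size in m."""
    dz = dem_new - dem_old                       # m; positive = accumulation
    net_volume = np.nansum(dz) * cell_size ** 2  # m^3
    return dz, net_volume
```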

  34. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single-series studies and suffer from methodological problems. We sought to use a meta-analytic approach, combining multiple publicly available datasets while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis-free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate, adjusted for expression of cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted for by including them as random-effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting…

  35. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Science.gov (United States)

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting…

  16. Barriers and facilitators associated with colonoscopy completion in individuals with multiple chronic conditions: a qualitative study

    Directory of Open Access Journals (Sweden)

    Sultan S

    2017-05-01

    Full Text Available Shahnaz Sultan,1–4 Melissa R Partin,1,2 Phalgoon Shah,5 Jennifer LeLaurin,4 Ivette Magaly Freytes,4 Chandylen L Nightingale,6 Susan F Fesperman,4 Barbara A Curbow,7 Rebecca J Beyth3,4,8 1Center for Chronic Disease Outcomes Research, Minneapolis Veterans Affairs Health Care System, 2Department of Medicine, University of Minnesota, Minneapolis, MN, 3Department of Medicine, University of Florida College of Medicine, Gainesville, FL, 4Center of Innovation on Disability and Rehabilitation Research, North Florida/South Georgia Veterans Health System, Gainesville, FL, 5Department of Medicine, Tripler Army Medical Center, Honolulu, HI, 6Department of Social Sciences and Health Policy, Wake Forest School of Medicine, Winston-Salem NC, 7Department of Community and Behavioral Health, University of Maryland, College Park, MD, 8Geriatric Research, Education and Clinical Center, North Florida/South Georgia Veterans Health System, Gainesville, FL, USA Background: A recommendation to undergo a colonoscopy, an invasive procedure that requires commitment and motivation, planning (scheduling and finding a driver), and preparation (diet restriction and laxative consumption), may be uniquely challenging for individuals with multiple chronic conditions (MCCs). This qualitative study aimed to describe the barriers and facilitators to colonoscopy experienced by such patients. Materials and methods: Semistructured focus groups were conducted with male Veterans who were scheduled for outpatient colonoscopy and either failed to complete the procedure or completed the examination. Focus group recordings were transcribed and analyzed by an inductive grounded approach using constant comparative analysis. Results: Forty-four individuals aged 51–83 years participated in this study (23 adherent and 21 nonadherent). Participants had an average of 7.4 chronic conditions (range 2–14). The five most common chronic conditions were hypertension (75%), hyperlipidemia (75

  17. Global estimates of CO sources with high resolution by adjoint inversion of multiple satellite datasets (MOPITT, AIRS, SCIAMACHY, TES

    Directory of Open Access Journals (Sweden)

    M. Kopacz

    2010-02-01

    Full Text Available We combine CO column measurements from the MOPITT, AIRS, SCIAMACHY, and TES satellite instruments in a full-year (May 2004–April 2005 global inversion of CO sources at 4°×5° spatial resolution and monthly temporal resolution. The inversion uses the GEOS-Chem chemical transport model (CTM and its adjoint applied to MOPITT, AIRS, and SCIAMACHY. Observations from TES, surface sites (NOAA/GMD, and aircraft (MOZAIC are used for evaluation of the a posteriori solution. Using GEOS-Chem as a common intercomparison platform shows global consistency between the different satellite datasets and with the in situ data. Differences can be largely explained by different averaging kernels and a priori information. The global CO emission from combustion as constrained in the inversion is 1350 Tg a−1. This is much higher than current bottom-up emission inventories. A large fraction of the correction results from a seasonal underestimate of CO sources at northern mid-latitudes in winter and suggests a larger-than-expected CO source from vehicle cold starts and residential heating. Implementing this seasonal variation of emissions solves the long-standing problem of models underestimating CO in the northern extratropics in winter-spring. A posteriori emissions also indicate a general underestimation of biomass burning in the GFED2 inventory. However, the tropical biomass burning constraints are not quantitatively consistent across the different datasets.
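
    Adjoint inversions of this kind obtain the a posteriori sources by minimizing a Bayesian cost function; the display below is a sketch of the standard formulation in generic notation, not a reproduction of the paper's exact weighting:

\[
J(\mathbf{x}) = (\mathbf{x}-\mathbf{x}_a)^{\mathsf{T}}\mathbf{S}_a^{-1}(\mathbf{x}-\mathbf{x}_a) + \gamma\,\bigl(F(\mathbf{x})-\mathbf{y}\bigr)^{\mathsf{T}}\mathbf{S}_o^{-1}\bigl(F(\mathbf{x})-\mathbf{y}\bigr)
\]

    Here \(\mathbf{x}\) holds the emission scaling factors, \(\mathbf{x}_a\) the a priori inventory, \(F\) the GEOS-Chem forward model mapping sources to CO columns, \(\mathbf{y}\) the satellite observations, \(\mathbf{S}_a\) and \(\mathbf{S}_o\) the corresponding error covariance matrices, and \(\gamma\) a regularization parameter. The adjoint model supplies the gradient \(\nabla_{\mathbf{x}}J\) at roughly the cost of one backward model run, which is what makes a 4°×5° monthly optimization over a full year tractable.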

  18. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  19. On the representability of complete genomes by multiple competing finite-context (Markov) models.

    Directory of Open Access Journals (Sweden)

    Armando J Pinho

    Full Text Available A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well or even better than state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
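
    As a concrete illustration of the information-profile idea, the toy sketch below implements a single adaptive order-k model over the DNA alphabet and reports the average number of bits per base. The paper's competing-model mixture, very large orders, and inverted-repeat handling are omitted; only the core measure is shown.

```python
# Order-k finite-context (Markov) model: for each position, estimate
# P(symbol | previous k symbols) from counts seen so far, record
# -log2 P as the information content, then update the counts.
from collections import defaultdict
from math import log2

ALPHABET = "ACGT"

def information_profile(seq, k, alpha=1.0):
    counts = defaultdict(lambda: defaultdict(float))
    profile = []
    for i in range(len(seq)):
        ctx, sym = seq[max(0, i - k):i], seq[i]
        c = counts[ctx]
        total = sum(c.values())
        # Additive (Laplace-style) estimate, made before seeing position i.
        p = (c[sym] + alpha) / (total + alpha * len(ALPHABET))
        profile.append(-log2(p))
        c[sym] += 1.0                      # adapt the model to the sequence
    return profile

seq = "ACGTACGTACGGACGTACGTTTACGT" * 20
prof = information_profile(seq, k=4)
print(f"average bits per base: {sum(prof) / len(prof):.3f}")
```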

  20. Evaluation of the global MODIS 30 arc-second spatially and temporally complete snow-free land surface albedo and reflectance anisotropy dataset

    Science.gov (United States)

    Sun, Qingsong; Wang, Zhuosen; Li, Zhan; Erb, Angela; Schaaf, Crystal B.

    2017-06-01

    Land surface albedo is an essential variable for surface energy and climate modeling as it describes the proportion of incident solar radiant flux that is reflected from the Earth's surface. To capture the temporal variability and spatial heterogeneity of the land surface, satellite remote sensing must be used to monitor albedo accurately at a global scale. However, large data gaps caused by cloud or ephemeral snow have slowed the adoption of satellite albedo products by the climate modeling community. To address the needs of this community, we used a number of temporal and spatial gap-filling strategies to improve the spatial and temporal coverage of the global land surface MODIS BRDF, albedo and NBAR products. A rigorous evaluation of the gap-filled values shows good agreement with original high quality data (RMSE = 0.027 for the NIR band albedo, 0.020 for the red band albedo). This global snow-free and cloud-free MODIS BRDF and albedo dataset (established from 2001 to 2015) offers unique opportunities to monitor and assess the impact of the changes on the Earth's land surface.
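
    The temporal side of such gap-filling can be illustrated in miniature: the sketch below linearly interpolates masked (cloud- or snow-flagged) values in a per-pixel time series with numpy. The production dataset uses more elaborate temporal and spatial strategies; this only shows the basic idea of filling flagged gaps from surrounding high-quality retrievals.

```python
# Toy temporal gap-filling: interpolate NaN-masked albedo values from the
# neighboring good retrievals in a single pixel's time series.
import numpy as np

albedo = np.array([0.12, 0.13, np.nan, np.nan, 0.16, 0.15, np.nan, 0.14])
t = np.arange(albedo.size)
good = ~np.isnan(albedo)

filled = albedo.copy()
filled[~good] = np.interp(t[~good], t[good], albedo[good])
print(filled.round(3))
```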

  1. Twente spine model : A complete and coherent dataset for musculo-skeletal modeling of the thoracic and cervical regions of the human spine

    NARCIS (Netherlands)

    Bayoglu, Riza; Geeraedts, Leo; Groenen, Karlijn H.J.; Verdonschot, Nico; Koopman, Bart; Homminga, Jasper

    2017-01-01

    Musculo-skeletal modeling could play a key role in advancing our understanding of the healthy and pathological spine, but the credibility of such models is strictly dependent on the accuracy of the anatomical data incorporated. In this study, we present a complete and coherent musculo-skeletal

  2. Historical Datasets Support Genomic Selection Models for the Prediction of Cotton Fiber Quality Phenotypes Across Multiple Environments.

    Science.gov (United States)

    Gapare, Washington; Liu, Shiming; Conaty, Warren; Zhu, Qian-Hao; Gillespie, Vanessa; Llewellyn, Danny; Stiller, Warwick; Wilson, Iain

    2018-03-20

    Genomic selection (GS) has successfully been used in plant breeding to improve selection efficiency and reduce breeding time and cost. However, there has not been a study to evaluate GS prediction models that may be used for predicting cotton breeding lines across multiple environments. In this study, we evaluated the performance of Bayes Ridge Regression, BayesA, BayesB, BayesC and Reproducing Kernel Hilbert Spaces regression models. We then extended the single-site GS model to accommodate genotype × environment interaction (G×E) in order to assess the merits of multi- over single-environment models in a practical breeding and selection context in cotton, a crop for which this has not previously been evaluated. Our study was based on a population of 215 upland cotton (Gossypium hirsutum) breeding lines which were evaluated for fiber length and strength at multiple locations in Australia and genotyped with 13,330 single nucleotide polymorphism (SNP) markers. BayesB, which assumes a unique variance for each marker and that a proportion of markers have large effects while most other markers have zero effect, was the preferred model. GS accuracy for fiber length based on a single-site model varied across sites, ranging from 0.27 to 0.77 (mean = 0.38), while that of fiber strength ranged from 0.19 to 0.58 (mean = 0.35) using randomly selected sub-populations as the training population. Prediction accuracies from the multi-environment (M×E) model were higher than those for single-site and across-site models, with an average accuracy of 0.71 and 0.59 for fiber length and strength, respectively. The use of the M×E model could therefore identify which breeding lines have effects that are stable across environments and which ones are responsible for G×E, and so reduce the amount of phenotypic screening required in cotton breeding programs to identify adaptable genotypes. Copyright © 2018, G3: Genes, Genomes, Genetics.
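
    The basic GS workflow can be sketched compactly. The example below uses scikit-learn's BayesianRidge as a stand-in for the Bayesian marker models named above (BayesA/B/C themselves are not in scikit-learn), with accuracy taken, as is common in GS, as the correlation between predicted and observed phenotypes in a held-out set; all data are synthetic.

```python
# Whole-genome prediction sketch: SNP matrix -> phenotype, ridge-style prior.
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_lines, n_snps = 215, 1000                       # toy version of 215 lines
X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)  # 0/1/2 genotypes
beta = np.zeros(n_snps)
beta[rng.choice(n_snps, 20, replace=False)] = rng.normal(0, 0.5, 20)
y = X @ beta + rng.normal(0, 1.0, n_lines)        # fiber-length-like phenotype

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = BayesianRidge().fit(X_tr, y_tr)

# GS "accuracy": correlation of predicted vs observed phenotype.
acc = np.corrcoef(model.predict(X_te), y_te)[0, 1]
print(f"GS prediction accuracy (r): {acc:.2f}")
```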

  3. Classification of Small-Scale Eucalyptus Plantations Based on NDVI Time Series Obtained from Multiple High-Resolution Datasets

    Directory of Open Access Journals (Sweden)

    Hailang Qiao

    2016-02-01

    Full Text Available Eucalyptus, a short-rotation plantation, has been expanding rapidly in southeast China in recent years owing to its short growth cycle and high yield of wood. Effective identification of eucalyptus, therefore, is important for monitoring land use changes and investigating environmental quality. For this article, we used remote sensing images over 15 years (one per year) with a 30-m spatial resolution, including Landsat 5 Thematic Mapper images, Landsat 7 Enhanced Thematic Mapper images, and HJ 1A/1B images. These data were used to construct a 15-year Normalized Difference Vegetation Index (NDVI) time series for several cities in Guangdong Province, China. Eucalyptus reference NDVI time series sub-sequences, including one-year-long and two-year-long growing periods, were acquired using investigated eucalyptus samples in the study region. In order to compensate for the discontinuity of the NDVI time series that is a consequence of the relatively coarse temporal resolution, we developed an inverted triangle area methodology. Using this methodology, the images were classified on the basis of the matching degree of the NDVI time series and the two reference NDVI time series sub-sequences during the growing period of the eucalyptus rotations. Three additional methodologies (Bounding Envelope, City Block, and Standardized Euclidian Distance) were also tested and used as a comparison group. Threshold coefficients for the algorithms were adjusted using commission–omission error criteria. The results show that the triangle area methodology outperformed the other methodologies in classifying eucalyptus plantations. Threshold coefficients and an optimal discriminant function were determined using a mosaic photograph that had been taken from an unmanned aerial vehicle platform. Good stability was found as we performed further validation using multiple-year data from high-resolution Gaofen Satellite 1 (GF-1) observations of larger regions. Eucalyptus planting dates
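
    The matching idea can be sketched generically: compute NDVI from red/NIR reflectance and slide a reference growing-period sub-sequence along each pixel's series. Plain Euclidean distance is used below in place of the paper's inverted-triangle-area measure, which is specific to that study; the data are invented for illustration.

```python
# NDVI time-series matching sketch: small distances between a pixel's series
# and a rotation reference sub-sequence flag candidate eucalyptus rotations.
import numpy as np

def ndvi(nir, red):
    # Normalized Difference Vegetation Index
    return (nir - red) / (nir + red)

# Toy annual reflectances for one pixel (a rotation dip appears at year 3).
nir = np.array([0.42, 0.43, 0.21, 0.30, 0.44, 0.43, 0.22, 0.31, 0.45])
red = np.array([0.10, 0.10, 0.14, 0.11, 0.09, 0.10, 0.14, 0.11, 0.09])
series = ndvi(nir, red)

# Two-year reference: clear-cut dip followed by regrowth.
reference = ndvi(np.array([0.21, 0.30]), np.array([0.14, 0.11]))

dists = [float(np.linalg.norm(series[i:i + 2] - reference))
         for i in range(len(series) - 1)]
print(np.round(dists, 3))   # minima mark candidate rotation years
```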

  4. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples.

  5. Multiple Paths to Success: Degree Completion of 4-Year Starters Taking Various Pathways

    Science.gov (United States)

    Li, Dai

    2016-01-01

    With the use of data from the institutional research office at a comprehensive public 4-year university, this chapter describes an in-depth analysis of the institutional attendance, transfer, and graduation of three first-time student cohorts, revealing that not all types of multi-institutional attendance hurt degree completion, and strategic…

  6. Day-care treatment for multiple drug abusing adolescents: social factors linked with completing treatment.

    Science.gov (United States)

    Feigelman, W

    1987-01-01

    By identifying some of the social correlates linked with completing day-care drug abuse treatment, the present study has sought to broaden understanding of how drug rehabilitations are effected. As the findings have demonstrated, completing care is the result of a complex array of causes and their interactions. The disposition of the entering patient (i.e., their determination and other strengths) has a great bearing on treatment outcome. It is also a result of the patient's family: their motivations, resources, and perseverance in enduring a long course of demanding therapeutic interventions. In addition, it is the product of meanings shared and transmitted between the patient's family and the treatment staff. Patients and their families project positive attitudes about the value of the therapeutic enterprise as well as a compliant demeanor. When staff recognize that patients and parents are acting cooperatively, such perceptions tend to create self-fulfilling prophecies. The data establish that older adolescent patients are more likely to possess the motivational resources needed for program completion than younger patients. Apparently, self-referred patients are also more inclined to meet the demands of program requirements than those referred by the courts or other outside social agencies, although the differences fell short of the .05 level of statistical significance. Those completing the program are less likely to be diagnosed as depressed at intake. Parental characteristics comprise another group of variables that are related to treatment completion. Parents of higher occupational rank, who have had mental health care for themselves, and who are of Jewish ethnicity appear to possess useful strengths for meeting program challenges. The pattern of spousal mutuality in dealing with a child's needs, as it exists preceding and during treatment, seems to be another useful asset for successfully getting through this form of treatment. While parents with the

  7. Canonical resolution of the multiplicity problem for U(3): an explicit and complete constructive solution

    International Nuclear Information System (INIS)

    Biedenharn, L.C.; Lohe, M.A.; Louck, J.D.

    1975-01-01

    The multiplicity problem for tensor operators in U(3) has a unique (canonical) resolution which is utilized to effect the explicit construction of all U(3) Wigner and Racah coefficients. Methods are employed which elucidate the structure of the results; in particular, the significance of the denominator functions entering the structure of these coefficients, and the relation of these denominator functions to the null space of the canonical tensor operators. An interesting feature of the denominator functions is the appearance of new, group theoretical, polynomials exhibiting several remarkable and quite unexpected properties. (U.S.)

  8. Resolution of through tubing fluid flow and behind casing fluid flow in multiple completion wells

    International Nuclear Information System (INIS)

    Arnold, D.M.

    1977-01-01

    A method is provided for resolving undesired fluid flow in cement channels behind casing in one producing zone of a multizone completion well operating on gas lift from the fluid flow from lower producing zones in the same well, which is contained in production tubing passing through the producing zone being investigated. Gamma rays which are characteristic of the decay of the unstable isotope nitrogen-16, produced by activation of elemental oxygen nuclei comprising the molecular structure of both the tubing fluid flow and the undesired fluid flow, are detected in at least two energy bands at two longitudinally spaced detectors in a well borehole. By appropriately combining the four count rate signals so produced according to predetermined relationships, the two fluid flow components in the same direction may be uniquely distinguished on the basis of their differing distances from the gamma ray detectors. 9 claims, 17 figures

  9. Complete mitochondrial genome phylogeographic analysis of killer whales (Orcinus orca) indicates multiple species

    DEFF Research Database (Denmark)

    Morin, Phillip A; Archer, Frederick I.; Foote, Andrew David

    2010-01-01

    Killer whales (Orcinus orca) currently comprise a single, cosmopolitan species with a diverse diet. However, studies over the last 30 yr have revealed populations of sympatric "ecotypes" with discrete prey preferences, morphology, and behaviors. Although these ecotypes avoid social interactions...... and are not known to interbreed, genetic studies to date have found extremely low levels of diversity in the mitochondrial control region, and few clear phylogeographic patterns worldwide. This low level of diversity is likely due to low mitochondrial mutation rates that are common to cetaceans. Using killer whales...... as a case study, we have developed a method to readily sequence, assemble, and analyze complete mitochondrial genomes from large numbers of samples to more accurately assess phylogeography and estimate divergence times. This represents an important tool for wildlife management, not only for killer whales...

  10. Complete restoration of multiple dystrophin isoforms in genetically corrected Duchenne muscular dystrophy patient–derived cardiomyocytes

    Directory of Open Access Journals (Sweden)

    Susi Zatti

    2014-01-01

    Full Text Available Duchenne muscular dystrophy (DMD–associated cardiac diseases are emerging as a major cause of morbidity and mortality in DMD patients, and many therapies for treatment of skeletal muscle failed to improve cardiac function. The reprogramming of patients' somatic cells into pluripotent stem cells, combined with technologies for correcting the genetic defect, possesses great potential for the development of new treatments for genetic diseases. In this study, we obtained human cardiomyocytes from DMD patient–derived, induced pluripotent stem cells genetically corrected with a human artificial chromosome carrying the whole dystrophin genomic sequence. Stimulation by cytokines was combined with cell culturing on hydrogel with physiological stiffness, allowing an adhesion-dependent maturation and a proper dystrophin expression. The obtained cardiomyocytes showed remarkable sarcomeric organization of cardiac troponin T and α-actinin, expressed cardiac-specific markers, and displayed electrically induced calcium transients lasting less than 1 second. We demonstrated that the human artificial chromosome carrying the whole dystrophin genomic sequence is stably maintained throughout the cardiac differentiation process and that multiple promoters of the dystrophin gene are properly activated, driving expression of different isoforms. These dystrophic cardiomyocytes can be a valuable source for in vitro modeling of DMD-associated cardiac disease. Furthermore, the derivation of genetically corrected, patient-specific cardiomyocytes represents a step toward the development of innovative cell and gene therapy approaches for DMD.

  11. Complete clinical responses to cancer therapy caused by multiple divergent approaches: a repeating theme lost in translation

    Directory of Open Access Journals (Sweden)

    Coventry BJ

    2012-05-01

    Full Text Available Brendon J Coventry, Martin L Ashdown; Discipline of Surgery, University of Adelaide, Royal Adelaide Hospital and Faculty of Medicine, University of Melbourne, Australia. Abstract: Over 50 years of cancer therapy history reveals complete clinical responses (CRs) from remarkably divergent forms of therapies (eg, chemotherapy, radiotherapy, surgery, vaccines, autologous cell transfers, cytokines, monoclonal antibodies) for advanced solid malignancies occur with an approximately similar frequency of 5%–10%. This has remained frustratingly almost static. However, CRs usually underpin strong durable 5-year patient survival. How can this apparent paradox be explained? Over some 20 years, realization that (1) chronic inflammation is intricately associated with cancer, and (2) the immune system is delicately balanced between responsiveness and tolerance of cancer, provides a greatly significant insight into ways cancer might be more effectively treated. In this review, divergent aspects from the largely segmented literature and recent conferences are drawn together to provide observations revealing some emerging reasoning, in terms of "final common pathways" of cancer cell damage, immune stimulation, and auto-vaccination events, ultimately leading to cancer cell destruction. Created from this is a unifying overarching concept to explain why multiple approaches to cancer therapy can provide complete responses at almost equivalent rates. This "missing" aspect provides a reasoned explanation for what has, and is being, increasingly reported in the mainstream literature – that inflammatory and immune responses appear intricately associated with, if not causative of, complete responses induced by divergent forms of cancer therapy. Curiously, whether by chemotherapy, radiation, surgery, or other means, therapy-induced cell injury results, leaving inflammation and immune system stimulation as a final common denominator across all of these mechanisms of cancer

  12. A Complete Response Case in a Patient with Multiple Lung Metastases of Rectal Cancer Treated with Bevacizumab plus XELIRI Therapy

    Directory of Open Access Journals (Sweden)

    Hiroki Hashida

    2017-01-01

    Full Text Available It has been reported that many patients with lung metastasis of colorectal cancer (CRC) underwent chemotherapy with fluorouracil, folinic acid, oxaliplatin, irinotecan, or capecitabine. There is a small number of reports about capecitabine and irinotecan (XELIRI) plus bevacizumab (BV) therapy for patients with metastatic CRC in Japan. We report a case of successful BV+XELIRI therapy for rectal cancer with multiple lung metastases as first-line chemotherapy. A 53-year-old female presented with advanced rectal cancer and metastatic lung tumors. Following surgery, the patient was treated with XELIRI+BV. After 6 courses, a computed tomography scan showed complete response of the lung metastases. No recurrence has occurred for 3 years after chemotherapy was stopped.

  13. PROLONGED MULTIPLE SPASMS OF SMOOTH CORONARY ARTERIES PRESENTING AS ACUTE MYOCARDIAL INFARCTION, COMPLETE AV BLOCK AND SYNCOPE

    Directory of Open Access Journals (Sweden)

    Franci Cesar

    2004-11-01

    Full Text Available Background. A variant form of angina pectoris (VAP) is caused by coronary vessel spasm and occurs in patients with and without varying degrees of obstructive coronary artery disease. Although the prognosis of VAP without significant organic stenosis is generally good, multivessel spasm is associated with a high risk of life-threatening abnormalities of rhythm and conduction. Patient and methods. We describe a patient who presented with prolonged chest pain, associated with hypotension, loss of consciousness, complete AV block, and widespread ST segment elevations consistent with inferoanterior acute myocardial infarction. Urgent selective coronary angiography revealed spasms in the right coronary artery and in the left circumflex artery that were relieved by intracoronary injection of nitroglycerin. All coronary arteries were otherwise patent, without signs of atherosclerosis. The patient was treated with diltiazem and nitrates. She made a complete recovery and resumed her normal activities. Conclusions. Simultaneous multiple spasms of native coronary arteries represent a rare syndrome characterized by a significantly higher incidence of potentially life-threatening arrhythmia. Less commonly, prolonged coronary spasm may mimic acute myocardial infarction. Modern management of acute coronary syndromes, including urgent coronarography, enables prompt differentiation between prolonged coronary spasm and atherosclerotic coronary disease, warranting different treatment strategies. Medical treatment with nitrates and calcium channel blockers in most cases prevents recurrence of vasospasms and arrhythmias.

  14. Complete clinical responses to cancer therapy caused by multiple divergent approaches: a repeating theme lost in translation

    International Nuclear Information System (INIS)

    Coventry, Brendon J; Ashdown, Martin L

    2012-01-01

    Over 50 years of cancer therapy history reveals complete clinical responses (CRs) from remarkably divergent forms of therapies (eg, chemotherapy, radiotherapy, surgery, vaccines, autologous cell transfers, cytokines, monoclonal antibodies) for advanced solid malignancies occur with an approximately similar frequency of 5%–10%. This has remained frustratingly almost static. However, CRs usually underpin strong durable 5-year patient survival. How can this apparent paradox be explained? Over some 20 years, realization that (1) chronic inflammation is intricately associated with cancer, and (2) the immune system is delicately balanced between responsiveness and tolerance of cancer, provides a greatly significant insight into ways cancer might be more effectively treated. In this review, divergent aspects from the largely segmented literature and recent conferences are drawn together to provide observations revealing some emerging reasoning, in terms of “final common pathways” of cancer cell damage, immune stimulation, and auto-vaccination events, ultimately leading to cancer cell destruction. Created from this is a unifying overarching concept to explain why multiple approaches to cancer therapy can provide complete responses at almost equivalent rates. This “missing” aspect provides a reasoned explanation for what has, and is being, increasingly reported in the mainstream literature – that inflammatory and immune responses appear intricately associated with, if not causative of, complete responses induced by divergent forms of cancer therapy. Curiously, whether by chemotherapy, radiation, surgery, or other means, therapy-induced cell injury results, leaving inflammation and immune system stimulation as a final common denominator across all of these mechanisms of cancer therapy. This aspect has been somewhat obscured and has been “lost in translation” to date

  15. Critical analysis of the stringent complete response in multiple myeloma: contribution of sFLC and bone marrow clonality.

    Science.gov (United States)

    Martínez-López, Joaquín; Paiva, Bruno; López-Anglada, Lucía; Mateos, María-Victoria; Cedena, Teresa; Vidríales, María-Belén; Sáez-Gómez, María Auxiliadora; Contreras, Teresa; Oriol, Albert; Rapado, Inmaculada; Teruel, Ana-Isabel; Cordón, Lourdes; Blanchard, María Jesús; Bengoechea, Enrique; Palomera, Luis; de Arriba, Felipe; Cueto-Felgueroso, Cecilia; Orfao, Alberto; Bladé, Joan; San Miguel, Jesús F; Lahuerta, Juan José

    2015-08-13

    Stringent complete response (sCR) criteria are used in multiple myeloma as a deeper response category compared with CR, but prospective validation is lacking, it is not always clear how evaluation of clonality is performed, and it is not known what the relative clinical influence is of the serum free light chain ratio (sFLCr) and bone marrow (BM) clonality in defining sCR. To clarify this controversy, we focused on 94 patients that reached CR, of which 69 (73%) also fulfilled the sCR criteria. Patients with sCR displayed slightly longer time to progression (median, 62 vs 53 months, respectively; P = .31). On analyzing the contribution of sFLCr and clonality to prognosis, it was found that the sFLCr does not identify patients in CR at distinct risk; by contrast, low-sensitivity multiparametric flow cytometry (MFC) immunophenotyping (2 colors), which is equivalent to immunohistochemistry, identifies a small number of patients (5 cases) with high residual tumor burden and dismal outcome; nevertheless, using traditional 4-color MFC, persistent clonal BM disease was detectable in 36% of patients, who, compared with minimal residual disease-negative cases, had a significantly inferior outcome. These results show that the current definition of sCR should be revised. © 2015 by The American Society of Hematology.

  16. PET/CT Improves the Definition of Complete Response and Allows to Detect Otherwise Unidentifiable Skeletal Progression in Multiple Myeloma.

    Science.gov (United States)

    Zamagni, Elena; Nanni, Cristina; Mancuso, Katia; Tacchetti, Paola; Pezzi, Annalisa; Pantani, Lucia; Zannetti, Beatrice; Rambaldi, Ilaria; Brioli, Annamaria; Rocchi, Serena; Terragna, Carolina; Martello, Marina; Marzocchi, Giulia; Borsi, Enrica; Rizzello, Ilaria; Fanti, Stefano; Cavo, Michele

    2015-10-01

    To evaluate the role of 18F-FDG PET/CT in 282 symptomatic multiple myeloma patients treated up-front between 2002 and 2012. All patients were studied by PET/CT at baseline, during posttreatment follow-up, and at the time of relapse. Their median duration of follow-up was 67 months. Forty-two percent of the patients at diagnosis had >3 focal lesions, and in 50% SUVmax was >4.2; extramedullary disease was present in 5%. On multivariate analysis, ISS stage 3, SUVmax >4.2, and failure to achieve best complete response (CR) were the leading factors independently associated with shorter progression-free survival (PFS) and overall survival (OS). These 3 variables were used to construct a prognostic scoring system based on the number of risk factors. After treatment, PET/CT negativity (PET-neg) was observed in 70% of patients, whereas conventionally defined CR was achieved in 53%. Attainment of PET-neg favorably influenced PFS and OS. PET-neg was an independent predictor of prolonged PFS and OS for patients with conventionally defined CR. Sixty-three percent of patients experienced relapse or progression; in 12%, skeletal progression was exclusively detected by systematic PET/CT performed during follow-up. A multivariate analysis revealed that persistence of SUVmax >4.2 following first-line treatment was independently associated with exclusive PET/CT progression. PET/CT combined with ISS stage and achievement or not of CR on first-line therapy sorted patients into different prognostic groups. PET/CT led to a more careful evaluation of CR. Finally, in patients with persistent high glucose metabolism after first-line treatment, PET/CT can be recommended during follow-up, to screen for otherwise unidentifiable progression. ©2015 American Association for Cancer Research.

  17. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  18. A re-consideration of the taxonomic status of Nebria lacustris Casey (Coleoptera, Carabidae, Nebriini) based on multiple datasets – a single species or a species complex?

    Directory of Open Access Journals (Sweden)

    David Kavanaugh

    2011-11-01

    Full Text Available This study gathered evidence from principal component analysis (PCA) of morphometric data and molecular analyses of nucleotide sequence data for four nuclear genes (28S, TpI, CAD1, and Wg) and two mitochondrial genes (COI and 16S), using parsimony, maximum likelihood, and Bayesian methods. This evidence was combined with morphological and chorological data to re-evaluate the taxonomic status of Nebria lacustris Casey sensu lato. PCA demonstrated that both body size and one conspicuous aspect of pronotal shape vary simultaneously with elevation, latitude, and longitude and served to distinguish populations from the southern Appalachian highlands, south of the French Broad, from all other populations. Molecular analyses revealed surprisingly low overall genetic diversity within N. lacustris sensu lato, with only 0.39% of 4605 bp varied in the concatenated dataset. Evaluation of patterns observed in morphological and genetic variation and distribution led to the following taxonomic conclusions: (1) Nebria lacustris Casey and Nebria bellorum Kavanaugh should be considered distinct species, which is a NEW STATUS for N. bellorum. (2) No other distinct taxonomic subunits could be distinguished with the evidence at hand, but samples from northeastern Iowa, in part of the region known as the "Driftless Zone", have unique genetic markers for two genes that hint at descent from a local population surviving at least the last glacial advance. (3) No morphometric or molecular evidence supports taxonomic distinction between lowland populations on the shores of Lake Champlain and upland populations in the adjacent Green Mountains of Vermont, despite evident size and pronotal shape differences between many of their members.
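
    The morphometric PCA step has a standard shape that can be sketched briefly: standardized measurements for a set of specimens are projected onto principal components, whose scores are then inspected for separation between populations. The data below are synthetic stand-ins, not the study's measurements.

```python
# PCA of morphometric measurements: standardize, project, inspect scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Columns: toy body-size and pronotal-shape measurements, 20 specimens each.
south = rng.normal([10.0, 3.2, 1.1], 0.2, size=(20, 3))  # southern highlands
north = rng.normal([11.5, 3.0, 1.4], 0.2, size=(20, 3))  # other populations
X = StandardScaler().fit_transform(np.vstack([south, north]))

scores = PCA(n_components=2).fit_transform(X)
print(scores[:3])   # PC1/PC2 scores; plot or test these for group separation
```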

  19. Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

    OpenAIRE

    Chan, Kin Wai; Meng, Xiao-Li

    2017-01-01

    Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...
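
    The combining step that MI builds on is easiest to see for a scalar parameter, where Rubin's rules average the point estimates and add within- and between-imputation variance. The sketch below shows that basic scalar case on made-up numbers; the paper's subject, combining likelihood ratio tests, is a harder problem than this.

```python
# Rubin's rules for a scalar parameter estimated on m completed datasets.
import numpy as np

est = np.array([1.02, 0.95, 1.10, 0.99, 1.05])       # m = 5 point estimates
var = np.array([0.040, 0.038, 0.045, 0.041, 0.039])  # their squared std errors

m = len(est)
qbar = est.mean()                    # combined point estimate
ubar = var.mean()                    # within-imputation variance
b = est.var(ddof=1)                  # between-imputation variance
t = ubar + (1 + 1 / m) * b           # total variance (Rubin, 1987)
print(f"estimate = {qbar:.3f}, s.e. = {np.sqrt(t):.3f}")
```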

  20. Complete analysis of MAP/G/1/N queue with single (multiple) vacation(s) under limited service discipline

    Directory of Open Access Journals (Sweden)

    U. C. Gupta

    2005-01-01

    Full Text Available We consider a finite-buffer single-server queue with Markovian arrival process (MAP) where the server serves a limited number of customers, and when the limit is reached it goes on vacation. Both single- and multiple-vacation policies are analyzed and the queue length distributions at various epochs, such as pre-arrival, arbitrary, and departure, have been obtained. The effect of certain model parameters on some important performance measures, like probability of loss, mean queue lengths, mean waiting time, is discussed. The model can be applied in computer communication and networking, for example, performance analysis of the token passing ring of a LAN and the SVC (switched virtual connection) of ATM.
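
    The model's mechanics can be illustrated by simulation. The sketch below is a toy discrete-event simulation of a finite-buffer queue whose server takes multiple vacations after serving a limited batch; Poisson arrivals stand in for the more general MAP of the paper, and service and vacation times are exponential for brevity.

```python
# Finite-buffer queue with limited service and multiple vacations (toy sim).
import random

random.seed(0)
LAM, MU, THETA = 1.0, 1.5, 0.5   # arrival, service, vacation-completion rates
N, LIMIT = 10, 3                 # buffer size, customers served per visit

t, last, area = 0.0, 0.0, 0.0    # time, last event time, queue-length integral
queue, lost, arrived, in_visit = 0, 0, 0, 0
on_vacation = True
next_arrival = random.expovariate(LAM)
next_event = random.expovariate(THETA)   # first vacation ends here

while t < 50_000:
    if next_arrival < next_event:                    # an arrival occurs
        t = next_arrival
        area += queue * (t - last); last = t
        arrived += 1
        if queue < N: queue += 1
        else: lost += 1                              # buffer full: customer lost
        next_arrival = t + random.expovariate(LAM)
    elif on_vacation:                                # a vacation ends
        t = next_event
        area += queue * (t - last); last = t
        if queue > 0:                                # start a new service visit
            on_vacation, in_visit = False, 0
            next_event = t + random.expovariate(MU)
        else:                                        # multiple-vacation policy
            next_event = t + random.expovariate(THETA)
    else:                                            # a service completes
        t = next_event
        area += queue * (t - last); last = t
        queue -= 1; in_visit += 1
        if in_visit == LIMIT or queue == 0:          # limit reached or empty
            on_vacation = True
            next_event = t + random.expovariate(THETA)
        else:
            next_event = t + random.expovariate(MU)

print(f"loss probability ~ {lost/arrived:.3f}, mean queue length ~ {area/t:.2f}")
```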

  1. Leveraging multiple datasets for deep leaf counting

    OpenAIRE

    Dobrescu, Andrei; Giuffrida, Mario Valerio; Tsaftaris, Sotirios A

    2017-01-01

    The number of leaves a plant has is one of the key traits (phenotypes) describing its development and growth. Here, we propose an automated, deep learning based approach for counting leaves in model rosette plants. While state-of-the-art results on leaf counting with deep learning methods have recently been reported, they obtain the count as a result of leaf segmentation and thus require per-leaf (instance) segmentation to train the models (a rather strong annotation). Instead, our method tre...
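
    The regression framing can be made concrete with a few lines of PyTorch: a small CNN maps an image directly to a single count and is trained with an MSE loss, in the spirit of the approach described above. Toy tensors stand in for real rosette images and counts; this is not the authors' architecture.

```python
# Leaf counting as direct regression: image -> scalar count, MSE loss.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),                        # predicted leaf count
)

images = torch.randn(8, 3, 64, 64)           # batch of toy images
counts = torch.randint(4, 15, (8, 1)).float()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                           # a few illustrative steps
    loss = nn.functional.mse_loss(model(images), counts)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final MSE: {loss.item():.2f}")
```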

  2. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty, and S. Raman....

  3. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  4. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  5. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  6. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  7. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  8. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join, extend, combine datasets and hence work collectively, but

  9. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join
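
    The core idea, structures plus versioned descriptor references serialized to XML, can be illustrated with the standard library. The element and attribute names below are invented for illustration and are not the actual QSAR-ML schema.

```python
# Hypothetical mini-serialization of a QSAR dataset setup: one structure,
# one descriptor with an ontology reference and a versioned implementation.
import xml.etree.ElementTree as ET

root = ET.Element("qsarDataset")
mol = ET.SubElement(root, "structure", id="mol1")
mol.text = "CCO"                                   # e.g. a SMILES string
desc = ET.SubElement(root, "descriptor",
                     ontologyRef="http://example.org/descr#XLogP",
                     implementation="cdk", version="1.4.19")
val = ET.SubElement(desc, "value", structureRef="mol1")
val.text = "-0.14"

print(ET.tostring(root, encoding="unicode"))
```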

  10. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radio broadcasts, it is necessary to establish an audio advertising dataset which can be used to analyze and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time, and class. The soundness of the classification scheme for the advertisements in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
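
    Given the annotation fields described above (file name, sampling frequency, channel number, broadcasting time, class), a per-sample record is easy to model; the exact field order and comma separator below are assumptions for illustration.

```python
# Hypothetical parser for one annotation line of the dataset described above.
from dataclasses import dataclass

@dataclass
class AdAnnotation:
    filename: str
    sample_rate_hz: int
    channels: int
    broadcast_time: str
    label: str

def parse_annotation_line(line: str) -> AdAnnotation:
    name, sr, ch, when, label = [f.strip() for f in line.split(",")]
    return AdAnnotation(name, int(sr), int(ch), when, label)

print(parse_annotation_line("ad_0001.wav, 44100, 2, 2015-06-01 08:30, food"))
```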

  11. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set up for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary file (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  12. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...

  13. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  14. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  15. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  16. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability designed to capture safeguards knowledge so that it persists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  17. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability designed to capture safeguards knowledge so that it persists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  18. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    Science.gov (United States)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for gauge-based datasets is a dense and reliable station network. Currently, there are two major archives of global in situ daily rainfall data: the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and that of the Global Precipitation Climatology Centre (GPCC), part of the Deutsche Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Center (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, as well as provide the standard deviation, kriging error, and number of stations for each grid cell and timestep to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall, including its extremes, and pave the way for better global precipitation observations with lower and more transparent uncertainties.
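
    The station-to-grid step can be illustrated in miniature. The sketch below uses inverse-distance weighting purely as a stand-in: the actual dataset uses block kriging, which additionally yields the per-cell kriging error mentioned above; coordinates and totals are invented.

```python
# Toy station-to-grid interpolation of one day's rainfall totals (IDW,
# standing in for block kriging).
import numpy as np

def idw(stations_xy, values, grid_xy, power=2.0, eps=1e-9):
    # Distance from every grid point to every station, then weighted mean.
    d = np.linalg.norm(grid_xy[:, None, :] - stations_xy[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rain_mm = np.array([5.0, 0.0, 12.0, 3.0])          # one day's station totals
gx, gy = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 3))
grid = np.column_stack([gx.ravel(), gy.ravel()])

print(idw(stations, rain_mm, grid).round(1).reshape(3, 3))
```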

  19. BiRD (Biaxin [clarithromycin]/Revlimid [lenalidomide]/dexamethasone) combination therapy results in high complete- and overall-response rates in treatment-naive symptomatic multiple myeloma.

    Science.gov (United States)

    Niesvizky, Ruben; Jayabalan, David S; Christos, Paul J; Furst, Jessica R; Naib, Tara; Ely, Scott; Jalbrzikowski, Jessica; Pearse, Roger N; Zafar, Faiza; Pekle, Karen; Larow, April; Lent, Richard; Mark, Tomer; Cho, Hearn J; Shore, Tsiporah; Tepler, Jeffrey; Harpel, John; Schuster, Michael W; Mathew, Susan; Leonard, John P; Mazumdar, Madhu; Chen-Kiang, Selina; Coleman, Morton

    2008-02-01

    This trial determined the safety and efficacy of the combination regimen clarithromycin (Biaxin), lenalidomide (Revlimid), and dexamethasone (BiRD) as first-line therapy for multiple myeloma. Patients received BiRD in 28-day cycles. Dexamethasone (40 mg) was given orally once weekly, clarithromycin (500 mg) was given orally twice daily, and lenalidomide (25 mg) was given orally daily on days 1 to 21. Objective response was defined by standard criteria (ie, decrease in serum monoclonal protein [M-protein] by at least 50%, and a decrease in urine M-protein by at least 90%). Of the 72 patients enrolled, 65 had an objective response (90.3%). A combined stringent and conventional complete response rate of 38.9% was achieved, and 73.6% of the patients achieved at least a 90% decrease in M-protein levels. This regimen did not interfere with hematopoietic stem-cell harvest. Fifty-two patients who did not go on to receive transplants received continued therapy (complete response, 37%; very good partial response, 33%). The major adverse events were thromboembolic events, corticosteroid-related morbidity, and cytopenias. BiRD is an effective regimen with manageable side effects in the treatment of symptomatic, newly diagnosed multiple myeloma. This trial was registered at www.clinicaltrials.gov as #NCT00151203.

  20. [A case of transverse colon cancer with multiple liver metastases and hepatic pedicle lymph node involvement showing pathological complete response by XELOX plus bevacizumab].

    Science.gov (United States)

    Mukai, Toshiki; Akiyoshi, Takashi; Koga, Rintaro; Arita, Junichi; Saiura, Akio; Ikeda, Atsushi; Nagasue, Yasutomo; Oikawa, Yoshinori; Yamakawa, Keiko; Konishi, Tsuyoshi; Fujimoto, Yoshiya; Nagayama, Satoshi; Fukunaga, Yosuke; Ueno, Masashi; Suenaga, Mitsukuni; Mizunuma, Nobuyuki; Shinozaki, Eiji; Yamamoto, Chiriko; Yamaguchi, Toshiharu

    2012-12-01

    A 70-year-old woman was referred to our hospital because of abdominal pain. Abdominal computed tomography (CT) and colonoscopy revealed transverse colon cancer with multiple liver metastases, with involvement of the hepatic pedicle and superior mesenteric artery lymph nodes. The patient received eight courses of XELOX plus bevacizumab, and CT showed a decrease in the size of the liver metastases and hepatic pedicle lymphadenopathy. Right hemicolectomy, partial hepatectomy, and hepatic pedicle lymph node resection were performed. Histopathological examination of the resected tissue revealed no residual cancer cells, suggesting a pathological complete response. The patient remains well 7 months after the operation, without any signs of recurrence. Surgical resection should be considered for patients with initially unresectable colon cancer with liver metastases and hepatic pedicle lymph node involvement if systemic chemotherapy is effective.

  1. The utilization of an ultrasound-guided 8-gauge vacuum-assisted breast biopsy system as an innovative approach to accomplishing complete eradication of multiple bilateral breast fibroadenomas

    Directory of Open Access Journals (Sweden)

    Povoski Stephen P

    2007-10-01

    Background: Ultrasound-guided vacuum-assisted breast biopsy technology is extremely useful for diagnostic biopsy of suspicious breast lesions and for attempted complete excision of appropriately selected, presumed benign breast lesions. Case presentation: A female patient presented with 16 breast lesions (eight within each breast), documented on ultrasound and all presumed to be fibroadenomas. Over a ten-and-one-half-month period, 14 of these 16 breast lesions were removed under ultrasound guidance during a total of 11 separate 8-gauge Mammotome® excision procedures performed over seven separate sessions. Additionally, two of these 16 breast lesions were removed by open surgical excision. A histopathologic diagnosis of fibroadenoma and/or fibroadenomatous changes was confirmed at all lesion excision sites. Interval follow-up ultrasound imaging revealed no evidence of a residual lesion at the site of any of the 16 original breast lesions. Conclusion: This report describes an innovative approach of utilizing ultrasound-guided 8-gauge vacuum-assisted breast biopsy technology to assist in achieving complete eradication of multiple bilateral fibroadenomas in a patient who presented with 16 documented breast lesions. As such, this approach is highly recommended in similar, appropriately selected patients.

  2. Evaluation of accuracy of complete-arch multiple-unit abutment-level dental implant impressions using different impression and splinting materials.

    Science.gov (United States)

    Buzayan, Muaiyed; Baig, Mirza Rustum; Yunus, Norsiah

    2013-01-01

    This in vitro study evaluated the accuracy of multiple-unit dental implant casts obtained from splinted or nonsplinted direct impression techniques using various splinting materials by comparing the casts to reference models. The effect of two different impression materials on the accuracy of the implant casts was also evaluated for abutment-level impressions. A reference model with six internal-connection implant replicas placed in the completely edentulous mandibular arch and connected to multi-base abutments was fabricated from heat-curing acrylic resin. Forty impressions of the reference model were made, 20 each with polyether (PE) and polyvinylsiloxane (PVS) impression materials using the open tray technique. The PE and PVS groups were further subdivided into four subgroups of five each on the basis of splinting type: no splinting, bite registration PE, bite registration addition silicone, or autopolymerizing acrylic resin. The positional accuracy of the implant replica heads was measured on the poured casts using a coordinate measuring machine to assess linear differences in interimplant distances in all three axes. The collected data (linear and three-dimensional [3D] displacement values) were compared with the measurements calculated on the reference resin model and analyzed with nonparametric tests (Kruskal-Wallis and Mann-Whitney). No significant differences were found between the various splinting groups for both PE and PVS impression materials in terms of linear and 3D distortions. However, small but significant differences were found between the two impression materials (PVS, 91 μm; PE, 103 μm) in terms of 3D discrepancies, irrespective of the splinting technique employed. Casts obtained from both impression materials exhibited differences from the reference model. The impression material had a greater influence on impression accuracy than the splinting material for multiple-unit abutment-level impressions.
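
    As a rough illustration of the measurement described above, the following sketch computes a 3D interimplant-distance error against a reference model and compares two groups with a Mann-Whitney U test, as the study does. All coordinates, error magnitudes, and group sizes below are made-up values for illustration, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def interimplant_3d_error(ref_xyz, cast_xyz):
    """Mean absolute difference of all pairwise implant distances (micrometres)."""
    d_ref = np.linalg.norm(ref_xyz[:, None] - ref_xyz[None, :], axis=-1)
    d_cast = np.linalg.norm(cast_xyz[:, None] - cast_xyz[None, :], axis=-1)
    iu = np.triu_indices(len(ref_xyz), k=1)   # each implant pair once
    return np.mean(np.abs(d_ref[iu] - d_cast[iu]))

rng = np.random.default_rng(1)
ref = rng.uniform(0, 40_000, size=(6, 3))     # six implants, coordinates in µm (assumed)
pvs = [interimplant_3d_error(ref, ref + rng.normal(0, 60, ref.shape)) for _ in range(20)]
pe = [interimplant_3d_error(ref, ref + rng.normal(0, 70, ref.shape)) for _ in range(20)]
print(mannwhitneyu(pvs, pe))                  # nonparametric group comparison
```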

  3. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Multiple myeloma patients in long-term complete response after autologous stem cell transplantation express a particular immune signature with potential prognostic implication.

    Science.gov (United States)

    Arteche-López, A; Kreutzman, A; Alegre, A; Sanz Martín, P; Aguado, B; González-Pardo, M; Espiño, M; Villar, L M; García Belmonte, D; de la Cámara, R; Muñoz-Calleja, C

    2017-06-01

    The proportion of multiple myeloma patients in long-term complete response (LTCR-MM) for more than 6 years after autologous stem cell transplantation (ASCT) is small. To evaluate whether this LTCR is associated with a particular immune signature, peripheral blood samples from 13 LTCR-MM patients after ASCT and from healthy blood donors (HBD) were analysed. Subpopulations of T-cells (naïve, effector, central memory and regulatory), B-cells (naïve, marginal zone-like, class-switched memory, transitional and plasmablasts) and NK-cells expressing inhibitory and activating receptors were quantified by multiparametric flow cytometry (MFC). Heavy/light chains (HLC) were quantified by nephelometry. The percentage of CD4+ T-cells was lower in patients, whereas an increase in the percentage of CD4+ and CD8+ effector memory T-cells was associated with the LTCR. Regulatory T-cells and NK-cells were similar in both groups, but a particular redistribution of inhibitory and activating receptors in NK-cells was found in patients. Regarding B-cells, an increase in naïve cells and a corresponding reduction in marginal zone-like and class-switched memory B-cells were observed. The HLC values were normal. Our results suggest that LTCR-MM patients express a particular immune signature, which probably reflects a 'high quality' immune reconstitution that could exert competent anti-tumor immunological surveillance along with a recovery of humoral immunity.

  5. Human dental pulp-derived stem cells promote locomotor recovery after complete transection of the rat spinal cord by multiple neuro-regenerative mechanisms.

    Science.gov (United States)

    Sakai, Kiyoshi; Yamamoto, Akihito; Matsubara, Kohki; Nakamura, Shoko; Naruse, Mami; Yamagata, Mari; Sakamoto, Kazuma; Tauchi, Ryoji; Wakao, Norimitsu; Imagama, Shiro; Hibi, Hideharu; Kadomatsu, Kenji; Ishiguro, Naoki; Ueda, Minoru

    2012-01-01

    Spinal cord injury (SCI) often leads to persistent functional deficits due to loss of neurons and glia and to limited axonal regeneration after injury. Here we report that transplantation of human dental pulp stem cells into the completely transected adult rat spinal cord resulted in marked recovery of hind limb locomotor functions. Transplantation of human bone marrow stromal cells or skin-derived fibroblasts led to substantially less recovery of locomotor function. The human dental pulp stem cells exhibited three major neuroregenerative activities. First, they inhibited the SCI-induced apoptosis of neurons, astrocytes, and oligodendrocytes, which improved the preservation of neuronal filaments and myelin sheaths. Second, they promoted the regeneration of transected axons by directly inhibiting multiple axon growth inhibitors, including chondroitin sulfate proteoglycan and myelin-associated glycoprotein, via paracrine mechanisms. Last, they replaced lost cells by differentiating into mature oligodendrocytes under the extreme conditions of SCI. Our data demonstrate that tooth-derived stem cells may provide therapeutic benefits for treating SCI through both cell-autonomous and paracrine neuroregenerative activities.

  6. National Elevation Dataset

    Science.gov (United States)


    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize the artifacts commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities have been minimized. The artifact removal filtering does not eliminate all artifacts; in areas where the only available DEM was produced by older methods, "striping" may still occur.
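
    The unit- and datum-normalisation steps described above can be approximated with standard GIS tooling. The sketch below, a minimal approximation rather than the actual NED assembly code, reprojects a hypothetical single-band, feet-based source DEM to geographic NAD83 and rescales the values to decimal metres using rasterio; the file names and the feet assumption are placeholders.

```python
import numpy as np
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

FEET_TO_M = 0.3048

with rasterio.open("source_dem_feet.tif") as src:     # hypothetical input file
    dst_crs = "EPSG:4269"  # NAD83 geographic, the NED horizontal datum
    transform, width, height = calculate_default_transform(
        src.crs, dst_crs, src.width, src.height, *src.bounds)
    dst = np.empty((height, width), dtype=np.float32)
    reproject(
        source=rasterio.band(src, 1), destination=dst,
        src_transform=src.transform, src_crs=src.crs,
        dst_transform=transform, dst_crs=dst_crs,
        resampling=Resampling.bilinear)                # smooth resampling
    dst *= FEET_TO_M                                   # consistent unit: decimal metres
    profile = src.profile.copy()
    profile.update(crs=dst_crs, transform=transform,
                   width=width, height=height, dtype="float32")

with rasterio.open("ned_tile_m.tif", "w", **profile) as out:
    out.write(dst, 1)
```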

  7. Multiple pregnancies with complete mole and coexisting normal fetus in North and South America: A retrospective multicenter cohort and literature review.

    Science.gov (United States)

    Lin, Lawrence H; Maestá, Izildinha; Braga, Antonio; Sun, Sue Y; Fushida, Koji; Francisco, Rossana P V; Elias, Kevin M; Horowitz, Neil; Goldstein, Donald P; Berkowitz, Ross S

    2017-04-01

    To determine the clinical characteristics of multiple gestation with complete mole and coexisting fetus (CHMCF) in North and South America. Retrospective non-concurrent cohorts comprised of CHMCF from the New England Trophoblastic Disease Center (NETDC) (1966-2015) and four Brazilian Trophoblastic Disease Centers (BTDC) (1990-2015). From a total of 12,455 cases of gestational trophoblastic disease seen, 72 CHMCF were identified. Clinical characteristics were similar between BTDC (n=46) and NETDC (n=13) from 1990 to 2015, apart from a much higher frequency of potentially life-threatening conditions in Brazil (p=0.046). There were no significant changes in the clinical presentation or outcomes over the past 5 decades in NETDC (13 cases in 1966-1989 vs 13 cases in 1990-2015). Ten pregnancies were electively terminated and 35 cases resulted in viable live births (60% of 60 continued pregnancies). The overall rate of gestational trophoblastic neoplasia (GTN) was 46%; the cases which progressed to GTN presented with higher chorionic gonadotropin levels (p=0.026) and a higher frequency of termination of pregnancy due to medical complications (p=0.006) when compared to those with spontaneous remission. The main regional difference in CHMCF presentation is related to a higher rate of potentially life-threatening conditions in South America. Sixty percent of the expectantly managed CHMCF delivered a viable infant, and the overall rate of GTN in this study was 46%. Elective termination of pregnancy did not influence the risk for GTN; however, the need for termination due to complications and higher hCG levels were associated with development of GTN in CHMCF. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  9. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying

    2011-10-31


  10. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying; Li, Bo; Genton, Marc G.

    2011-01-01


  11. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full-text publications in the domain of

  12. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support research in this area, we have developed a dataset containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
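
    A minimal sketch of working with the dataset's daily click summaries is shown below, using pandas. The file and column names follow the published OULAD distribution (studentVle.csv, studentInfo.csv); treat them as assumptions if your copy differs.

```python
import pandas as pd

vle = pd.read_csv("studentVle.csv")       # one row per student/site/day of clicks
info = pd.read_csv("studentInfo.csv")     # demographics and final result

# Aggregate clicks per student per day, then attach the final outcome.
daily = (vle.groupby(["code_module", "code_presentation", "id_student", "date"])
            ["sum_click"].sum().reset_index())
merged = daily.merge(info, on=["code_module", "code_presentation", "id_student"])
print(merged.groupby("final_result")["sum_click"].mean())   # mean daily clicks by outcome
```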

  13. Analyzing the Impacts of Alternated Number of Iterations in Multiple Imputation Method on Explanatory Factor Analysis

    Directory of Open Access Journals (Sweden)

    Duygu KOÇAK

    2017-11-01

    The study aims to identify the effects of the number of iterations used in the multiple imputation method, one of the methods used to cope with missing values, on the results of factor analysis. With this aim, artificial datasets of different sample sizes were created. Missing values, both missing at random and missing completely at random, were created in various ratios by deleting data. For the data missing at random, a second variable at the ordinal scale level was generated, and datasets with different ratios of missing values were obtained based on the levels of this variable. The data were generated using the "psych" package in R, while the "dplyr" package was used to create code that deleted values according to the predetermined conditions of the missing-value mechanism. Different datasets were then generated by applying different iteration numbers. Exploratory factor analysis was conducted on the completed datasets, and the factors and total explained variances are presented. These values were first evaluated against the number of factors and total variance explained of the complete datasets. The results indicate that the multiple imputation method yields a better performance in cases of values missing at random compared to datasets with values missing completely at random. It was also found that increasing the number of iterations under both missing-value mechanisms decreases the difference from the results obtained on the complete datasets.
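
    To make the design concrete, here is a hedged sketch in Python rather than the study's R code: it imputes an artificially degraded dataset with different iteration counts and fits a factor model to each completed dataset. The sample size, missingness ratio, iteration counts, and the loading-magnitude summary are all illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, p = 500, 8
latent = rng.normal(size=(n, 2))                 # two true factors
loadings = rng.normal(size=(2, p))
X = latent @ loadings + rng.normal(scale=0.5, size=(n, p))

X_miss = X.copy()
mask = rng.random(X.shape) < 0.10                # ~10% missing completely at random
X_miss[mask] = np.nan

for max_iter in (1, 5, 20):                      # alternated iteration numbers
    imputed = IterativeImputer(max_iter=max_iter, random_state=0).fit_transform(X_miss)
    fa = FactorAnalysis(n_components=2).fit(imputed)
    total_loading = np.sum(fa.components_ ** 2)  # crude summary of recovered structure
    print(f"max_iter={max_iter:2d}  total squared loadings={total_loading:.2f}")
```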

  14. PHYSICS PERFORMANCE & DATASET (PPD)

    CERN Multimedia

    L. Silvestris, P. Azzi, C. Cerminara, M. Pierini with contributions from R. Castello, M. De Mattia, S. Di Guida, A. Pfeiffer, F. De Guio, D. Duggan, M. Ojeda-Sandonis, M. Rovere, G. Boudoul, G. Franzoni, P. Srimanobhas, J.-R. Vlimant. Edited by K. Aspola.

    2013-01-01

  Despite the LHC shutdown, the past six months of 2013 have been extremely intense for our coordination area. All the PPD teams are engaged on three major fronts: the exploitation of the 2011 and 2012 data, the preparation of the post-LS1 data taking, and support for the studies for the upgrade of the detector. Alignment and Calibration and Database (AlCaDB): Work in the AlCaDB project followed the planning and moved into a consolidation phase. On the AlCa side, efforts mainly concentrated on providing and validating new calibration and alignment conditions as needed for the re-processing campaigns of 2011 and 2012 data and simulation of multiple upgrade scenarios. Work on improvements to the Global Tag Collector tool to manage these is ongoing. On the DB side, the major redesign of the core conditions software, as discussed in various meetings in the last months of 2012, is being finalised according to schedule. We plan to change over to the new DB structure in early 2014, well before t...

  15. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5-10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
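
    The sketch below shows the clustering step the abstract favours: complete-linkage hierarchical clustering of reads into OTUs at a distance cutoff. The toy distance matrix and the 0.03 cutoff (mimicking the common 97%-identity OTU level) are illustrative assumptions; real pipelines derive the distances from aligned 16S rRNA reads.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix for five reads (fraction of mismatches).
D = np.array([
    [0.00, 0.01, 0.02, 0.20, 0.22],
    [0.01, 0.00, 0.02, 0.21, 0.20],
    [0.02, 0.02, 0.00, 0.19, 0.23],
    [0.20, 0.21, 0.19, 0.00, 0.01],
    [0.22, 0.20, 0.23, 0.01, 0.00],
])
Z = linkage(squareform(D), method="complete")     # complete linkage, as the paper favours
otus = fcluster(Z, t=0.03, criterion="distance")  # cut the dendrogram at 0.03
print(otus)   # e.g. [1 1 1 2 2]: two OTUs
```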

  16. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  17. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  18. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  19. How meaningful are risk determinations in the absence of a complete dataset? Making the case for publishing standardized test guideline and ‘no effect’ studies for evaluating the safety of nanoparticulates versus spurious ‘high effect’ results from single investigative studies

    Science.gov (United States)

    Warheit, David B.; Donner, E. Maria

    2015-06-01

    no uptake of nanoscale or pigment-grade TiO2 particles following oral exposures. We conclude that to develop a competent risk assessment profile, results derived from standardized, guideline-type studies, and even ‘no effect’ study findings provide critically useful input for assessing safe levels of exposure; and should, in principle, be readily acceptable for publication in peer-reviewed toxicology journals. This is a necessary prerequisite for developing a complete dataset for risk assessment determinations.

  20. How meaningful are risk determinations in the absence of a complete dataset? Making the case for publishing standardized test guideline and ‘no effect’ studies for evaluating the safety of nanoparticulates versus spurious ‘high effect’ results from single investigative studies

    International Nuclear Information System (INIS)

    Warheit, David B; Donner, E Maria

    2015-01-01

    little or no uptake of nanoscale or pigment-grade TiO2 particles following oral exposures. We conclude that to develop a competent risk assessment profile, results derived from standardized, guideline-type studies, and even ‘no effect’ study findings provide critically useful input for assessing safe levels of exposure; and should, in principle, be readily acceptable for publication in peer-reviewed toxicology journals. This is a necessary prerequisite for developing a complete dataset for risk assessment determinations. (focus issue paper)

  1. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  2. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  3. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  4. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of data available requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  5. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site-years. Since the workshop, several additional site-years have been added and the dataset now contains over 920 site-years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals involving around 120 researchers have been approved to use the data. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new website is based on a scientific data server which enables browsing of the data online, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot-table access to the data.

  6. Hydraulic Fracture Induced Seismicity During A Multi-Stage Pad Completion in Western Canada: Evidence of Activation of Multiple, Parallel Faults

    Science.gov (United States)

    Maxwell, S.; Garrett, D.; Huang, J.; Usher, P.; Mamer, P.

    2017-12-01

    Following reports of injection induced seismicity in the Western Canadian Sedimentary Basin, regulators have imposed seismic monitoring and traffic light protocols for fracturing operations in specific areas. Here we describe a case study in one of these reservoirs, the Montney Shale in NE British Columbia, where induced seismicity was monitored with a local array during multi-stage hydraulic fracture stimulations on several wells from a single drilling pad. Seismicity primarily occurred during the injection time periods, and correlated with periods of high injection rates and wellhead pressures above fracturing pressures. Sequential hydraulic fracture stages were found to progressively activate several parallel, critically-stressed faults, as illuminated by multiple linear hypocenter patterns in the range between Mw 1 and 3. Moment tensor inversion of larger events indicated a double-couple mechanism consistent with the regional strike-slip stress state and the hypocenter lineations. The critically-stressed faults obliquely cross the well paths which were purposely drilled parallel to the minimum principal stress direction. Seismicity on specific faults started and stopped when fracture initiation points of individual injection stages were proximal to the intersection of the fault and well. The distance ranges when the seismicity occurs is consistent with expected hydraulic fracture dimensions, suggesting that the induced fault slip only occurs when a hydraulic fracture grows directly into the fault and the faults are temporarily exposed to significantly elevated fracture pressures during the injection. Some faults crossed multiple wells and the seismicity was found to restart during injection of proximal stages on adjacent wells, progressively expanding the seismogenic zone of the fault. Progressive fault slip is therefore inferred from the seismicity migrating further along the faults during successive injection stages. An accelerometer was also deployed close

  7. Feasibility of Optimizing Recovery and Reserves from a Mature and Geological Complex Multiple Turbidite Offshore Calif. Reservoir through the Drilling and Completion of a Trilateral Horizontal Well

    International Nuclear Information System (INIS)

    Coombs, Steven F.

    1999-01-01

    The main objective of this project is to devise an effective redevelopment strategy to combat producibility problems related to the Repetto turbidite sequences of the Carpinteria Field. The lack of adequate reservoir characterization, high water-cut production, and scaling problems have in the past contributed to the field's low productivity. To improve productivity and enhance recoverable reserves, the following specific goals are proposed: (1) Develop an integrated database of all existing data from work done by the former ownership group. (2) Expand reservoir drainage and reduce sand problems through horizontal well drilling and completion. (3) Operate and validate the reservoir's conceptual model by incorporating new data from the proposed trilateral well. (4) Transfer the methodologies employed in geologic modeling and drilling multilateral wells to other operators with similar reservoirs.

  8. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.
  9. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving; the ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High and Event modes at 20 and 1 Hz, respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  10. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team allowing to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  11. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  12. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Reference Life Cycle Data System (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life Cycle Inventories have been derived. Moreover, these results give any LCA practitioner assurance of the quality of the electricity-related datasets, and provide insights into the limitations and assumptions underlying the datasets' modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  13. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Coupled Model Intercomparison Project (CMIP), its efforts have shifted to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges, and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing and before their knowledge makes its way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including: Jupyter notebooks; PrePARE; and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively and to provide additional metadata and analysis services, along with some built-in controlled-vocabulary validations in the workflow. In addition, we discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  14. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).
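
    A minimal sketch of loading one of these EPANET input files and running a hydraulic simulation with the WNTR package is shown below; the file name "looped.inp" is a placeholder, since the dataset's actual file names are not given here.

```python
import wntr

# Load an EPANET .inp file from the dataset (placeholder name).
wn = wntr.network.WaterNetworkModel("looped.inp")
sim = wntr.sim.EpanetSimulator(wn)       # run EPANET through WNTR
results = sim.run_sim()
print(results.node["pressure"].head())   # time-indexed node pressures
```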

  15. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
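
    The client access pattern described above, HTTPS with GRID-certificate authentication, can be sketched generically as follows. The endpoint URL and query parameters are hypothetical placeholders for illustration, not the real DBS API.

```python
import requests

resp = requests.get(
    "https://cmsweb.example/dbs/datasets",      # hypothetical endpoint, not the real API
    params={"dataset": "/Primary/Processed/TIER"},
    cert=("usercert.pem", "userkey.pem"),       # GRID user certificate and key
    verify="/etc/grid-security/certificates",   # CA certificates for the server
)
resp.raise_for_status()
print(resp.json())
```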

  16. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  17. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  18. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  19. Fusing Multiple Satellite Datasets Toward Defining and Understanding Organized Convection

    Science.gov (United States)

    Elsaesser, G.; Del Genio, A. D.

    2017-12-01

    How do we differentiate unorganized from organized convection? We might think of organized convection as being long lasting (at least longer than the lifetime of any individual cumulus cell), clustered at larger spatial scales (>100 km), and responsible for substantial rainfall accumulation. Organized convection is sustained on such scales due to the arrangement of moist/dry and buoyant/non-buoyant mesoscale circulations. The nature of these circulations is tied to system diabatic heating profiles; in particular, the 2nd baroclinic (top-heavy), stratiform heating mode is thought to be important for organized convection maintenance/propagation. We investigate the extent to which these characteristics are jointly found in propagating convective systems. Lifecycle information comes from hi-res IR data. Diabatic heating profiles, convective fractions and rainfall are provided by GPM retrievals mapped to convective system tracks. Moisture is provided by AIRS/AMSU and passive microwave retrievals. Instead of compositing heating profile information along a system track, where information is smoothed out, we sort system heating profile structures according to their "top heaviness" and then analyze PDFs of system rainfall, system sizes, durations, convective/stratiform ratios, etc. as a function of diabatic heating structure. Perhaps contrary to expectation, we find only small differences in PDFs of rainfall rates, system sizes, and system duration for different heating profile structures. If organization is defined according to heating structures, then one possible interpretation of these results is that organization is independent of system size, duration, and many times, even lifecycle stage. Is it possible that most systems "hobble" along and exhibit varying degrees of organization, dependent on local environment moisture/buoyancy variations, unlike the archetypical MCS paradigm? This presentation will also discuss the questions posed above within the context of parameterizing organized convection in the GISS GCM. GCMs must make/sustain the right heating profile at the right time, which requires observations-based understanding of such distinctions. Such knowledge is important for simulating and understanding the deep convective contribution to cloud feedback in a changing climate.
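
    One way to make the "top heaviness" sorting concrete is to project each heating profile onto idealized first and second baroclinic modes and rank systems by the ratio of the coefficients. The sketch below does this for two synthetic profiles; the mode shapes and profiles are illustrative assumptions, not the retrieval's actual basis functions.

```python
import numpy as np

p = np.linspace(1000.0, 100.0, 46)        # pressure levels, hPa
x = (1000.0 - p) / 900.0                  # 0 at 1000 hPa, 1 at 100 hPa
mode1 = np.sin(np.pi * x)                 # first baroclinic (deep heating) mode
mode2 = -np.sin(2.0 * np.pi * x)          # second mode: heating aloft, cooling below

def top_heaviness(q):
    """Ratio of second- to first-baroclinic projection coefficients."""
    a1 = (q * mode1).sum() / (mode1 * mode1).sum()
    a2 = (q * mode2).sum() / (mode2 * mode2).sum()
    return a2 / a1

q_deep = mode1                            # purely convective, deep profile
q_stratiform = mode1 + 0.8 * mode2        # profile shifted toward top-heavy
print(top_heaviness(q_deep), top_heaviness(q_stratiform))   # ~0.0, then ~0.8
```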

  20. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  1. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and the current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  2. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    than the F-measure using calculated probabilities. Further, the threshold estimation yielded results for the F-measure that were only slightly below the highest possible for those probabilities. The method appears highly accurate across a spectrum of datasets with varying degrees of error. As there are few alternatives for parameter estimation, the method is a major step towards a complete operational approach for probabilistic linkage of privacy-preserved datasets.
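
    For reference, the F-measure being optimised here is the harmonic mean of linkage precision and recall at a chosen match-weight threshold. A minimal sketch, with toy candidate-pair scores and truth labels:

```python
def f_measure(scores, truth, threshold):
    """F-measure of links accepted at or above the threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy match weights and ground-truth link status for six candidate pairs.
scores = [0.95, 0.80, 0.40, 0.90, 0.30, 0.70]
truth = [True, True, False, True, False, False]
print(f_measure(scores, truth, threshold=0.85))   # precision 1.0, recall 2/3 -> 0.8
```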

  3. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Science.gov (United States)

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  4. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated/ice-free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced, alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
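
    The quoted sea-level equivalent can be sanity-checked with back-of-envelope arithmetic: convert the ice volume to a seawater-equivalent volume and spread it over the ocean area. The constants below are assumed round numbers for illustration, not the paper's exact inputs.

```python
ICE_VOLUME_KM3 = 2.96e6        # approximate Greenland ice volume (assumed)
RHO_ICE = 917.0                # kg m^-3
RHO_SEAWATER = 1028.0          # kg m^-3
OCEAN_AREA_KM2 = 3.62e8        # global ocean area (assumed)

water_equiv_km3 = ICE_VOLUME_KM3 * RHO_ICE / RHO_SEAWATER
sle_m = water_equiv_km3 / OCEAN_AREA_KM2 * 1000.0   # km -> m
print(f"{sle_m:.2f} m")   # ~7.3 m, consistent with the quoted 7.36 m
```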

  5. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers, with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events, including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc., related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast-evolving datasets from new technologies.

  6. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  7. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  8. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has led to increased competition between tech companies and to attempts to crack and break down their systems. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns in a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limit and collateral details, transaction details, risk profiles, Anti-Money-Laundering information and trade finance data. Thousands of decisions based on these data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, with applications in customer acquisition, customer retention, management and marketing, and management of risk and fraud detection.

  9. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  10. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  11. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: how capable are such stability interpretations in drawing conclusions? There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the possibility that the stability interpretation is not directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional-scale stability variations are quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
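
    Step (i) above is a standard power calculation. As a hedged illustration, the sketch below uses the generic two-sample normal approximation, not the authors' code; the effect size and spread in the example call are placeholders.

    ```python
    import numpy as np
    from scipy.stats import norm

    def required_sample_size(delta, sigma, alpha=0.05, power=0.8):
        """Per-group sample size to detect a mean difference `delta` between
        two groups with common standard deviation `sigma` (normal approx.)."""
        z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance threshold
        z_beta = norm.ppf(power)           # quantile for the desired power
        n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
        return int(np.ceil(n))

    # e.g. detecting a 1-point stability difference with sigma = 1.5
    # requires 36 measurements per group at alpha = 0.05, power = 0.8.
    print(required_sample_size(1.0, 1.5))
    ```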

  12. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.

  13. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  14. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  15. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  16. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  17. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted in conformance with the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow anomalies to be tracked and the completeness of datasets to be controlled. It shall ensure that the PSA end-users: (1) can rely on the result of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process to check the dataset content against pre-defined top-level criteria, which represent the general characteristics of good-quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that
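
    The core idea, checking dataset content against pre-defined top-level criteria, can be sketched minimally as below. The required file names and checks are hypothetical placeholders, not the actual PSA validation rules.

    ```python
    import os

    # Hypothetical top-level criteria: files every delivered dataset must
    # contain and that must not be empty (placeholders, not PSA rules).
    REQUIRED_FILES = ["VOLDESC.CAT", "DATASET.CAT", "INDEX/INDEX.TAB"]

    def validate_dataset(root):
        """Return a list of anomalies found under a dataset directory."""
        anomalies = []
        for rel in REQUIRED_FILES:
            path = os.path.join(root, rel)
            if not os.path.exists(path):
                anomalies.append("missing required file: " + rel)
            elif os.path.getsize(path) == 0:
                anomalies.append("empty file: " + rel)
        return anomalies
    ```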

  18. Tensor Completion Algorithms in Big Data Analytics

    OpenAIRE

    Song, Qingquan; Ge, Hancheng; Caverlee, James; Hu, Xia

    2017-01-01

    Tensor completion is a problem of filling the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and achievement in areas like data mining, computer vision, signal processing, and neuroscience. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data an...
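
    As one concrete instance of the family of methods such surveys cover, a low-rank CP factorization can be fitted to the observed entries by gradient descent. The numpy sketch below is a generic illustration (not from the survey); the rank, learning rate and iteration count are placeholders that would need tuning per dataset.

    ```python
    import numpy as np

    def cp_complete(T, mask, rank=5, n_iters=2000, lr=1e-3, seed=0):
        """Fill missing entries of a 3-way tensor by fitting a rank-`rank`
        CP model to the observed entries (mask == True) by gradient descent."""
        rng = np.random.default_rng(seed)
        I, J, K = T.shape
        A = 0.1 * rng.standard_normal((I, rank))
        B = 0.1 * rng.standard_normal((J, rank))
        C = 0.1 * rng.standard_normal((K, rank))
        for _ in range(n_iters):
            # Reconstruction: T_hat[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]
            T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
            R = mask * (T_hat - T)  # residual on observed entries only
            A -= lr * np.einsum("ijk,jr,kr->ir", R, B, C)
            B -= lr * np.einsum("ijk,ir,kr->jr", R, A, C)
            C -= lr * np.einsum("ijk,ir,jr->kr", R, A, B)
        T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        return np.where(mask, T, T_hat)  # keep observed values, fill the rest
    ```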

  19. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  20. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT) paradigm, the analysis of the Activities of Daily Living (ADLs), in a smart home environment, is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), which is simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files simulate the classification of ADLs in the classification dataset, and the other forty-two files are for anomaly detection problems, in which anomalous patterns were simulated and injected into the anomaly detection dataset.

  1. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets is to

  2. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  3. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual
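
    Mutual-information-based fusion, as used above, rests on a simple quantity: the mutual information between the intensity distributions of two images, which registration seeks to maximize. The histogram-based sketch below is a generic illustration, not the MIAMI Fuse implementation; it assumes the two images are already resampled to the same shape.

    ```python
    import numpy as np

    def mutual_information(img_a, img_b, bins=32):
        """Histogram-based mutual information of two aligned, equally sized
        images; registration methods search for the pose maximizing this."""
        joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
        pxy = joint / joint.sum()             # joint intensity distribution
        px = pxy.sum(axis=1, keepdims=True)   # marginal of image A
        py = pxy.sum(axis=0, keepdims=True)   # marginal of image B
        nz = pxy > 0                          # avoid log(0)
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
    ```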

  4. Reconstructing missing information on precipitation datasets: impact of tails on adopted statistical distributions.

    Science.gov (United States)

    Pedretti, Daniele; Beckie, Roger Daniel

    2014-05-01

    Missing data in hydrological time-series databases are ubiquitous in practical applications, yet it is of fundamental importance to make educated decisions in problems involving exhaustive time-series knowledge. This includes precipitation datasets, since recording or human failures can produce gaps in these time series. For some applications, directly involving the ratio between precipitation and some other quantity, lack of complete information can result in poor understanding of basic physical and chemical dynamics involving precipitated water. For instance, the ratio between precipitation (recharge) and outflow rates at a discharge point of an aquifer (e.g. rivers, pumping wells, lysimeters) can be used to obtain aquifer parameters and thus to constrain model-based predictions. We tested a suite of methodologies to reconstruct missing information in rainfall datasets. The goal was to obtain a suitable and versatile method to reduce the errors caused by the lack of data in specific time windows. Our analyses included both a classical chronological pairing approach between rainfall stations and a probability-based approach, which accounted for the probability of exceedance of rain depths measured at two or multiple stations. Our analyses showed that it is not clear a priori which method performs best. Rather, this selection should be based on the specific statistical properties of the rainfall dataset. In this presentation, our emphasis is to discuss the effects of a few typical parametric distributions used to model the behavior of rainfall. Specifically, we analyzed the role of distributional "tails", which have an important control on the occurrence of extreme rainfall events. The latter strongly affect several hydrological applications, including recharge-discharge relationships. The heavy-tailed distributions we considered were the parametric Log-Normal, Generalized Pareto, Generalized Extreme Value and Gamma distributions. The methods were
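
    As a hedged illustration of fitting one of the named heavy-tailed models, the sketch below fits a Generalized Pareto distribution to rainfall excesses over a high threshold (a standard peaks-over-threshold recipe, not the authors' code); the file name and threshold choice are placeholders.

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical daily rainfall record; the file name is a placeholder.
    rain = np.loadtxt("daily_rainfall_mm.txt")

    # Peaks-over-threshold: model excesses over a high threshold with a GPD.
    threshold = np.percentile(rain[rain > 0], 95)
    excesses = rain[rain > threshold] - threshold
    shape, loc, scale = stats.genpareto.fit(excesses, floc=0.0)

    # Conditional probability that a wet day exceeds 100 mm, given that it
    # already exceeds the threshold (assumes the threshold is below 100 mm).
    p_100 = stats.genpareto.sf(100.0 - threshold, shape, loc=0.0, scale=scale)
    ```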

  5. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, as well as evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
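
    For reference, the FAO-56 form of the Penman-Monteith equation for reference evapotranspiration reads as follows; the dataset's exact formulation and coefficients may differ.

    ```latex
    ET_0 = \frac{0.408\,\Delta\,(R_n - G)
                 + \gamma\,\frac{900}{T + 273}\,u_2\,(e_s - e_a)}
                {\Delta + \gamma\,(1 + 0.34\,u_2)}
    ```

    Here Δ is the slope of the saturation vapour-pressure curve, R_n the net radiation, G the soil heat flux, γ the psychrometric constant, T the mean daily air temperature at 2 m, u_2 the wind speed at 2 m, and e_s − e_a the vapour-pressure deficit, giving ET_0 in mm per day.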

  6. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  7. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but missing-data handling can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
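
    The alternating SVD-based idea mentioned in the closing sentence can be sketched in a few lines: fill the missing entries, fit a low-rank model, re-fill from the model, and iterate. The numpy version below illustrates the general recipe, not the article's exact algorithm; the rank and tolerance are placeholders.

    ```python
    import numpy as np

    def svd_impute(X, rank=2, n_iters=100, tol=1e-6):
        """Alternate between fitting a rank-`rank` SVD model and re-filling
        the missing (NaN) entries from it, until the imputations stabilise."""
        mask = np.isnan(X)
        Xf = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
        for _ in range(n_iters):
            U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            X_new = np.where(mask, low_rank, X)  # observed entries stay fixed
            if np.linalg.norm(X_new - Xf) < tol * np.linalg.norm(Xf):
                return X_new
            Xf = X_new
        return Xf
    ```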

  8. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    Science.gov (United States)

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of the data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and to generate a unified dataset by a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rules creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time effort of the experts and knowledge engineer by 94.1% while creating unified datasets.
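
    The unification idea, mapping source-specific attributes onto a shared schema and resolving overlaps by priority, can be sketched as below. The attribute names, mappings and priority order are hypothetical, not the GUDM tool's actual configuration.

    ```python
    import pandas as pd

    # Hypothetical source-to-unified attribute mappings and source priority.
    MAPPINGS = {
        "clinical": {"pat_id": "patient_id", "glucose_mgdl": "glucose"},
        "sensor":   {"subject": "patient_id", "steps": "daily_steps"},
    }
    PRIORITY = ["clinical", "sensor"]  # earlier sources win on overlaps

    def unify(sources):
        """sources: dict mapping source name -> DataFrame; returns one frame."""
        frames = [
            sources[name].rename(columns=MAPPINGS[name]).set_index("patient_id")
            for name in PRIORITY
        ]
        unified = frames[0]
        for frame in frames[1:]:
            # combine_first keeps higher-priority values, fills gaps from lower
            unified = unified.combine_first(frame)
        return unified.reset_index()
    ```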

  9. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  11. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  12. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  13. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  15. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  16. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  17. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  18. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  2. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  3. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w_0 + w_1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w_0, w_1) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample

  5. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011, including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
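
    CUBE itself propagates sounding uncertainties into multiple depth hypotheses per grid node; as a much simpler stand-in, gridding soundings at a fixed resolution can be illustrated by mean binning. The sketch below is a generic illustration, not the CCOM/JHC implementation, and assumes x and y are projected coordinates in metres.

    ```python
    import numpy as np

    def grid_soundings(x, y, z, cell=1.0):
        """Average soundings (x, y, z arrays) onto a regular grid with
        `cell`-metre spacing; empty cells are returned as NaN."""
        ix = np.floor((x - x.min()) / cell).astype(int)
        iy = np.floor((y - y.min()) / cell).astype(int)
        nx, ny = ix.max() + 1, iy.max() + 1
        sums = np.zeros((ny, nx))
        counts = np.zeros((ny, nx))
        np.add.at(sums, (iy, ix), z)    # accumulate depths per cell
        np.add.at(counts, (iy, ix), 1)  # count soundings per cell
        with np.errstate(invalid="ignore"):
            return np.where(counts > 0, sums / counts, np.nan)
    ```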

  6. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    Science.gov (United States)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes when the data are available. The mixture of geodetic datasets such as Global Navigation Satellite System (GNSS) and InSAR observations, seismic waveforms, and, when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges provides slightly different observations that, when incorporated together, lead to a more robust model of the fault slip distribution. The merging of different datasets is of particular importance along subduction zones, where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and the dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained individually from geodetic and tsunami datasets as well as from a combined dataset. We constrain the importance of distance between estimated parameters and observed data and how it varies between land-based and open-ocean datasets. The analysis focuses on accurately scaled subduction zone synthetic models as well as on the relationship between slip and data in recent large subduction zone earthquakes. This study shows that datasets sensitive to seafloor deformation, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.

  7. Structure completion for facade layouts

    KAUST Repository

    Fan, Lubin

    2014-11-18

    We present a method to complete missing structures in facade layouts. Starting from an abstraction of the partially observed layout as a set of shapes, we can propose one or multiple possible completed layouts. Structure completion with large missing parts is an ill-posed problem. Therefore, we combine two sources of information to derive our solution: the observed shapes and a database of complete layouts. The problem is also very difficult, because shape positions and attributes have to be estimated jointly. Our proposed solution is to break the problem into two components: a statistical model to evaluate layouts and a planning algorithm to generate candidate layouts. This ensures that the completed result is consistent with the observation and the layouts in the database.

  8. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    Science.gov (United States)

    2014-12-23

    training datasets that have complete (run-to-failure) trajectories. Using data with complete trajectories gives access to the true End-of-Life (EOL) ... (2010) as the actual EOL is known a priori. This allows testing the critical time aspect of a prediction in addition to accuracy and precision

  9. Power analysis dataset for QCA based multiplexer circuits

    Directory of Open Access Journals (Sweden)

    Md. Abdullah-Al-Shafi

    2017-04-01

    Full Text Available Power consumption in irreversible QCA logic circuits is a vital issue; however, in practical cases this focus is mostly omitted. The complete power-depletion datasets of different QCA multiplexers have been worked out in this paper. At a temperature of −271.15 °C (2 K), the depletion is evaluated under three separate tunneling energy levels. All the circuits are designed with QCADesigner, a broadly used simulation engine, and the QCAPro tool has been applied for estimating the power dissipation.

  10. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines including the biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However, SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

  11. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
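
    Of the sampling techniques alluded to, one widely used example is SMOTE-style oversampling, which synthesizes new minority-class samples by interpolating between a sample and one of its nearest minority neighbours. The numpy sketch below is a simplified illustration, not the chapter's reference implementation, and assumes the minority class has more than `k` samples.

    ```python
    import numpy as np

    def smote(X_min, n_new, k=5, seed=0):
        """Generate `n_new` synthetic minority samples by interpolating each
        chosen sample toward one of its k nearest minority neighbours."""
        rng = np.random.default_rng(seed)
        # Pairwise distances within the minority class
        d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)                  # exclude self-matches
        nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours
        base = rng.integers(0, len(X_min), n_new)    # random base samples
        nbr = nn[base, rng.integers(0, k, n_new)]    # random neighbour of each
        lam = rng.random((n_new, 1))                 # interpolation factors
        return X_min[base] + lam * (X_min[nbr] - X_min[base])
    ```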

  12. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated; the QR codes are helpful for quick identification of isolates. Analysis of the AT/GC content of the DNA sequences was carried out; AT/GC content is helpful for studying the stability of sequences at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
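
    The AT/GC computation mentioned above is straightforward; a minimal sketch (plain base counts only, ignoring ambiguity codes such as N):

    ```python
    def at_gc_content(seq):
        """Return (AT%, GC%) of a DNA sequence; a higher GC fraction is
        associated with greater thermal stability of the double helix."""
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        at = seq.count("A") + seq.count("T")
        total = gc + at
        return 100.0 * at / total, 100.0 * gc / total

    # e.g. at_gc_content("ATGCGC") -> (33.33..., 66.66...)
    ```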

  13. Long-term dataset on aquatic responses to concurrent climate change and recovery from acidification

    Science.gov (United States)

    Leach, Taylor H.; Winslow, Luke A.; Acker, Frank W.; Bloomfield, Jay A.; Boylen, Charles W.; Bukaveckas, Paul A.; Charles, Donald F.; Daniels, Robert A.; Driscoll, Charles T.; Eichler, Lawrence W.; Farrell, Jeremy L.; Funk, Clara S.; Goodrich, Christine A.; Michelena, Toby M.; Nierzwicki-Bauer, Sandra A.; Roy, Karen M.; Shaw, William H.; Sutherland, James W.; Swinton, Mark W.; Winkler, David A.; Rose, Kevin C.

    2018-04-01

    Concurrent regional and global environmental changes are affecting freshwater ecosystems. Decadal-scale data on lake ecosystems that can describe processes affected by these changes are important, as multiple stressors often interact to alter the trajectory of key ecological phenomena in complex ways. Due to the practical challenges associated with long-term data collections, the majority of existing long-term datasets focus on only a small number of lakes or few response variables. Here we present physical, chemical, and biological data from 28 lakes in the Adirondack Mountains of northern New York State. These data span the period from 1994 to 2012 and harmonize multiple open and as-yet unpublished data sources. The dataset creation is reproducible and transparent; R code and all original files used to create the dataset are provided in an appendix. This dataset will be useful for examining ecological change in lakes undergoing multiple stressors.

  14. Completely continuous and weakly completely continuous abstract ...

    Indian Academy of Sciences (India)

    An algebra A is called right completely continuous (right weakly completely continuous) ... Moreover, some applications of these results in group algebras are ... A linear subspace S(G) of L^1(G) is said to be a Segal algebra if it satisfies the

  15. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  16. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  17. Multiple plots in R

    DEFF Research Database (Denmark)

    Edwards, Stefan McKinnon

    2012-01-01

    In this chapter I will investigate how to combine multiple plots into a single plot. The scenario is a dataset of a series of measurements, on three samples in three situations. There are many ways we can display this, e.g. 3d graphs or faceting. 3d graphs are not good for displaying static data so we...

  18. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed (such as interpersonal conflict or conflict among pharmaceuticals, for example) did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  19. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    Full Text Available We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed (such as interpersonal conflict or conflict among pharmaceuticals, for example) did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  20. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed (such as interpersonal conflict or conflict among pharmaceuticals, for example) did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  1. Vector Nonlinear Time-Series Analysis of Gamma-Ray Burst Datasets on Heterogeneous Clusters

    Directory of Open Access Journals (Sweden)

    Ioana Banicescu

    2005-01-01

    Full Text Available The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is to be fitted on multiple datasets and tested for goodness of fit within a fixed analytical framework. Definitive conclusions are hopefully achieved by analyzing the datasets together. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general purpose Linux cluster demonstrate the effectiveness of the strategy.

  2. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  3. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset output, such as that from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  4. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer… anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal… interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical…

  5. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  6. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  7. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

    Science.gov (United States)

    Shcherbina, Anna

    2014-08-15

    High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms: whereas sequenced mixtures of organisms contain background contaminating organisms, in silico samples provide exact ground truth. To be of the greatest value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with the specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge
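
    The characterization half of such a tool can be approximated in plain Python. The sketch below computes read-length and mean base-quality statistics for a FASTQ file; it assumes Phred+33 quality encoding and a hypothetical input file name, and it is not FASTQSim's own implementation.

    ```python
    # Minimal FASTQ characterization sketch: read-length distribution and
    # mean base quality, assuming four-line records and Phred+33 encoding.
    from collections import Counter

    def characterize_fastq(path):
        lengths, qual_sum, qual_n = Counter(), 0, 0
        with open(path) as fh:
            while True:
                header = fh.readline()
                if not header:
                    break
                seq = fh.readline().strip()
                fh.readline()                    # '+' separator line
                quals = fh.readline().strip()
                lengths[len(seq)] += 1
                qual_sum += sum(ord(c) - 33 for c in quals)  # Phred+33
                qual_n += len(quals)
        return lengths, qual_sum / max(qual_n, 1)

    lengths, mean_q = characterize_fastq("reads.fastq")  # hypothetical file
    print("most common read lengths:", lengths.most_common(3))
    print(f"mean base quality: Q{mean_q:.1f}")
    ```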

  8. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  9. Accuracy of Digitally Fabricated Wax Denture Bases and Conventional Completed Complete Dentures

    Directory of Open Access Journals (Sweden)

    Bogna Stawarczyk

    2017-12-01

    Full Text Available The purpose of this investigation was to analyze the accuracy of digitally fabricated wax trial dentures and conventionally finalized complete dentures in comparison to a surface tessellation language (STL) dataset. A generated dataset for the denture bases and the tooth sockets was used, converted into STL format, and saved as the reference. Five mandibular and 5 maxillary denture bases were milled from wax blanks and denture teeth were waxed into their tooth sockets. Each complete denture was checked for fit, waxed onto the dental cast, and digitized using an optical laboratory scanning device. The complete dentures were completed conventionally using the injection method, finished, and scanned. The resulting STL datasets were exported into the three-dimensional (3D) software GOM Inspect. Each of the 5 mandibular and 5 maxillary complete dentures was aligned with the STL reference and the wax trial denture dataset. Alignment was performed based on a best-fit algorithm. A three-dimensional analysis of the spatial divergences in the x-, y- and z-axes was performed by the 3D software and visualized in a color-coded illustration. The mean positive and negative deviations between the datasets were calculated automatically. In a direct comparison between maxillary wax trial dentures and complete dentures, the complete dentures showed higher deviations from the STL dataset than the wax trial dentures. The deviations occurred in the area of the teeth as well as in the distal area of the denture bases. In contrast, the highest deviations in both the mandibular wax trial dentures and the mandibular complete dentures were observed in the distal area. The complete dentures showed higher deviations on the occlusal surfaces of the teeth compared to the wax dentures. Computer-aided design/computer-aided manufacturing (CAD/CAM)-fabricated wax dentures exhibited fewer deviations from the STL reference than the complete dentures. The deviations were significantly greater in the

  10. Challenges and Experiences of Building Multidisciplinary Datasets across Cultures

    Science.gov (United States)

    Jamiyansharav, K.; Laituri, M.; Fernandez-Gimenez, M.; Fassnacht, S. R.; Venable, N. B. H.; Allegretti, A. M.; Reid, R.; Baival, B.; Jamsranjav, C.; Ulambayar, T.; Linn, S.; Angerer, J.

    2017-12-01

    Efficient data sharing and management are key challenges to multidisciplinary scientific research. These challenges are further complicated by adding a multicultural component. We address the construction of a complex database for social-ecological analysis in Mongolia. Funded by the National Science Foundation (NSF) Dynamics of Coupled Natural and Human (CNH) Systems, the Mongolian Rangelands and Resilience (MOR2) project focuses on the vulnerability of Mongolian pastoral systems to climate change and adaptive capacity. The MOR2 study spans over three years of fieldwork in 36 paired districts (Soum) from 18 provinces (Aimag) of Mongolia that covers steppe, mountain forest steppe, desert steppe and eastern steppe ecological zones. Our project team is composed of hydrologists, social scientists, geographers, and ecologists. The MOR2 database includes multiple ecological, social, meteorological, geospatial and hydrological datasets, as well as archives of original data and survey in multiple formats. Managing this complex database requires significant organizational skills, attention to detail and ability to communicate within collective team members from diverse disciplines and across multiple institutions in the US and Mongolia. We describe the database's rich content, organization, structure and complexity. We discuss lessons learned, best practices and recommendations for complex database management, sharing, and archiving in creating a cross-cultural and multi-disciplinary database.

  11. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  12. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  13. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, and particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data-service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  14. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
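
    The reporting style above (percent difference with a 95% CI) is easy to reproduce for one's own benchmark runs with a small bootstrap; the timing numbers below are invented for illustration and are not the study's measurements.

    ```python
    # Percent difference in wall-clock time between two providers, with a
    # bootstrap 95% CI. Timings are made-up illustrative values in minutes.
    import numpy as np

    emr = np.array([412.0, 398.0, 431.0, 405.0, 420.0])   # hypothetical
    gce = np.array([268.0, 255.0, 274.0, 261.0, 270.0])   # hypothetical

    def pct_diff(a, b):
        return 100.0 * (a.mean() - b.mean()) / b.mean()

    rng = np.random.default_rng(1)
    boot = [pct_diff(rng.choice(emr, emr.size), rng.choice(gce, gce.size))
            for _ in range(10_000)]               # resample with replacement
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"EMR slower by {pct_diff(emr, gce):.1f}% "
          f"(95% CI: {lo:.1f}-{hi:.1f})")
    ```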

  15. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  16. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even in partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  17. Human-machine interaction to disambiguate entities in unstructured text and structured datasets

    Science.gov (United States)

    Ward, Kevin; Davenport, Jack

    2017-05-01

    Creating entity network graphs is a manual, time-consuming process for an intelligence analyst. Beyond the traditional big data problems of information overload, individuals are often referred to by multiple names and shifting titles as they advance in their organizations over time, which quickly makes simple string or phonetic alignment methods for entities insufficient. Conversely, automated methods for relationship extraction and entity disambiguation typically produce questionable results with no way for users to vet results, correct mistakes or influence the algorithm's future results. We present an entity disambiguation tool, DRADIS, which aims to bridge the gap between human-centric and machine-centric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark, allowing it to scale to operational-sized data. Resolution results are presented to the analyst complete with sourcing information for each mention and relationship, allowing analysts to quickly vet the correctness of results as well as correct mistakes. Corrected results are used by the system to refine the underlying model, allowing analysts to optimize the general model to better deal with their operational data. Providing analysts with the ability to validate and correct the model to produce a system they can trust enables them to better focus their time on producing higher quality analysis products.
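
    A toy version of attribute-based entity resolution in the spirit described above (not DRADIS itself): score mention pairs on name similarity plus shared attributes, then merge pairs above a threshold with union-find. The mentions, feature weights and threshold are all assumptions for illustration.

    ```python
    # Pairwise mention scoring + union-find clustering into entities.
    from difflib import SequenceMatcher

    mentions = [  # hypothetical multi-source mentions of people
        {"name": "Col. John Smith", "org": "3rd Brigade"},
        {"name": "J. Smith",        "org": "3rd Brigade"},
        {"name": "John Smyth",      "org": "Logistics"},
    ]

    def similarity(a, b):
        name_sim = SequenceMatcher(None, a["name"].lower(),
                                   b["name"].lower()).ratio()
        attr_sim = 1.0 if a["org"] == b["org"] else 0.0
        return 0.7 * name_sim + 0.3 * attr_sim   # assumed weights

    parent = list(range(len(mentions)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path compression
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similarity(mentions[i], mentions[j]) > 0.65:
                parent[find(i)] = find(j)        # merge into one entity

    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m["name"])
    print(clusters)   # expected: the Smith mentions merge, Smyth stays apart
    ```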

  18. [Multiple meningiomas].

    Science.gov (United States)

    Terrier, L-M; François, P

    2016-06-01

    Multiple meningiomas (MMs), or meningiomatosis, are defined by the presence of at least 2 lesions that appear simultaneously or not, at different intracranial locations, without an association with neurofibromatosis. They represent 1-9% of meningiomas, with a female predominance. How multiple meningiomas arise is not clear. There are 2 main hypotheses for their development: one that supports the independent evolution of these tumors, and the other, completely opposite, that suggests the propagation of tumor cells from the transformation of a single clone through the cerebrospinal fluid. NF2 gene mutation is an important intrinsic risk factor in the etiology of multiple meningiomas, and some exogenous risk factors have been suspected, but only ionizing radiation exposure has been proven. These tumors can grow anywhere in the skull but are more frequently observed in supratentorial locations. Their histologic types are similar to those of solitary meningiomas (psammomatous, fibroblastic, meningothelial or transitional), and in most cases they are benign tumors. The prognosis of these tumors is generally good and does not differ from that of solitary tumors, except in cases of radiation-induced multiple meningiomas, in the context of NF2, or when diagnosed in children, where the outcome is less favorable. Each meningioma lesion should be dealt with individually, and their multiple character should not justify resection at all costs. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  19. ClimateNet: A Machine Learning dataset for Climate Science Research

    Science.gov (United States)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in Computer vision, speech recognition and control systems. The key for all of these developments was the creation of a curated, labeled dataset ImageNet, for enabling multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key pre-requisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify, and share datasets across groups and institutions. In this work, we are proposing ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types, store bounding boxes, and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.

  20. Reconstructing flaw image using dataset of full matrix capture technique

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Tae Hun; Kim, Yong Sik; Lee, Jeong Seok [KHNP Central Research Institute, Daejeon (Korea, Republic of)

    2017-02-15

    A conventional phased array ultrasonic system offers the ability to steer an ultrasonic beam by applying independent time delays to the individual elements in the array and produce an ultrasonic image. In contrast, full matrix capture (FMC) is a data acquisition process that collects a complete matrix of A-scans from every possible independent transmit-receive combination in a phased array transducer, making it possible in post-processing to reconstruct images equivalent to a conventional phased array image, as well as various images that cannot be produced by a conventional phased array. In this paper, a basic algorithm based on the LLL-mode total focusing method (TFM) that can image crack-type flaws is described. This technique was applied to reconstruct flaw images from FMC datasets obtained from experiments and from ultrasonic simulation.
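
    As a rough illustration of the delay-and-sum reconstruction behind TFM, the sketch below implements a basic direct-path TFM over an FMC dataset; the paper's LLL mode additionally routes the ray path via a backwall reflection, which is omitted here. The array geometry, sound speed, sampling rate and FMC array are hypothetical inputs, and the transducer is assumed to have n_tx == n_rx elements.

    ```python
    # Direct-path total focusing method: for each pixel, sum the FMC
    # A-scan samples at the transmit-receive time of flight over every
    # element pair. fmc has shape (n_tx, n_rx, n_samples).
    import numpy as np

    def tfm_image(fmc, elem_x, grid_x, grid_z, c, fs):
        n_tx, n_rx, n_s = fmc.shape
        tx = np.arange(n_tx)[:, None]
        rx = np.arange(n_rx)[None, :]
        image = np.zeros((grid_z.size, grid_x.size))
        for iz, z in enumerate(grid_z):
            # one-way distances from every element to each focal point
            d = np.hypot(elem_x[None, :] - grid_x[:, None], z)
            for ix in range(grid_x.size):
                tof = (d[ix][:, None] + d[ix][None, :]) / c   # seconds
                idx = np.clip(np.rint(tof * fs).astype(int), 0, n_s - 1)
                image[iz, ix] = abs(fmc[tx, rx, idx].sum())   # delay-and-sum
        return image

    # toy demo with random data standing in for a real FMC acquisition
    fmc = np.random.default_rng(0).normal(size=(16, 16, 2000))
    elem_x = np.linspace(-7.5e-3, 7.5e-3, 16)                  # metres
    img = tfm_image(fmc, elem_x, np.linspace(-10e-3, 10e-3, 40),
                    np.linspace(1e-3, 30e-3, 40), c=5900.0, fs=50e6)
    print(img.shape)
    ```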

  1. Common pitfalls in statistical analysis: The perils of multiple testing

    Science.gov (United States)

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2016-01-01

    Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times, either at multiple time-points, in multiple subgroups, or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478
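
    The amplification is easy to quantify: with m independent tests at level alpha, the probability of at least one false positive is 1 - (1 - alpha)^m. The short illustration below also shows the Bonferroni correction, which tests each hypothesis at alpha / m instead.

    ```python
    # Family-wise error rate (FWER) for m independent tests at alpha = 0.05,
    # without and with the Bonferroni correction.
    alpha = 0.05
    for m in (1, 5, 20, 100):
        fwer_raw = 1 - (1 - alpha) ** m
        fwer_bonf = 1 - (1 - alpha / m) ** m
        print(f"m={m:3d}: uncorrected FWER={fwer_raw:.3f}, "
              f"Bonferroni FWER={fwer_bonf:.3f}")
    # at m=100 the uncorrected FWER is ~0.994; Bonferroni keeps it near 0.05
    ```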

  2. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and the European SECAM and PAL systems, recorded using various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  3. A Dataset for Education-Related Majors' Performance Measures with Pre/Post-Video Game Practice

    Science.gov (United States)

    Novak, Elena; Tassell, Janet Lynne

    2015-01-01

    This dataset includes a series of 30 education-related majors' performance measures before and after they completed a 10-hour video game practice in a computer lab. The goal of the experimental study was to examine the effects of action video gaming on students' mathematics performance and mathematics anxiety as mediated by the effect of attention…

  4. Feasibility of Optimizing Recovery and Reserves from a Mature and Geological Complex Multiple Turbidite Offshore Calif. Reservoir through the Drilling and Completion of a Trilateral Horizontal Well, Class III

    Energy Technology Data Exchange (ETDEWEB)

    Pacific Operators Offshore, Inc.

    2001-04-04

    The intent of this project was to increase production and extend the economic life of this mature field through the application of advanced reservoir characterization and drilling technology, demonstrating the efficacy of these technologies to other small operators of aging fields. Two study periods were proposed; the first to include data assimilation and reservoir characterization and the second to drill the demonstration well. The initial study period showed that a single tri-lateral well would not be economically efficient in redevelopment of Carpinteria's multiple deep water turbidite sand reservoirs, and the study was amended to include the drilling of a series of horizontal redrills from existing surplus well bores on Pacific Operators' Platform Hogan.

  5. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  6. Latino College Completion: Hawaii

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  7. Latino College Completion: Pennsylvania

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  8. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure.
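
    As a sketch of what such a data-driven approach could look like (this is not the paper's actual pipeline), one can train a simple classifier on per-dataset access metadata to predict future popularity; the features, labels and data below are synthetic.

    ```python
    # Toy popularity classifier on synthetic per-dataset computing metadata.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    n = 2000
    X = np.column_stack([
        rng.poisson(20, n),      # accesses last week
        rng.poisson(5, n),       # distinct users last week
        rng.uniform(0, 24, n),   # months since dataset creation
    ])
    # synthetic rule: recent activity drives future popularity
    y = (X[:, 0] + 3 * X[:, 1] + rng.normal(0, 5, n) > 40).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"holdout accuracy: {clf.score(X_te, y_te):.2f}")
    ```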

  9. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure. (paper)

  10. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  11. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available [Figure residue: the extracted pages showed plots of RCS [dBm2] versus time and absolute range at f1 = 9.000 GHz for datasets CAD14-001 and CAD14-002.]

  12. Completeness, supervenience and ontology

    International Nuclear Information System (INIS)

    Maudlin, Tim W E

    2007-01-01

    In 1935, Einstein, Podolsky and Rosen raised the issue of the completeness of the quantum description of a physical system. What they had in mind is whether or not the quantum description is informationally complete, in that all physical features of a system can be recovered from it. In a collapse theory such as the theory of Ghirardi, Rimini and Weber, the quantum wavefunction is informationally complete, and this has often been taken to suggest that according to that theory the wavefunction is all there is. If we distinguish the ontological completeness of a description from its informational completeness, we can see that the best interpretations of the GRW theory must postulate more physical ontology than just the wavefunction

  13. Completeness, supervenience and ontology

    Energy Technology Data Exchange (ETDEWEB)

    Maudlin, Tim W E [Department of Philosophy, Rutgers University, 26 Nichol Avenue, New Brunswick, NJ 08901-1411 (United States)

    2007-03-23

    In 1935, Einstein, Podolsky and Rosen raised the issue of the completeness of the quantum description of a physical system. What they had in mind is whether or not the quantum description is informationally complete, in that all physical features of a system can be recovered from it. In a collapse theory such as the theory of Ghirardi, Rimini and Weber, the quantum wavefunction is informationally complete, and this has often been taken to suggest that according to that theory the wavefunction is all there is. If we distinguish the ontological completeness of a description from its informational completeness, we can see that the best interpretations of the GRW theory must postulate more physical ontology than just the wavefunction.

  14. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Integration National Dataset Toolkit Wind Integration National Dataset Toolkit The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies. WIND

  15. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Solar Integration National Dataset Toolkit Solar Integration National Dataset Toolkit NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S . regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  16. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE) assay, and 2 year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  17. Nephele: genotyping via complete composition vectors and MapReduce

    Directory of Open Access Journals (Sweden)

    Mardis Scott

    2011-08-01

    Full Text Available Abstract Background Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism, mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for the rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because their computational complexity increases quickly with the number of sequences. Results Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on
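
    A stripped-down illustration of the alignment-free representation follows: plain k-mer frequency vectors compared with a Euclidean distance. The actual method uses complete composition vectors (k-mer statistics with background subtraction) and affinity propagation clustering; the sequences here are toys.

    ```python
    # k-mer composition vectors and a pairwise distance, bypassing alignment.
    from collections import Counter
    from itertools import product
    import math

    K = 4
    KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

    def composition_vector(seq):
        counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
        total = max(sum(counts[k] for k in KMERS), 1)
        return [counts[k] / total for k in KMERS]

    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    seqs = {"sampleA": "ACGTACGTGGCCA" * 20,   # toy sequences, not genomes
            "sampleB": "ACGTACGTGGCCT" * 20,
            "sampleC": "TTGGAACCTTGGA" * 20}
    vecs = {name: composition_vector(s) for name, s in seqs.items()}
    print(f"A-B distance: {euclidean(vecs['sampleA'], vecs['sampleB']):.4f}")
    print(f"A-C distance: {euclidean(vecs['sampleA'], vecs['sampleC']):.4f}")
    ```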

  18. Chemical elements in the environment: multi-element geochemical datasets from continental to national scale surveys on four continents

    Science.gov (United States)

    Caritat, Patrice de; Reimann, Clemens; Smith, David; Wang, Xueqiu

    2017-01-01

    During the last 10-20 years, Geological Surveys around the world have undertaken a major effort towards delivering fully harmonized and tightly quality-controlled low-density multi-element soil geochemical maps and datasets of vast regions including up to whole continents. Concentrations of between 45 and 60 elements commonly have been determined in a variety of different regolith types (e.g., sediment, soil). The multi-element datasets are published as complete geochemical atlases and made available to the general public. Several other geochemical datasets covering smaller areas but generally at a higher spatial density are also available. These datasets may, however, not be found by superficial internet-based searches because the elements are not mentioned individually either in the title or in the keyword lists of the original references. This publication attempts to increase the visibility and discoverability of these fundamental background datasets covering large areas up to whole continents.

  19. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Science.gov (United States)

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

    Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention for the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study demonstrates the importance of the selection of the ADLs and the need to categorize the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.
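
    As an example of the kind of detection algorithm these repositories are meant to benchmark, a minimal signal-magnitude-vector threshold detector is sketched below; the 2.5 g threshold and the synthetic trace are illustrative assumptions, not values taken from the reviewed datasets.

    ```python
    # Threshold detector on the signal magnitude vector (SMV) of a
    # triaxial accelerometer trace; returns candidate event times.
    import numpy as np

    def detect_falls(acc_xyz, fs, threshold_g=2.5):
        """acc_xyz: (n_samples, 3) acceleration in g; fs in Hz."""
        smv = np.linalg.norm(acc_xyz, axis=1)
        above = smv > threshold_g
        onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1  # rising edges
        return onsets / fs

    fs = 50  # Hz
    t = np.arange(0, 10, 1 / fs)
    acc = np.tile([0.0, 0.0, 1.0], (t.size, 1))   # standing still: ~1 g
    acc[250:255] = [1.8, 2.2, 2.4]                # brief impact-like burst
    print("candidate fall events at t =", detect_falls(acc, fs))
    ```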

  20. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention for the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study demonstrates the importance of the selection of the ADLs and the need to categorize the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  1. Advanced Neuropsychological Diagnostics Infrastructure (ANDI): A Normative Database Created from Control Datasets.

    Science.gov (United States)

    de Vent, Nathalie R; Agelink van Rentergem, Joost A; Schmand, Ben A; Murre, Jaap M J; Huizenga, Hilde M

    2016-01-01

    In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI), datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity and range of these data surpasses that of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.
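
    A minimal sketch of the normative comparison such a database enables: regress control scores on demographic variables, then express a new individual's score as a z-score of the residual. The data, predictors and cutoff below are synthetic stand-ins; ANDI's actual models and transformations are more elaborate.

    ```python
    # Demographically corrected normative z-score from control data.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 500
    age = rng.uniform(20, 85, n)
    educ = rng.integers(8, 21, n).astype(float)
    score = 60 - 0.25 * age + 0.8 * educ + rng.normal(0, 5, n)  # controls

    X = np.column_stack([np.ones(n), age, educ])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid_sd = np.std(score - X @ beta, ddof=X.shape[1])

    def normative_z(patient_score, patient_age, patient_educ):
        expected = beta @ np.array([1.0, patient_age, patient_educ])
        return (patient_score - expected) / resid_sd

    print(f"z = {normative_z(38.0, 72, 12):+.2f}")  # e.g. z < -1.65 flags
    ```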

  2. Advanced Neuropsychological Diagnostics Infrastructure (ANDI: A Normative Database Created from Control Datasets.

    Directory of Open Access Journals (Sweden)

    Nathalie R. de Vent

    2016-10-01

    Full Text Available In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI), datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity and range of these data surpasses that of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.

  3. A first dataset toward a standardized community-driven global mapping of the human immunopeptidome

    Directory of Open Access Journals (Sweden)

    Pouya Faridi

    2016-06-01

    Full Text Available We present the first standardized HLA peptidomics dataset generated by the immunopeptidomics community. The dataset is composed of native HLA class I peptides as well as synthetic HLA class II peptides that were acquired in data-dependent acquisition mode using multiple types of mass spectrometers. All laboratories used the spiked-in landmark iRT peptides for retention time normalization and data analysis. The mass spectrometric data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier http://www.ebi.ac.uk/pride/archive/projects/PXD001872. The generated data were used to build HLA allele-specific peptide spectral and assay libraries, which were stored in the SWATHAtlas database. Data presented here are described in more detail in the original eLife article entitled ‘An open-source computational and data resource to analyze digital maps of immunopeptidomes’.

  4. Complete Ureteral Avulsion

    Directory of Open Access Journals (Sweden)

    V. Gupta

    2005-01-01

    Full Text Available Complete avulsion of the ureter is one of the most serious complications of ureteroscopy. It requires open or laparoscopic intervention for repair. This case report emphasizes its management and presents recommendations for prevention in current urological practice.

  5. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    Science.gov (United States)

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  6. DBcloud: Semantic Dataset for the cloud

    NARCIS (Netherlands)

    Morsey, M.; Willner, A.; Loughnane, R.; Giatili, M.; Papagianni, C.; Baldin, I.; Grosso, P.; Al-Hazmi, Y.

    2016-01-01

    In cloud environments, the process of matching requests from users with the available computing resources is a challenging task. This is even more complex in federated environments, where multiple providers cooperate to offer enhanced services, suitable for distributed applications. In order to

  7. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national-scale flood risk analyses using high-resolution Facebook Connectivity Lab population data and data from a hyper-resolution flood hazard model. In recent years the field of large-scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data-poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that, for robust flood risk analysis to be undertaken, both hazard and exposure data should sufficiently resolve local-scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, representing a resolution increase over previous countrywide datasets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
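
    The integration step described above reduces, in its simplest form, to overlaying co-registered hazard and population grids and counting people in cells wetted above a depth threshold; the toy numpy version below uses synthetic rasters standing in for the model output and the population data, and assumes both are already on a common grid.

    ```python
    # Exposed population from co-registered depth and population rasters.
    import numpy as np

    rng = np.random.default_rng(3)
    depth_m = np.clip(rng.normal(0.0, 0.4, (100, 100)), 0, None)  # synthetic
    pop = rng.poisson(2.0, (100, 100)).astype(float)              # synthetic

    for threshold in (0.1, 0.5, 1.0):
        exposed = pop[depth_m > threshold].sum()
        print(f"population exposed above {threshold:.1f} m: {exposed:,.0f}")
    ```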

  8. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Science.gov (United States)

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model with information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  9. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system.

    Science.gov (United States)

    Jensen, Tue V; Pinson, Pierre

    2017-11-28

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model with information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  10. An innovative privacy preserving technique for incremental datasets on cloud computing.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

    2016-08-01

    Cloud computing (CC) is a magnificent service-based delivery with gigantic computer processing power and data storage across connected communications channels. It has imparted overwhelming technological impetus to the internet (web) mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user-friendly CC services enable users to deploy sundry applications economically. Meanwhile, simple data sharing has impelled various phishing attacks and malware-assisted security threats. Some privacy-sensitive applications, like health services on the cloud, that are built with several economic and operational benefits necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing attacks became mandatory to protect overall data privacy. Typically, datasets of diverse applications are anonymized to give owners better privacy, but without providing all secrecy requirements for the newly added records. Some proposed techniques have addressed this issue by re-anonymizing the datasets from scratch. The utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge dataset volumes across multiple storage nodes limits privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation.

  11. Statistical segmentation of multidimensional brain datasets

    Science.gov (United States)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes some of the problems involved in multidimensional clustering techniques, such as partial volume effects (PVE), processing speed and the difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of Gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-Gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold-standard. Our results were more robust and closer to the gold-standard.
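
    The second stage above is essentially EM fitting of a Gaussian mixture with full covariance matrices. The following sketch illustrates that idea on synthetic two-channel (T1/T2-like) intensities with three tissue classes; the numbers are invented and this is not the authors' implementation.

    ```python
    # Minimal sketch of EM-based Gaussian mixture classification of voxels into
    # CSF, grey matter and white matter, with full (not diagonal) covariances.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Fake co-registered T1/T2 intensity pairs for three tissue classes.
    csf   = rng.multivariate_normal([30, 90], [[25, 5], [5, 25]], size=500)
    grey  = rng.multivariate_normal([60, 60], [[20, 8], [8, 20]], size=500)
    white = rng.multivariate_normal([90, 40], [[15, 4], [4, 15]], size=500)
    voxels = np.vstack([csf, grey, white])

    # Full covariance matrices, as the paper advocates.
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(voxels)

    for k in range(3):
        print(f"class {k}: {np.sum(labels == k)} voxels, mean = {gmm.means_[k].round(1)}")
    ```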

  12. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However, this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and resampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
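
    For illustration, here is a hedged sketch of two of the compared procedures, the Mann-Whitney U test and a simple permutation (resampling) test, on a small two-sample dataset of the size typical in war-gaming; the sample sizes and effect size are assumptions.

    ```python
    # Hedged illustration: Mann-Whitney U test and a permutation test of the
    # difference in means on small samples. All data are synthetic.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(1)
    baseline = rng.normal(loc=10.0, scale=2.0, size=8)   # scores, current system
    enhanced = rng.normal(loc=12.0, scale=2.0, size=8)   # scores, proposed system

    u_stat, p_mw = mannwhitneyu(baseline, enhanced, alternative="two-sided")

    # Permutation test: reshuffle the pooled observations many times and see how
    # often a mean difference at least as extreme arises by chance.
    pooled = np.concatenate([baseline, enhanced])
    observed = abs(enhanced.mean() - baseline.mean())
    n_perm, count = 10_000, 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[8:].mean() - pooled[:8].mean())  # 8 = per-group size
        count += diff >= observed
    p_perm = count / n_perm

    print(f"Mann-Whitney U p-value: {p_mw:.4f}")
    print(f"Permutation test p-value: {p_perm:.4f}")
    ```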

  13. [Research on developing the spectral dataset for Dunhuang typical colors based on color constancy].

    Science.gov (United States)

    Liu, Qiang; Wan, Xiao-Xia; Liu, Zhen; Li, Chan; Liang, Jin-Xing

    2013-11-01

    The present paper aims at developing a method to reasonably set up a typical spectral color dataset for different kinds of Chinese cultural heritage in the color rendering process. The world-famous wall paintings dating from more than 1700 years ago in the Dunhuang Mogao Grottoes were taken as the typical case in this research. In order to maintain color constancy during the color rendering workflow of Dunhuang cultural relics, a chromatic adaptation based method for developing the spectral dataset of typical colors for those wall paintings was proposed from the viewpoint of human vision perception ability. With the help and guidance of researchers in the art-research and protection-research institutions of the Dunhuang Academy, and according to the existing research achievements of Dunhuang Research in past years, 48 typical known Dunhuang pigments were chosen and 240 representative color samples were made, whose reflective spectra ranging from 360 to 750 nm were acquired by a spectrometer. In order to find the typical colors among the above-mentioned color samples, the original dataset was divided into several subgroups by clustering analysis. The grouping number, together with the most typical samples for each subgroup which made up the firstly built typical color dataset, was determined by the Wilcoxon signed-rank test according to the color inconstancy index comprehensively calculated under 6 typical illuminating conditions. Considering the completeness of the gamut of the Dunhuang wall paintings, 8 complementary colors were determined and finally the typical spectral color dataset was built up, which contains 100 representative spectral colors. The analytical calculation results show that the median color inconstancy index of the built dataset at the 99% confidence level by the Wilcoxon signed-rank test was 3.28 and the 100 colors are distributed uniformly in the whole gamut, which ensures that this dataset can provide reasonable reference for choosing the color with highest

  14. Automatic registration method for multisensor datasets adopted for dimensional measurements on cutting tools

    International Nuclear Information System (INIS)

    Shaw, L; Mehari, F; Weckenmann, A; Ettl, S; Häusler, G

    2013-01-01

    Multisensor systems with optical 3D sensors are frequently employed to capture complete surface information by measuring workpieces from different views. During coarse and fine registration the resulting datasets are afterward transformed into one common coordinate system. Automatic fine registration methods are well established in dimensional metrology, whereas there is a deficit in automatic coarse registration methods. The advantage of a fully automatic registration procedure is twofold: it enables a fast and contact-free alignment and further a flexible application to datasets of any kind of optical 3D sensor. In this paper, an algorithm adapted for a robust automatic coarse registration is presented. The method was originally developed for the field of object reconstruction or localization. It is based on a segmentation of planes in the datasets to calculate the transformation parameters. The rotation is defined by the normals of three corresponding segmented planes of two overlapping datasets, while the translation is calculated via the intersection point of the segmented planes. First results have shown that the translation is strongly shape dependent: 3D data of objects with non-orthogonal planar flanks cannot be registered with the current method. In the novel supplement for the algorithm, the translation is additionally calculated via the distance between centroids of corresponding segmented planes, which results in more than one option for the transformation. A newly introduced measure considering the distance between the datasets after coarse registration evaluates the best possible transformation. Results of the robust automatic registration method are presented on the example of datasets taken from a cutting tool with a fringe-projection system and a focus-variation system. The successful application in dimensional metrology is proven with evaluations of shape parameters based on the registered datasets of a calibrated workpiece. (paper)
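
    The rotation step described above can be sketched compactly: given the normals of three corresponding segmented planes in two datasets, a best-fit rotation follows from an SVD (Kabsch-style) solution. The snippet below is an illustrative reconstruction under that assumption, with synthetic normals and a known test rotation; it is not the authors' code.

    ```python
    # Hedged sketch: recover the rotation aligning two datasets from the normals
    # of three corresponding segmented planes via the Kabsch/SVD method.
    import numpy as np

    def rotation_from_normals(n_src: np.ndarray, n_dst: np.ndarray) -> np.ndarray:
        """Best-fit rotation mapping source normals onto destination normals."""
        h = n_src.T @ n_dst
        u, _, vt = np.linalg.svd(h)
        d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
        return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Three plane normals in the source dataset (rows, roughly unit vectors).
    normals_src = np.array([[1.0, 0, 0], [0, 1.0, 0], [0.577, 0.577, 0.577]])

    # Ground-truth rotation: 30 degrees about z (for this synthetic test only).
    a = np.deg2rad(30)
    r_true = np.array([[np.cos(a), -np.sin(a), 0],
                       [np.sin(a),  np.cos(a), 0],
                       [0,          0,         1.0]])
    normals_dst = normals_src @ r_true.T

    r_est = rotation_from_normals(normals_src, normals_dst)
    print("rotation recovered:", np.allclose(r_est, r_true, atol=1e-8))
    ```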

  15. Completeness of Lyapunov Abstraction

    DEFF Research Database (Denmark)

    Wisniewski, Rafal; Sloth, Christoffer

    2013-01-01

    This paper addresses the generation of complete abstractions of polynomial dynamical systems by timed automata. For the proposed abstraction, the state space is divided into cells by sublevel sets of functions. We identify a relation between these functions and their directional derivatives along the vector field, which allows the generation of a complete abstraction. To compute the functions that define the subdivision of the state space in an algorithm, we formulate a sum of squares optimization problem. This optimization problem finds the best subdivisioning functions, with respect to the ability...

  16. Construction completion report

    International Nuclear Information System (INIS)

    1990-01-01

    This Construction Completion Report documents the major construction projects at the Waste Isolation Pilot Plant (WIPP) site and related information on contracts, schedules, and other areas which affected construction. This report is not intended to be an exhaustive detailed analysis of construction, but is a general overview and summary of the WIPP construction. 10 refs., 29 figs

  17. Complete Rerouting Protection

    DEFF Research Database (Denmark)

    Stidsen, Thomas K.; Kjærulff, Peter

    2005-01-01

    In this paper we present a new protection method: Complete Rerouting. This is the most capacity efficient protection method for circuit switched networks and it is, to the best of our knowledge, the first time it has been described. We implement a column generation algorithm and test the performance...

  18. Complete French Teach Yourself

    CERN Document Server

    Graham, Gaelle

    2010-01-01

    The best-selling complete course for a fun and effective way to learn French. This ISBN is for the paperback book. The corresponding audio support (ISBN: 9781444100068) is also available. The book and audio support can also be purchased as a pack (ISBN: 9781444100051).

  19. Completeness of Lyapunov Abstraction

    Directory of Open Access Journals (Sweden)

    Rafael Wisniewski

    2013-08-01

    In this work, we continue our study on discrete abstractions of dynamical systems. To this end, we use a family of partitioning functions to generate an abstraction. The intersection of sub-level sets of the partitioning functions defines cells, which are regarded as discrete objects. The union of cells makes up the state space of the dynamical systems. Our construction gives rise to a combinatorial object - a timed automaton. We examine sound and complete abstractions. An abstraction is said to be sound when the flow of the timed automaton covers the flow lines of the dynamical system. If the dynamics of the dynamical system and the timed automaton are equivalent, the abstraction is complete. The commonly accepted paradigm for partitioning functions is that they ought to be transversal to the studied vector field. We show that there is no complete partitioning with transversal functions, even for particular dynamical systems whose critical sets are isolated critical points. Therefore, we allow the directional derivative along the vector field to be non-positive in this work. This considerably complicates the abstraction technique. For understanding dynamical systems, it is vital to study stable and unstable manifolds and their intersections. These objects appear naturally in this work. Indeed, we show that for an abstraction to be complete, the set of critical points of an abstraction function shall contain either the stable or unstable manifold of the dynamical system.

  20. Dual completion method

    Energy Technology Data Exchange (ETDEWEB)

    Mamedov, N Ya; Kadymova, K S; Dzhafarov, Sh T

    1963-10-28

    One type of dual completion method utilizes a single tubing string. Through the use of the proper tubing equipment, the fluid from the low-productive upper formation is lifted by utilizing the surplus energy of a submerged pump, which handles the production from the lower stratum.

  1. A complete woman

    Indian Academy of Sciences (India)

    Lawrence

    treated me like a son in the way he encouraged my education, while my mother ... cine gives me a lot of satisfaction when I see my patients getting cured. Teaching ... thing in life as a complete woman in different roles – daughter, wife, mother ...

  2. Matrix and Tensor Completion on a Human Activity Recognition Framework.

    Science.gov (United States)

    Savvaki, Sofia; Tsagkatakis, Grigorios; Panousopoulou, Athanasia; Tsakalides, Panagiotis

    2017-11-01

    Sensor-based activity recognition is encountered in innumerable applications of the arena of pervasive healthcare and plays a crucial role in biomedical research. Nonetheless, the frequent situation of unobserved measurements impairs the ability of machine learning algorithms to efficiently extract context from raw streams of data. In this paper, we study the problem of accurate estimation of missing multimodal inertial data and we propose a classification framework that considers the reconstruction of subsampled data during the test phase. We introduce the concept of forming the available data streams into low-rank two-dimensional (2-D) and 3-D Hankel structures, and we exploit data redundancies using sophisticated imputation techniques, namely matrix and tensor completion. Moreover, we examine the impact of reconstruction on the classification performance by experimenting with several state-of-the-art classifiers. The system is evaluated with respect to different data structuring scenarios, the volume of data available for reconstruction, and various levels of missing values per device. Finally, the tradeoff between subsampling accuracy and energy conservation in wearable platforms is examined. Our analysis relies on two public datasets containing inertial data, which extend to numerous activities, multiple sensing parameters, and body locations. The results highlight that robust classification accuracy can be achieved through recovery, even for extremely subsampled data streams.
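
    The reconstruction idea can be illustrated with a minimal SoftImpute-style loop: arrange a one-dimensional stream into a low-rank Hankel matrix, hide some entries, and recover them by iterative singular-value soft-thresholding. The window size, threshold and missing rate below are assumptions, not the paper's settings.

    ```python
    # Hedged sketch of matrix completion on a Hankel-structured data stream.
    import numpy as np

    rng = np.random.default_rng(7)
    t = np.linspace(0, 4 * np.pi, 200)
    signal = np.sin(t) + 0.5 * np.sin(3 * t)         # smooth, low-rank-friendly stream

    # Build a Hankel matrix: row i is the window signal[i : i + window].
    window = 40
    hankel = np.array([signal[i:i + window] for i in range(len(signal) - window)])

    mask = rng.random(hankel.shape) > 0.4            # keep ~60% of entries
    x = np.where(mask, hankel, 0.0)

    for _ in range(100):                             # SoftImpute-style iterations
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        s = np.maximum(s - 0.5, 0.0)                 # soft-threshold singular values
        x_low = (u * s) @ vt
        x = np.where(mask, hankel, x_low)            # keep observed entries fixed

    rmse = np.sqrt(np.mean((x_low[~mask] - hankel[~mask]) ** 2))
    print(f"RMSE on the missing entries: {rmse:.4f}")
    ```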

  3. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  4. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  5. SPATIO-TEMPORAL DATA MODEL FOR INTEGRATING EVOLVING NATION-LEVEL DATASETS

    Directory of Open Access Journals (Sweden)

    A. Sorokine

    2017-10-01

    Ability to easily combine the data from diverse sources in a single analytical workflow is one of the greatest promises of the Big Data technologies. However, such integration is often challenging as datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities including data representations, formats, and semantics. Semantic differences are hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by the difference in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving identity of geographic entities over time by defining criteria for the entity existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using a PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.

  6. Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets

    Science.gov (United States)

    Sorokine, A.; Stewart, R. N.

    2017-10-01

    Ability to easily combine the data from diverse sources in a single analytical workflow is one of the greatest promises of the Big Data technologies. However, such integration is often challenging as datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities including data representations, formats, and semantics. Semantic differences are hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by the difference in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving identity of geographic entities over time by defining criteria for the entity existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using a PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.
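
    A minimal sketch of the event-based idea, assuming a simple vocabulary of events (created, renamed, split, dissolved): a unit's existence at a given date is resolved from its event history. The classes and the example history are illustrative, not the schema used in the production database.

    ```python
    # Hedged sketch of an event-based model of evolving geographic units.
    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class Event:
        when: date
        kind: str           # e.g. "created", "renamed", "split", "dissolved"
        detail: str = ""

    @dataclass
    class GeoUnit:
        name: str
        events: list[Event] = field(default_factory=list)

        def exists_on(self, day: date) -> bool:
            """A unit exists between its 'created' and 'dissolved' events."""
            created = min((e.when for e in self.events if e.kind == "created"), default=None)
            dissolved = min((e.when for e in self.events if e.kind == "dissolved"), default=None)
            if created is None or day < created:
                return False
            return dissolved is None or day < dissolved

    # Example history: Czechoslovakia dissolves into two successor states.
    czechoslovakia = GeoUnit("Czechoslovakia", [
        Event(date(1918, 10, 28), "created"),
        Event(date(1993, 1, 1), "dissolved", "split into Czech Republic and Slovakia"),
    ])

    print(czechoslovakia.exists_on(date(1980, 6, 1)))   # True
    print(czechoslovakia.exists_on(date(2000, 6, 1)))   # False
    ```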

  7. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new datasets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data is being released and utilized by the public. Datasets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  8. The complete cosmicomics

    CERN Document Server

    Calvino, Italo

    2014-01-01

    The definitive edition of Calvino’s cosmicomics, bringing together all of these enchanting stories—including some never before translated—in one volume for the first time. In Italo Calvino’s cosmicomics, primordial beings cavort on the nearby surface of the moon, play marbles with atoms, and bear ecstatic witness to Earth’s first dawn. Exploring natural phenomena and the origins of the universe, these beloved tales relate complex scientific concepts to our common sensory, emotional, human world. Now, The Complete Cosmicomics brings together all of the cosmicomic stories for the first time. Containing works previously published in Cosmicomics, t zero, and Numbers in the Dark, this single volume also includes seven previously uncollected stories, four of which have never been published in translation in the United States. This “complete and definitive collection” (Evening Standard) reconfirms the cosmicomics as a crowning literary achievement and makes them available to new generations of reader...

  9. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  10. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.

  11. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  12. CMS Is Finally Completed

    CERN Multimedia

    2008-01-01

    Yet another step in the completion of the Large Hadron Collider was taken yesterday morning, as the final element of the Compact Muon Solenoid was lowered nearly 100 meters below ground. After more than eight years of work at the world's most powerful particle accelerator, scientists hope that they will be able to start initial experiments with the LHC by the end of this year.

  13. LEAR construction completed

    CERN Multimedia

    CERN PhotoLab

    1982-01-01

    In July 1982, LEAR construction was completed and the individual systems had been dry-tested. On 16 July, the first 50 MeV (309 MeV/c) protons from Linac 1 were injected and circulated. On 11 October, the first antiprotons from the AA, decelerated in the PS to 609 MeV/c, were injected. Also in 1982, acceleration, deceleration and stochastic cooling were successfully tested. See 9007366 for a more detailed description. See also 8201061, 8204131, 8309026.

  14. SHIVA laser: nearing completion

    International Nuclear Information System (INIS)

    Glaze, J.A.; Godwin, R.O.

    1977-01-01

    Construction of the Shiva laser system is nearing completion. This laser will be operating in fall 1977 and will produce over 20 terawatts of focusable power in a subnanosecond pulse. Fusion experiments will begin early in 1978. It is anticipated that a thermonuclear energy release equal to one percent of the incident light energy will be achieved with sub-millimeter deuterium-tritium targets. Densities in excess of a thousand times that of liquid are also expected from other experiments

  15. Neutron multiplication measurement instrument

    International Nuclear Information System (INIS)

    Nixon, K.V.; Dowdy, E.J.; France, S.W.; Millegan, D.R.; Robba, A.A.

    1983-01-01

    The Advanced Nuclear Technology Group of the Los Alamos National Laboratory is now using intelligent data-acquisition and analysis instrumentation for determining the multiplication of nuclear material. Earlier instrumentation, such as the large NIM-crate systems, depended on house power and required additional computation to determine multiplication or to estimate error. The portable, battery-powered multiplication measurement unit, with advanced computational power, acquires data, calculates multiplication, and completes error analysis automatically. Thus, the multiplication is determined easily and an available error estimate enables the user to judge the significance of results

  16. SCT Barrel Assembly Complete

    CERN Multimedia

    L. Batchelor

    As reported in the April 2005 issue of the ATLAS eNews, the first of the four Semiconductor Tracker (SCT) barrels, complete with modules and services, arrived safely at CERN in January of 2005. In the months since January, the other three completed barrels arrived as well, and integration of the four barrels into the entire barrel assembly commenced at CERN, in the SR1 building on the ATLAS experimental site, in July. Assembly was completed on schedule in September, with the addition of the innermost layer to the 4-barrel assembly. Work is now underway to seal the barrel thermal enclosure. This is necessary in order to enclose the silicon tracker in a nitrogen atmosphere and provide it with Faraday-cage protection, and is a delicate and complicated task: 352 silicon module powertapes, 352 readout-fibre bundles, and over 400 Detector Control System sensors must be carefully sealed into the thermal enclosure bulkhead. The team is currently verifying the integrity of the low mass cooling system, which must be d...

  17. The LRO Diviner Foundation Dataset: A Comprehensive Temperature Record of the Moon

    Science.gov (United States)

    Sefton-Nash, E.; Aye, K. M.; Williams, J. P.; Greenhagen, B. T.; Sullivan, M.; Paige, D. A.

    2014-12-01

    The Diviner Lunar Radiometer Experiment aboard NASA's Lunar Reconnaissance Orbiter (LRO) has been systematically mapping the thermal state of the Moon at a mean rate of >1400 observations/second since July 2009. Diviner measures solar reflectance and infrared radiance in 9 spectral channels with bandpasses from 0.3 - 400 μm. With more than 5 years of continuous data, complete spatial coverage of the lunar surface is achieved multiple times and coverage of local solar time enables the diurnal curve to be well-resolved for a given subsolar point. The Diviner Foundation Dataset (FDS) represents a coordinated effort to recalibrate raw data to improve quality, and produce a definitive and comprehensive set of products for use by the lunar science community. We present the contents and organization of the FDS, background on the enhanced processing pipeline, show how it is retrieved from NASA's Planetary Data System, and demonstrate its use with common mapping & analysis tools. The FDS comprises level 1 Reduced Data Records (RDRs) and level 2/3 Gridded Data Records (GDRs). We produce new RDRs using improved calibration algorithms that remove instrument artifacts and improve accuracy of measured radiance, particularly for polar data in permanently shadowed regions. GDRs are built using a per-orbit gridding scheme, and data are sourced from a database constructed by modeling the effective field-of-view for each observation. Notable gridded products available for lunar science include: 1) Globally mapped brightness temperatures for all channels in tiled cylindrical and polar stereographic map projections, 2) Global hourly temperature snapshots - maps of bolometric temperature binned into 1 hour intervals of local time, 3) Topographic products (elevation, slope and azimuth) for each map tile, that represent the terrain model used to process the data, and 4) Accompanying gridded maps of auxiliary quantities such as emission angle, local solar time, error etc…, for filtering

  18. A dataset from bottom trawl survey around Taiwan

    Directory of Open Access Journals (Sweden)

    Kwang-tsao Shao

    2012-05-01

    Bottom trawl fishery is one of the most important coastal fisheries in Taiwan in both production and economic value. However, its annual production started to decline due to overfishing in the 1980s. Its bycatch problem also damages the fishery resource seriously. Thus, the government banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000-2003 in the waters around Taiwan and Penghu (the Pescadores Islands), one region each year. All fish specimens collected from trawling were brought back to the lab for identification, individual counts and body weight measurement. These raw data have been integrated and established in the Taiwan Fish Database (http://fishdb.sinica.edu.tw). They have also been published through TaiBIF (http://taibif.tw), FishBase and GBIF (websites see below). This dataset contains 631 fish species and 3,529 records, making it the most complete set of demersal fish fauna data, with their temporal and spatial distribution on soft marine habitats in Taiwan.

  19. Genomics dataset on unclassified published organism (patent US 7547531

    Directory of Open Access Journals (Sweden)

    Mohammad Mahfuz Ali Khan Shawan

    2016-12-01

    Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism, and is therefore crucial for learning about the hierarchical classification of that particular organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed with the DNA BarID tool. QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify a cleavage code indicating the 5', 3' and blunt ends, and an enzyme code indicating the methylation sites of the DNA sequences was also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
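
    The AT/GC computation mentioned above is straightforward to reproduce; a minimal sketch follows, using a made-up sequence rather than one of the patent sequences.

    ```python
    # Minimal sketch of the AT/GC content computation; the sequence is hypothetical.
    def gc_content(seq: str) -> float:
        """Percentage of G and C bases in a DNA sequence."""
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        return 100.0 * gc / len(seq)

    example = "ATGCGCGTATTACGCGGCCTAG"  # hypothetical sequence
    print(f"GC content: {gc_content(example):.2f}%")
    print(f"AT content: {100 - gc_content(example):.2f}%")
    ```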

  20. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for the application of artificial neural networks (NNs) to regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the methods of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied to the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved an accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with an 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for the application of regression NNs to medical problems characterised by limited dataset sizes.
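
    A hedged sketch of the multiple-runs idea: train the same small network from many random initialisations and aggregate the predictions, which damps the run-to-run fluctuations described above. The data, network size and number of runs are illustrative assumptions, not the authors' exact protocol.

    ```python
    # Hedged sketch: averaging predictions over multiple training runs of a small
    # regression NN on a small dataset. All data are synthetic.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(3)
    X = rng.uniform(-1, 1, size=(56, 4))                 # ~56 samples, as in the study
    y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.05, 56)

    X_test = rng.uniform(-1, 1, size=(300, 4))
    y_test = X_test[:, 0] ** 2 + 0.5 * X_test[:, 1]

    preds = []
    for seed in range(20):                               # multiple runs, different seeds
        net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=seed)
        preds.append(net.fit(X, y).predict(X_test))

    y_hat = np.mean(preds, axis=0)                       # aggregate across runs
    rmse = np.sqrt(np.mean((y_hat - y_test) ** 2))
    print(f"test RMSE of the multi-run average: {rmse:.3f}")
    ```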

  1. The mitochondrial genomes of Atlas Geckos (Quedenfeldtia): mitogenome assembly from transcriptomes and anchored hybrid enrichment datasets

    OpenAIRE

    Lyra, Mariana L.; Joger, Ulrich; Schulte, Ulrich; Slimani, Tahar; El Mouden, El Hassan; Bouazza, Abdellah; Künzel, Sven; Lemmon, Alan R.; Moriarty Lemmon, Emily; Vences, Miguel

    2017-01-01

    The nearly complete mitogenomes of the two species of North African Atlas geckos, Quedenfeldtia moerens and Q. trachyblepharus, were assembled from anchored hybrid enrichment data and RNAseq data. Congruent assemblies were obtained for four samples included in both datasets. We recovered the 13 protein-coding genes, 22 tRNA genes, and two rRNA genes for both species, as well as a partial control region. The order of genes agrees with that of other geckos.

  2. Forest restoration: a global dataset for biodiversity and vegetation structure.

    Science.gov (United States)

    Crouzeilles, Renato; Ferreira, Mariana S; Curran, Michael

    2016-08-01

    Restoration initiatives are becoming increasingly applied around the world. Billions of dollars have been spent on ecological restoration research and initiatives, but restoration outcomes differ widely among these initiatives in part due to variable socioeconomic and ecological contexts. Here, we present the most comprehensive dataset gathered to date on forest restoration. It encompasses 269 primary studies across 221 study landscapes in 53 countries and contains 4,645 quantitative comparisons between reference ecosystems (e.g., old-growth forest) and degraded or restored ecosystems for five taxonomic groups (mammals, birds, invertebrates, herpetofauna, and plants) and five measures of vegetation structure reflecting different ecological processes (cover, density, height, biomass, and litter). We selected studies that (1) were conducted in forest ecosystems; (2) had multiple replicate sampling sites to measure indicators of biodiversity and/or vegetation structure in reference and restored and/or degraded ecosystems; and (3) used less-disturbed forests as a reference to the ecosystem under study. We recorded (1) latitude and longitude; (2) study year; (3) country; (4) biogeographic realm; (5) past disturbance type; (6) current disturbance type; (7) forest conversion class; (8) restoration activity; (9) time that a system has been disturbed; (10) time elapsed since restoration started; (11) ecological metric used to assess biodiversity; and (12) quantitative value of the ecological metric of biodiversity and/or vegetation structure for reference and restored and/or degraded ecosystems. These were the most common data available in the selected studies. We also estimated forest cover and configuration in each study landscape using a recently developed 1 km consensus land cover dataset. We measured forest configuration as the (1) mean size of all forest patches; (2) size of the largest forest patch; and (3) edge:area ratio of forest patches. Global analyses of the

  3. Automatic plankton image classification combining multiple view features via multiple kernel learning.

    Science.gov (United States)

    Zheng, Haiyong; Wang, Ruchen; Yu, Zhibin; Wang, Nan; Gu, Zhaorui; Zheng, Bing

    2017-12-28

    Plankton, including phytoplankton and zooplankton, are the main source of food for organisms in the ocean and form the base of the marine food chain. As the fundamental components of marine ecosystems, plankton are very sensitive to environmental changes, and the study of plankton abundance and distribution is crucial in order to understand environmental changes and protect marine ecosystems. This study was carried out to develop an extensively applicable plankton classification system with high accuracy for the increasing number of various imaging devices. The literature shows that most plankton image classification systems have been limited to only one specific imaging device and a relatively narrow taxonomic scope. A truly practical system for automatic plankton classification does not yet exist, and this study partly fills this gap. Inspired by the analysis of the literature and the development of technology, we focused on the requirements of practical application and propose an automatic system for plankton image classification combining multiple view features via multiple kernel learning (MKL). For one thing, in order to describe the biomorphic characteristics of plankton more completely and comprehensively, we combined general features with robust features, especially by adding features like the Inner-Distance Shape Context for morphological representation. For another, we divided all the features into different types from multiple views and fed them to multiple classifiers instead of only one, by optimally combining different kernel matrices computed from different types of features via multiple kernel learning. Moreover, we also applied a feature selection method to choose the optimal feature subsets from redundant features for satisfying different datasets from different imaging devices. We implemented our proposed classification system on three different datasets across more than 20 categories from phytoplankton to zooplankton. The experimental results validated that our system
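
    The kernel-combination step can be sketched as follows: compute one kernel matrix per feature view and feed a weighted sum of them to an SVM as a precomputed kernel. True MKL learns the combination weights; the fixed weights, synthetic data and split into two views below are simplifying assumptions.

    ```python
    # Hedged sketch: combining per-view kernel matrices for SVM classification.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    view_a, view_b = X[:, :10], X[:, 10:]    # two feature "views" (e.g. shape vs texture)

    # One RBF kernel matrix per view, then a fixed convex combination.
    K = 0.6 * rbf_kernel(view_a, gamma=0.1) + 0.4 * rbf_kernel(view_b, gamma=0.1)

    clf = SVC(kernel="precomputed").fit(K, y)
    print(f"training accuracy with the combined kernel: {clf.score(K, y):.3f}")
    ```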

  4. Complete pancreas traumatic transection

    Directory of Open Access Journals (Sweden)

    H. Hodžić

    2005-02-01

    This report presents a case of a twenty-year-old male with complete transection of the pancreas in the middle of its corpus, caused by strong abdominal compression, with injuries of the spleen, the first jejunum curve, the mesocolon transversum and the left kidney, and the appearance of a retroperitoneal haematoma. Surgical treatment started 70 minutes after the injury. The treatment consisted of left pancreatectomy with previous splenectomy, haemostasis of ruptured mesocolon transversum blood vessels, left kidney exploration, suturing of the first jejunum curve lesion and double abdomen drainage. Post-traumatic pancreatitis, which appeared on the second postoperative day, and prolonged drain secretion were successfully resolved by conservative treatment.

  5. Complete rerouting protection

    DEFF Research Database (Denmark)

    Stidsen, Thomas K.; Kjærulff, Peter

    2006-01-01

    Protection of communication against network failures is becoming increasingly important and in this paper we present the most capacity efficient protection method possible, the complete rerouting protection method, when requiring that all communication should be restored in case of a single link network failure. We present a linear programming model of the protection method and a column generation algorithm. For 6 real world networks, the minimal restoration overbuild network capacity is between 13% and 78%. We further study the importance of the density of the network, derive analytical bounds...

  6. Completion of treatment planning

    International Nuclear Information System (INIS)

    Lief, Eugene

    2008-01-01

    The outline of the lecture included the following topics: entering prescription; plan printout; print and transfer DDR; segment BEV; export to R and V; physician approval; and second check. Each topic received considerable attention, analysis and discussion. The summary is as follows: treatment planning completion is a very responsible process which requires maximum attention; it should be independently checked by the planner, physicist, radiation oncologist and a therapist; it should not be done in a last-minute rush; proper communication between team members is essential; and a properly set procedure should prevent propagation of an error by one individual to the treatment: the error should be caught by somebody else. (P.A.)

  7. TestComplete cookbook

    CERN Document Server

    Alpaev, Gennadiy

    2013-01-01

    A practical cookbook, with a perfect package of simple, medium, and advanced recipes targeted at basic programmers as well as expert software testers, who will learn to create, manage, and run automated tests. It is packed with problem-solving recipes that are supported by simple examples.If you are a software tester or a programmer who is involved with testing automation using TestComplete, this book is ideal for you! You will be introduced to the very basics of using the tool, as well as polish any previously gained knowledge in using the tool. If you are already aware of programming basics,

  8. Multiple inflation

    International Nuclear Information System (INIS)

    Murphy, P.J.

    1987-01-01

    The Theory of Inflation, namely, that at some point the entropy content of the universe was greatly increased, has much promise. It may solve the puzzles of homogeneity and the creation of structure. However, no particle physics model has yet been found that can successfully drive inflation. The difficulty in satisfying the constraint that the isotropy of the microwave background places on the effective potential of prospective models is immense. In this work we have codified the requirements of such models in a most general form. We have carefully calculated the amounts of inflation the various problems of the Standard Model need for their solution. We have derived a completely model-independent upper bound on the inflationary Hubble parameter. We have developed a general notation with which to probe the possibilities of Multiple Inflation. We have shown that only in very unlikely circumstances will any evidence of an earlier inflation survive the de Sitter period of its successor. In particular, it is demonstrated that it is most unlikely that two bouts of inflation will yield high amplitudes of density perturbations on small scales and low amplitudes on large scales. We conclude that, while multiple inflation will be of great theoretical interest, it is unlikely to have any observational impact

  9. Complete atrioventricular canal.

    Science.gov (United States)

    Calabrò, Raffaele; Limongelli, Giuseppe

    2006-04-05

    Complete atrioventricular canal (CAVC), also referred to as complete atrioventricular septal defect, is characterised by an ostium primum atrial septal defect, a common atrioventricular valve and a variable deficiency of the ventricular septum inflow. CAVC is an uncommon congenital heart disease, accounting for about 3% of cardiac malformations. Atrioventricular canal occurs in two out of every 10,000 live births. Both sexes are equally affected and a striking association with Down syndrome was found. Depending on the morphology of the superior leaflet of the common atrioventricular valve, 3 types of CAVC have been delineated (type A, B and C, according to Rastelli's classification). CAVC results in a significant interatrial and interventricular systemic-to-pulmonary shunt, thus inducing right ventricular pressure and volume overload and pulmonary hypertension. It becomes symptomatic in infancy due to congestive heart failure and failure to thrive. Diagnosis of CAVC might be suspected from electrocardiographic and chest X-ray findings. Echocardiography confirms it and gives anatomical details. Over time, pulmonary hypertension becomes irreversible, thus precluding the surgical therapy. This is the reason why cardiac catheterisation is not mandatory in infants (less than 6 months) but is indicated in older patients if irreversible pulmonary hypertension is suspected. Medical treatment (digitalis, diuretics, vasodilators) plays a role only as a bridge toward surgery, usually performed between the 3rd and 6th month of life.

  10. Complete atrioventricular canal

    Directory of Open Access Journals (Sweden)

    Limongelli Giuseppe

    2006-04-01

    Abstract Complete atrioventricular canal (CAVC), also referred to as complete atrioventricular septal defect, is characterised by an ostium primum atrial septal defect, a common atrioventricular valve and a variable deficiency of the ventricular septum inflow. CAVC is an uncommon congenital heart disease, accounting for about 3% of cardiac malformations. Atrioventricular canal occurs in two out of every 10,000 live births. Both sexes are equally affected and a striking association with Down syndrome was found. Depending on the morphology of the superior leaflet of the common atrioventricular valve, 3 types of CAVC have been delineated (type A, B and C, according to Rastelli's classification). CAVC results in a significant interatrial and interventricular systemic-to-pulmonary shunt, thus inducing right ventricular pressure and volume overload and pulmonary hypertension. It becomes symptomatic in infancy due to congestive heart failure and failure to thrive. Diagnosis of CAVC might be suspected from electrocardiographic and chest X-ray findings. Echocardiography confirms it and gives anatomical details. Over time, pulmonary hypertension becomes irreversible, thus precluding the surgical therapy. This is the reason why cardiac catheterisation is not mandatory in infants (less than 6 months) but is indicated in older patients if irreversible pulmonary hypertension is suspected. Medical treatment (digitalis, diuretics, vasodilators) plays a role only as a bridge toward surgery, usually performed between the 3rd and 6th month of life.

  11. Barnett shale completions

    Energy Technology Data Exchange (ETDEWEB)

    Schein, G. [BJ Services, Dallas, TX (United States)

    2006-07-01

    Fractured shales yield oil and gas in various basins across the United States. A map indicating these fractured shale source-reservoir systems in the United States was presented along with the numerous similarities and differences that exist among these systems. Hydrocarbons in the organic rich black shale come from the bacterial decomposition of organic matter, primary thermogenic decomposition of organic matter or secondary thermogenic cracking of oil. The shale may be the reservoir or other horizons may be the primary or secondary reservoir. The reservoir has induced micro fractures or tectonic fractures. This paper described the well completions in the Barnett Shale in north Texas with reference to major players, reservoir properties, mineralogy, fluid sensitivity, previous treatments, design criteria and production examples. The Barnett Shale is an organic, black shale with thickness ranging from 100 to 1000 feet. The total organic carbon (TOC) averages 4.5 per cent. The unit has undergone high rate frac treatments. A review of the vertical wells in the Barnett Shale was presented along with the fracture treatment schedule and technology changes. A discussion of refracturing opportunities and proppant settling and transport revealed that additional proppant increases fluid recovery and enhances production. Compatible scale inhibitors and biocides can be beneficial. Horizontal completions in the Barnett Shale have shown better results than vertical wells, as demonstrated in a production comparison of 3 major horizontal wells in the basin. tabs., figs.

  12. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  13. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  14. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  15. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods, which require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only parameter required to be set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD, which makes it very efficient in applications. Moreover, we proposed a new way of using outlier detection for redundancy reduction in datasets, where users can specify a confidence level that evaluates how accurately the less redundant dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed "dense and sparse") and on a bioinformatics problem of redundancy reduction in a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level with which a reduced dataset represents the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.
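
    In the spirit of HMOD (not the authors' exact algorithm), the sketch below scores points by their mean distance to the k nearest neighbours, a distance-based quantity whose inverse behaves like a local density estimate, and flags the highest-scoring points; k and the flagging quantile are assumptions.

    ```python
    # Hedged sketch of a distance/density-flavoured outlier score via k-NN.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(5)
    inliers = rng.normal(0, 1, size=(500, 2))
    outliers = rng.uniform(-6, 6, size=(10, 2))
    data = np.vstack([inliers, outliers])

    k = 10                                           # the method's only parameter
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    dists, _ = nn.kneighbors(data)
    score = dists[:, 1:].mean(axis=1)                # skip self-distance in column 0

    threshold = np.quantile(score, 0.98)             # flag top 2% as outliers
    flagged = np.where(score > threshold)[0]
    print(f"flagged {len(flagged)} points; indices >= 500 are true outliers: {flagged}")
    ```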

  16. SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets.

    Science.gov (United States)

    Gosline, Sara J C; Spencer, Sarah J; Ursu, Oana; Fraenkel, Ernest

    2012-11-01

    The rapid development of high throughput biotechnologies has led to an onslaught of data describing genetic perturbations and changes in mRNA and protein levels in the cell. Because each assay provides a one-dimensional snapshot of active signaling pathways, it has become desirable to perform multiple assays (e.g. mRNA expression and phospho-proteomics) to measure a single condition. However, as experiments expand to accommodate various cellular conditions, proper analysis and interpretation of these data have become more challenging. Here we introduce a novel approach called SAMNet, for Simultaneous Analysis of Multiple Networks, that is able to interpret diverse assays over multiple perturbations. The algorithm uses a constrained optimization approach to integrate mRNA expression data with upstream genes, selecting edges in the protein-protein interaction network that best explain the changes across all perturbations. The result is a putative set of protein interactions that succinctly summarizes the results from all experiments, highlighting the network elements unique to each perturbation. We evaluated SAMNet in both yeast and human datasets. The yeast dataset measured the cellular response to seven different transition metals, and the human dataset measured cellular changes in four different lung cancer models of Epithelial-Mesenchymal Transition (EMT), a crucial process in tumor metastasis. SAMNet was able to identify canonical yeast metal-processing genes unique to each commodity in the yeast dataset, as well as human genes such as β-catenin and TCF7L2/TCF4 that are required for EMT signaling but escaped detection in the mRNA and phospho-proteomic data. Moreover, SAMNet also highlighted drugs likely to modulate EMT, identifying a series of less canonical genes known to be affected by the BCR-ABL inhibitor imatinib (Gleevec), suggesting a possible influence of this drug on EMT.

  17. Public Inquiry Data - Report on Incoming, Pending, and Completed Inquiries - FY 2015 Onward

    Data.gov (United States)

    Social Security Administration — This dataset provides data on the number of new incoming, pending, and completed inquiries by quarter. The data source is the Electronic Management of Assignments...

  18. Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

    Science.gov (United States)

    Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng

    2016-07-18

    To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established worldwide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy-preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the existing anonymization methods suits SRS datasets well, as these exhibit characteristics such as rare events, multiple individual records, and multi-valued sensitive attributes. We propose a new privacy model called MS(k, θ*)-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ*, for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy clustering strategy to group the records into clusters, conforming to an anonymization metric that aims to minimize the privacy risk while maintaining the data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ*: uniform, level-wise, and frequency-based. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. With all three
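
    The greedy clustering strategy is only described at a high level above; a toy sketch of grouping records into clusters of at least k members, with a pluggable distance standing in for the paper's privacy/utility metric, might look as follows (all names are illustrative).

    import numpy as np

    def greedy_k_clusters(records, k, dist):
        unassigned = set(range(len(records)))
        clusters = []
        while len(unassigned) >= k:
            seed = unassigned.pop()
            # Pull in the k-1 nearest unassigned records to the seed.
            rest = sorted(unassigned, key=lambda j: dist(records[seed], records[j]))
            members = [seed] + rest[:k - 1]
            unassigned -= set(members)
            clusters.append(members)
        for j in unassigned:  # leftovers join their nearest cluster
            best = min(range(len(clusters)),
                       key=lambda c: dist(records[j], records[clusters[c][0]]))
            clusters[best].append(j)
        return clusters

    data = np.random.rand(10, 3)
    euclid = lambda a, b: float(np.linalg.norm(a - b))
    print(greedy_k_clusters(data, k=3, dist=euclid))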

  19. 2001 - 2010 Danish design reference year. Reference climate dataset for technical dimensioning in building, construction and other sectors

    Energy Technology Data Exchange (ETDEWEB)

    Grunnet Wang, P.; Scharling, M.; Pagh Nielsen, K.; Kern-Hansen, C. [Danish Meteorological Institute (DMI), Copenhagen (Denmark); Wittchen, K.B. [Aalborg Univ., Danish Building Research Institute (SBi), Copenhagen (Denmark)

    2013-09-15

    This report presents the Danish Design Reference Year based on observed data from 2001-2010. In various sectors - i.e. building and construction, energy, etc. - climate and weather usually play a part in a given project. The Danish Design Reference Year dataset is a collection of data series for eleven specific parameters, each of which represents a typical year in Denmark. Uses of the dataset range from simulations to statistical analyses and graphical overviews. The Danish land areas have been sectionalised into five to six climatological zones depending on the parameter, each characterized by distinct diurnal and yearly variations. The dataset consists of observed data from one station located within and representing each zone. In addition to the complete Danish Design Reference Year dataset, a subset specifically selected for use in energy performance calculations for obtaining a building permit is included. (Author)

  20. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment, including software release management, tracking of reconstructed event sizes, and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example, the total number of events contained in a dataset, and possible reasons for missing events such as a lost file. Finally, we describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  1. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices covering World, Asia, and South East Asia for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities, as measured by Conditional Value at Risk (CVaR), as well as its changes over time. The datasets can also be used to investigate the association between commodity markets and share markets.
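
    As a minimal illustration of the tail-risk measure involved, CVaR at the 5% level is just the mean of the worst 5% of daily price movements; the toy prices below are invented.

    import numpy as np
    import pandas as pd

    def cvar(returns, alpha=0.05):
        var = np.quantile(returns, alpha)       # 5% Value-at-Risk cutoff
        return returns[returns <= var].mean()   # mean of the worst tail

    prices = pd.Series([100, 98, 101, 95, 97, 96, 99, 90, 92, 94], dtype=float)
    print(cvar(prices.pct_change().dropna()))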

  2. Merged SAGE II, Ozone_cci and OMPS ozone profile dataset and evaluation of ozone trends in the stratosphere

    Directory of Open Access Journals (Sweden)

    V. F. Sofieva

    2017-10-01

    In this paper, we present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS, GOMOS, SCIAMACHY and MIPAS on Envisat, OSIRIS on Odin, ACE-FTS on SCISAT, and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci) with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and intercompared; only those datasets which are in good agreement, and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other, are used for merging. The long-term SAGE–CCI–OMPS dataset is created by computation and merging of deseasonalized anomalies from individual instruments. The merged SAGE–CCI–OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90° S to 90° N and from 10 to 50 km in steps of 1 km covering the period from October 1984 to July 2016. This newly created dataset is used for evaluating ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The upper stratospheric trends are statistically significant at midlatitudes and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s and stratospheric cooling.
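
    A compressed sketch of the merging recipe described above: deseasonalize each instrument's series against its own monthly climatology, average the anomalies, and fit a trend. The series here are synthetic, and the real analysis uses full multiple linear regression rather than a single slope.

    import numpy as np
    import pandas as pd

    rng = pd.date_range("1984-10-01", "2016-07-01", freq="MS")
    instruments = {name: pd.Series(np.random.randn(len(rng)), index=rng)
                   for name in ["SAGE II", "OSIRIS", "OMPS"]}

    anomalies = []
    for name, series in instruments.items():
        climatology = series.groupby(series.index.month).transform("mean")
        anomalies.append(series - climatology)  # deseasonalized anomaly

    merged = pd.concat(anomalies, axis=1).mean(axis=1)  # merge by averaging
    t = np.arange(len(merged)) / 120.0                  # time in decades
    slope, _ = np.polyfit(t, merged.values, 1)
    print(f"trend: {slope:.3f} anomaly units per decade")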

  3. GOGOL: ACADEMIC AND COMPLETE

    Directory of Open Access Journals (Sweden)

    Yuri V. Mann

    2016-12-01

    The ever-increasing international interest in Gogol explains the necessity of publishing a new edition of his works. The present Complete Collection of Gogol’s Works and Letters is an academic edition prepared and published by the A. M. Gorky Institute of World Literature of the Russian Academy of Sciences. It draws on the rich experience of studying and publishing Gogol’s heritage in Russia, but at the same time questions and underscores Gogol’s relevance for the modern reader and his place in the world culture of our time. It intends to fill in the gaps left by the previous scholarly tradition, which failed to recognize some of Gogol’s texts as part of his heritage; such are, for example, dedicatory inscriptions in books and business notes. The present edition accounts not only for the completeness of texts but also for their place within the body of Gogol’s work, as part of his life-long creative process. By juxtaposing different editions, it attempts to trace the dynamics of Gogol’s creative thought, while at the same time underscoring the autonomy and relevance of each period in his career. For example, this collection publishes two different versions (editions) of the same work: while the most recent version has become canonical at the expense of the preceding one, the latter still preserves its meaning and historical relevance. The present edition has an advantage over its predecessors in that it has a real opportunity to close the known gaps, e.g. to publish the hitherto unpublished texts. However, the editors realize that new, hitherto unknown gaps may appear, and the present edition will, in its turn, become outdated. At that point, a new edition will become necessary.

  4. Towards a climatology of tropical cyclone morphometric structures using a newly standardized passive microwave satellite dataset

    Science.gov (United States)

    Cossuth, J.; Hart, R. E.

    2013-12-01

    storm's rainband and eyewall organization. Ultimately, this project develops a consistent climatology of TC structures using a new database of research-quality historical TC satellite microwave observations. Not only do such datasets support more accurate study of TC structural evolution, but they may also facilitate automated TC intensity estimates and provide methods to enhance current operational and research products, such as those at the NRL TC webpage (http://www.nrlmry.navy.mil/TC.html). The process of developing the dataset, and possible objective definitions of TC structures using passive microwave imagery, will be described, with preliminary results suggesting new methods to identify TC structures that may interrogate and expand upon physical and dynamical theories. Structural metrics such as threshold analysis of the outlines of the TC shape, as well as methods to diagnose the inner-core size, completion, and magnitude, will be introduced.

  5. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen

    2015-08-01

    Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems become online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, the audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.
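
    The lightweight start-up step can be pictured as hashing triples by subject so that subject-star joins are evaluated without communication; the sketch below is a simplification of that idea with invented triples, and omits AdHash's adaptive replication entirely.

    from hashlib import blake2b

    def worker_for(subject, n_workers):
        digest = blake2b(subject.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % n_workers

    triples = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:alice", "ex:worksAt", "ex:acme"),
        ("ex:bob", "ex:knows", "ex:carol"),
    ]
    partitions = {i: [] for i in range(4)}
    for s, p, o in triples:
        partitions[worker_for(s, 4)].append((s, p, o))

    # Both ex:alice triples land on one worker, so a star join on that
    # subject needs no inter-node communication.
    print(partitions)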

  6. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, by providing estimates for the missing values via a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may not be reliable, or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are combined in order to find the best pairing. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a definitively positive impact on the imputation result for categorical medical datasets.
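
    A rough sketch of the two-stage idea, with generic stand-ins for the paper's algorithms: IsolationForest replaces DROP3/GA/IB3 for instance selection, and scikit-learn's KNNImputer replaces KNNI.

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.impute import KNNImputer

    X = np.random.rand(100, 4)
    X[::7, 2] = np.nan                       # inject missing values

    complete = X[~np.isnan(X).any(axis=1)]   # fully observed rows
    keep = IsolationForest(random_state=0).fit_predict(complete) == 1
    reference = complete[keep]               # observed data minus outliers

    # Impute the gaps using only the cleaned reference instances.
    X_imputed = KNNImputer(n_neighbors=5).fit(reference).transform(X)
    print(np.isnan(X_imputed).sum())         # 0: all gaps filled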

  7. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    Science.gov (United States)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to events of significant rainfall intensity lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information was still lacking for the Italian territory. In this work several sources of data are gathered to assemble the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded over 1 to 24 consecutive hours by more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potentialities emerging from the analysis, a description of the ongoing and planned future work on the database is provided.

  8. An enhanced topologically significant directed random walk in cancer classification using gene expression datasets

    Directory of Open Access Journals (Sweden)

    Choon Sen Seah

    2017-12-01

    Microarray technology has become one of the elementary tools for researchers to study the genomes of organisms. As the complexity and heterogeneity of cancer are increasingly appreciated through genomic analysis, cancer classification is becoming an important trend. The significant directed random walk has been proposed as a cancer classification approach with higher sensitivity in risk gene prediction and higher accuracy in cancer classification. In this paper, the methodology and materials used for the experiment are presented. A tuning-parameter selection method and a weight parameter are applied in the proposed approach. Gene expression datasets are used as input, while a pathway dataset is used as a reference to build the directed graph that biases the random walk. In addition, we demonstrate that our approach can improve prediction sensitivity, with higher accuracy and biologically meaningful classification results. A comparison between the significant directed random walk and the directed random walk shows the improvement in terms of prediction sensitivity and cancer classification accuracy.

  9. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    Science.gov (United States)

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.

  10. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  11. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the control of the prosthetic hand compared with the electromyograph (EMG). EMG has disadvantages for a person who has not used the relevant muscles for a long time, and also for people with age-related degenerative conditions; thus, the EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close, and combined movement operations. They served as input to control the prosthetic hand through an interface between Microsoft Visual Studio and Arduino. The results reveal the prosthetic hand to be more efficient and faster in responding to the EEG datasets when an additional LiPo (lithium polymer) battery is attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are presented. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.

  12. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content, using the Open Geospatial Consortium’s (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. Then the Geographic Markup Language (GML) output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
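
    The URI-minting and RDF steps can be sketched with rdflib; the namespace, feature identifier, and geometry below are invented, and the real service maps full GML feature attributes rather than a single WKT literal.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://data.example.org/features/")
    GEO = Namespace("http://www.opengis.net/ont/geosparql#")

    g = Graph()
    uri = URIRef(EX["road.42"])   # persistent URI for one WFS feature
    g.add((uri, RDF.type, GEO.Feature))
    g.add((uri, GEO.asWKT, Literal("LINESTRING(24.9 60.2, 25.0 60.2)")))

    # Content negotiation would choose the serialization; Turtle shown.
    print(g.serialize(format="turtle"))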

  13. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); annual temperature (commences 1910); daily temperature (commences 1910, with limited station coverage pre-1957); twice-daily dewpoint/relative humidity (commences 1957); monthly pan evaporation (commences 1970); and cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  14. Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets.

    Science.gov (United States)

    Ellis, D W; Srigley, J

    2016-01-01

    Key quality parameters in diagnostic pathology include timeliness, accuracy, completeness, conformance with current agreed standards, consistency and clarity in communication. In this review, we argue that with worldwide developments in eHealth and big data, generally, there are two further, often overlooked, parameters if our reports are to be fit for purpose. Firstly, population-level studies have clearly demonstrated the value of providing timely structured reporting data in standardised electronic format as part of system-wide quality improvement programmes. Moreover, when combined with multiple health data sources through eHealth and data linkage, structured pathology reports become central to population-level quality monitoring, benchmarking, interventions and benefit analyses in public health management. Secondly, population-level studies, particularly for benchmarking, require a single agreed international and evidence-based standard to ensure interoperability and comparability. This has been taken for granted in tumour classification and staging for many years, yet international standardisation of cancer datasets is only now underway through the International Collaboration on Cancer Reporting (ICCR). In this review, we present evidence supporting the role of structured pathology reporting in quality improvement for both clinical care and population-level health management. Although this review of available evidence largely relates to structured reporting of cancer, it is clear that the same principles can be applied throughout anatomical pathology generally, as they are elsewhere in the health system.

  15. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  16. An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

    Directory of Open Access Journals (Sweden)

    Hubbard Alan E

    2010-06-01

    Background: As computational power improves, the application of more advanced machine learning techniques to the analysis of large genome-wide association (GWA) datasets becomes possible. While most traditional statistical methods can only elucidate main effects of genetic variants on risk for disease, certain machine learning approaches are particularly suited to discovering higher order and non-linear effects. One such approach is the Random Forests (RF) algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies. Results: Using a multiple sclerosis (MS) case-control dataset comprising 300K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD), and removing strong effects from RF analyses. The new RF results are compared to findings from the original MS GWA study and demonstrate overlap. In addition, four new interesting candidate MS genes, MPHOSPH9, CTNNA3, PHACTR2 and IL7, are identified by RF analysis and warrant further follow-up in independent studies. Conclusions: This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It is shown that RF is computationally feasible for GWA data and that the results obtained make biological sense based on previous studies. More importantly, new genes were identified as potentially being associated with MS, suggesting new avenues of investigation for this complex disease.
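
    The tuning advice translates roughly into raising mtry and the forest size well above library defaults; here is a hedged scikit-learn sketch on simulated genotypes (a real GWA matrix is far wider, and the paper's exact parameter values differ).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.randint(0, 3, size=(300, 5000))  # 0/1/2 genotype coding
    y = np.random.randint(0, 2, size=300)          # case/control labels

    rf = RandomForestClassifier(
        n_estimators=2000,   # many trees so each SNP is tried often
        max_features=0.1,    # far larger mtry than the sqrt(p) default
        n_jobs=-1,
        random_state=0,
    ).fit(X, y)

    print(np.argsort(rf.feature_importances_)[::-1][:10])  # candidate SNPs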

  17. A comprehensive dataset of genes with a loss-of-function mutant phenotype in Arabidopsis.

    Science.gov (United States)

    Lloyd, Johnny; Meinke, David

    2012-03-01

    Despite the widespread use of Arabidopsis (Arabidopsis thaliana) as a model plant, a curated dataset of Arabidopsis genes with mutant phenotypes remains to be established. A preliminary list published nine years ago in Plant Physiology is outdated, and genome-wide phenotype information remains difficult to obtain. We describe here a comprehensive dataset of 2,400 genes with a loss-of-function mutant phenotype in Arabidopsis. Phenotype descriptions were gathered primarily from manual curation of the scientific literature. Genes were placed into prioritized groups (essential, morphological, cellular-biochemical, and conditional) based on the documented phenotypes of putative knockout alleles. Phenotype classes (e.g. vegetative, reproductive, and timing, for the morphological group) and subsets (e.g. flowering time, senescence, circadian rhythms, and miscellaneous, for the timing class) were also established. Gene identities were classified as confirmed (through molecular complementation or multiple alleles) or not confirmed. Relationships between mutant phenotype and protein function, genetic redundancy, protein connectivity, and subcellular protein localization were explored. A complementary dataset of 401 genes that exhibit a mutant phenotype only when disrupted in combination with a putative paralog was also compiled. The importance of these genes in confirming functional redundancy and enhancing the value of single gene datasets is discussed. With further input and curation from the Arabidopsis community, these datasets should help to address a variety of important biological questions, provide a foundation for exploring the relationship between genotype and phenotype in angiosperms, enhance the utility of Arabidopsis as a reference plant, and facilitate comparative studies with model genetic organisms.

  18. A framework for automatic creation of gold-standard rigid 3D-2D registration datasets.

    Science.gov (United States)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga

    2017-02-01

    Advanced image-guided medical procedures incorporate 2D intra-interventional information into the pre-interventional 3D image and procedure plan through 3D/2D image registration (32R). To enter clinical use, and even for publication purposes, novel and existing 32R methods have to be rigorously validated. The performance of a 32R method can be estimated by comparing it to an accurate reference or gold standard method (usually based on fiducial markers) on the same set of images (a gold standard dataset). Objective validation and comparison of methods are possible only if the evaluation methodology is standardized and the gold standard dataset is made publicly available. Currently, very few such datasets exist, and only one contains images of multiple patients acquired during a procedure. To encourage the creation of gold standard 32R datasets, we propose an automatic framework. The framework is based on rigid registration of fiducial markers. The main novelty is spatial grouping of fiducial markers on the carrier device, which enables automatic marker localization and identification across the 3D and 2D images. The proposed framework was demonstrated on clinical angiograms of 20 patients. Rigid 32R computed by the framework was more accurate than that obtained manually, with the respective target registration error below 0.027 mm compared to 0.040 mm. The framework is applicable for gold standard setup on any rigid anatomy, provided that the acquired images contain spatially grouped fiducial markers. The gold standard datasets and software will be made publicly available.
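
    The core computation behind such a gold standard is rigid alignment of paired fiducial markers; below is a minimal Kabsch-style sketch with synthetic markers (the framework's marker grouping and 3D/2D projection model are not shown).

    import numpy as np

    def rigid_fit(P, Q):
        """R, t minimizing ||R @ P_i + t - Q_i|| over paired markers."""
        Pc, Qc = P - P.mean(0), Q - Q.mean(0)
        U, _, Vt = np.linalg.svd(Pc.T @ Qc)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R, Q.mean(0) - R @ P.mean(0)

    P = np.random.rand(6, 3)                      # fiducials in 3D image
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    t_true = np.array([5.0, -2.0, 1.0])
    Q = P @ R_true.T + t_true                     # same markers, moved

    R, t = rigid_fit(P, Q)
    target = np.array([0.5, 0.5, 0.5])            # anatomical target point
    tre = np.linalg.norm((R @ target + t) - (R_true @ target + t_true))
    print(f"target registration error: {tre:.2e}")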

  19. Software ion scan functions in analysis of glycomic and lipidomic MS/MS datasets.

    Science.gov (United States)

    Haramija, Marko

    2018-03-01

    Hardware ion scan functions unique to the tandem mass spectrometry (MS/MS) mode of data acquisition, such as precursor ion scan (PIS) and neutral loss scan (NLS), are important for selective extraction of key structural data from complex MS/MS spectra. However, their software counterparts, software ion scan (SIS) functions, are still not regularly available. Software ion scan functions can be easily coded for additional functionalities, such as software multiple precursor ion scan, software no ion scan, and software variable ion scan functions. These are often necessary, since they allow more efficient analysis of complex MS/MS datasets, often encountered in glycomics and lipidomics. Software ion scan functions can be easily coded using modern scripting languages and can be independent of the instrument manufacturer. Here we demonstrate the utility of SIS functions on a medium-size glycomic MS/MS dataset. Knowledge of sample properties, as well as of diagnostic and conditional diagnostic ions crucial for data analysis, was needed. Based on the tables constructed with the output data from the SIS functions performed, a detailed analysis of a complex MS/MS glycomic dataset could be carried out in a quick, accurate, and efficient manner. Glycomic research is progressing slowly, and with respect to the MS experiments, one of the key obstacles to moving forward is the lack of appropriate bioinformatic tools for fast analysis of glycomic MS/MS datasets. Adding novel SIS functionalities to the glycomic MS/MS toolbox has the potential to significantly speed up the glycomic data analysis process. Similar tools are useful for the analysis of lipidomic MS/MS datasets as well, as is discussed briefly. Copyright © 2017 John Wiley & Sons, Ltd.
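
    A software precursor ion scan or neutral loss scan reduces to filtering spectra for a diagnostic product ion or a fixed precursor-product mass difference. The sketch below uses invented spectra and tolerances; 204.09 is the HexNAc oxonium ion common in glycomics.

    TOL = 0.02  # m/z tolerance in Da

    def software_pis(spectra, fragment_mz):
        """Keep spectra whose products include the diagnostic fragment."""
        return [s for s in spectra
                if any(abs(mz - fragment_mz) < TOL for mz in s["products"])]

    def software_nls(spectra, loss):
        """Keep spectra showing a precursor -> precursor - loss ion."""
        return [s for s in spectra
                if any(abs(s["precursor"] - loss - mz) < TOL
                       for mz in s["products"])]

    spectra = [
        {"precursor": 933.4, "products": [204.09, 366.14, 731.3]},
        {"precursor": 512.2, "products": [184.07, 494.2]},
    ]
    print(software_pis(spectra, 204.09))  # glycan-diagnostic spectra
    print(software_nls(spectra, 18.01))   # water-loss spectra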

  20. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.

  1. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  2. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  3. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  4. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  5. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  6. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  7. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  8. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database made available in the Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope at an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches when applied to the same dataset.

  9. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  10. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  11. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  12. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  13. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  14. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Data mining involves the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure to apply to a new dataset and predict its class. Various classification algorithms are used to classify datasets; they are based on different methods, such as probability, decision trees, neural networks, nearest neighbors, Boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. Comparative analysis has been carried out using the measures of accuracy, precision, and F-measure. We discuss features and limitations of the classification algorithms for datasets of diverse nature.
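
    The methodology amounts to cross-validating several classifier families on several datasets and tabulating the measures; a compact sketch with scikit-learn stand-ins for the paper's datasets and algorithms:

    from sklearn.datasets import load_iris, load_wine
    from sklearn.model_selection import cross_validate
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {"naive bayes": GaussianNB(),
                   "decision tree": DecisionTreeClassifier(random_state=0),
                   "nearest neighbor": KNeighborsClassifier()}
    datasets = {"iris": load_iris(return_X_y=True),
                "wine": load_wine(return_X_y=True)}

    for dname, (X, y) in datasets.items():
        for cname, clf in classifiers.items():
            s = cross_validate(clf, X, y, cv=10,
                               scoring=("accuracy", "precision_macro", "f1_macro"))
            print(dname, cname,
                  round(s["test_accuracy"].mean(), 3),
                  round(s["test_precision_macro"].mean(), 3),
                  round(s["test_f1_macro"].mean(), 3))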

  15. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  16. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations; for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
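
    The decomposition step can be pictured as expressing a new dataset's profile as a non-negative combination of profiles that summarize earlier models, then ranking earlier datasets by weight; this toy version with random profiles is an assumption-laden stand-in for the actual combination model.

    import numpy as np
    from scipy.optimize import nnls

    model_profiles = np.random.rand(500, 8)      # one column per old model
    new_dataset = (0.7 * model_profiles[:, 2]
                   + 0.3 * model_profiles[:, 5]
                   + 0.01 * np.random.randn(500))

    weights, _ = nnls(model_profiles, new_dataset)
    ranked = np.argsort(weights)[::-1]
    print(ranked[:3], np.round(weights[ranked[:3]], 2))  # models 2, 5 lead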

  17. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem for human health: the human microbiome. The existing limited number of studies has reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  18. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia dataset to be used in cross-media machine learning problems. In more detail, we describe the genre taxonomy applied at http://www.garageband.com, from where the dataset was collected, and how that taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented.

  19. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex and heterogeneous datasets, noisy medical data, and the difficulty of explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  20. Multiple Perspectives / Multiple Readings

    Directory of Open Access Journals (Sweden)

    Simon Biggs

    2005-01-01

    People experience things from their own physical point of view. What they see is usually a function of where they are and what physical attitude they adopt relative to the subject. With augmented vision (periscopes, mirrors, remote cameras, etc.) we are able to see things from places where we are not present. With time-shifting technologies, such as the video recorder, we can also see things from the past; a time and a place we may never have visited. In recent artistic work I have been exploring the implications of digital technology, interactivity and internet connectivity that allow people to not so much space/time-shift their visual experience of things but rather see what happens when everybody is simultaneously able to see what everybody else can see. This is extrapolated through the remote networking of sites that are actual installation spaces; where the physical movements of viewers in the space generate multiple perspectives, linked to other similar sites at remote locations or to other viewers entering the shared data-space through a web-based version of the work. This text explores the processes involved in such a practice and reflects on related questions regarding the non-singularity of being and the sense of self as linked to time and place.

  1. INTEGRATING SMARTPHONE IMAGES AND AIRBORNE LIDAR DATA FOR COMPLETE URBAN BUILDING MODELLING

    Directory of Open Access Journals (Sweden)

    S. Zhang

    2016-06-01

    A complete building model reconstruction needs data collected from both the air and the ground. The former often has sparse coverage of building façades, while the latter is usually unable to observe building rooftops. To address the missing-data issues that arise in building reconstruction from a single data source, we describe an approach for complete building reconstruction that integrates airborne LiDAR data and ground smartphone imagery. First, by taking advantage of the GPS and digital compass information embedded in smartphone image metadata, we are able to find the airborne LiDAR point clouds for the buildings appearing in the images. In the next step, Structure-from-Motion and dense multi-view stereo algorithms are applied to generate a building point cloud from multiple ground images. The third step extracts building outlines from the LiDAR point cloud and the ground-image point cloud, respectively. An automated correspondence between these two sets of building outlines allows us to achieve a precise registration and combination of the two point clouds, which ultimately results in a complete and full-resolution building model. The developed approach overcomes the problem of sparse points on building façades in airborne LiDAR and the lack of rooftops in ground images, such that the merits of both datasets are utilized.

  2. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web-page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user-intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time ranges and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
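
    A hedged sketch of how such heuristics might combine, with entirely invented weights and fields: keyword match on essential measurements, fractional time and latitude overlap, and a version bonus for newer datasets.

    def overlap(a0, a1, b0, b1):
        """Fraction of query interval [a0, a1] covered by [b0, b1]."""
        inter = max(0.0, min(a1, b1) - max(a0, b0))
        return inter / (a1 - a0) if a1 > a0 else 0.0

    def relevance(query, ds):
        score = 3.0 * len(set(query["keywords"]) & set(ds["measurements"]))
        score += 2.0 * overlap(*query["time"], *ds["time"])
        score += 1.0 * overlap(*query["lat"], *ds["lat"])
        score += 0.5 * ds["version"]          # prefer newer versions
        return score

    query = {"keywords": ["ozone"], "time": (2000, 2010), "lat": (-30, 30)}
    candidates = [
        {"id": "A", "measurements": ["ozone"], "time": (1984, 2016),
         "lat": (-90, 90), "version": 3},
        {"id": "B", "measurements": ["aerosol"], "time": (2000, 2010),
         "lat": (-30, 30), "version": 1},
    ]
    print(max(candidates, key=lambda d: relevance(query, d))["id"])  # A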

  3. 3D Modeling of Iran and Surrounding Areas From Simultaneous Inversion of Multiple Geophysical Datasets

    Science.gov (United States)

    2010-09-01

    shorter periods). Figure 4 shows example fits to the dispersion values and the Bouguer gravity variations. As seen in earlier studies (Maceira and Ammon... [Figure 4: Sample dispersion (top) and Bouguer gravity (bottom) for the preliminary inversion.] As for other

  4. A Cost-Effective Strategy for Storing Scientific Datasets with Multiple Service Providers in the Cloud

    OpenAIRE

    Yuan, Dong; Cui, Lizhen; Liu, Xiao; Fu, Erjiang; Yang, Yun

    2016-01-01

    Cloud computing provides scientists a platform that can deploy computation- and data-intensive applications without infrastructure investment. With excessive cloud resources and a decision support system, large generated data sets can flexibly be (1) stored locally in the current cloud, (2) deleted and regenerated whenever reused, or (3) transferred to a cheaper cloud service for storage. However, due to the pay-for-use model, the total application cost largely depends on the usage of computation, stor...
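
    The trade-off reduces to comparing three cost formulas per dataset; all prices below are invented placeholders, not the paper's cost model.

    def cheapest(size_gb, months, regen_cpu_hours, reuses):
        store_local = size_gb * 0.023 * months                  # $/GB-month
        regenerate = regen_cpu_hours * 0.05 * reuses            # $/CPU-hour
        cold = size_gb * 0.004 * months + size_gb * 0.09        # + transfer
        options = {"store locally": store_local,
                   "delete and regenerate": regenerate,
                   "move to cheaper storage": cold}
        return min(options.items(), key=lambda kv: kv[1])

    print(cheapest(size_gb=500, months=12, regen_cpu_hours=40, reuses=2))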

  5. Integrating Multiple Analytical Datasets to Compare Metabolite Profiles of Mouse Colonic-Cecal Contents and Feces.

    Science.gov (United States)

    Zeng, Huawei; Grapov, Dmitry; Jackson, Matthew I; Fahrmann, Johannes; Fiehn, Oliver; Combs, Gerald F

    2015-09-11

    The pattern of metabolites produced by the gut microbiome comprises a phenotype indicative of the means by which that microbiome affects the gut. We characterized that phenotype in mice by conducting metabolomic analyses of the colonic-cecal contents and comparing them to the metabolite patterns of feces, in order to determine the suitability of fecal specimens as proxies for assessing the metabolic impact of the gut microbiome. We detected a total of 270 low-molecular-weight metabolites in colonic-cecal contents and feces by gas chromatography time-of-flight mass spectrometry (GC-TOF) and ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UPLC-Q-TOF). Of that number, 251 (93%) were present in both types of specimen, representing almost all known biochemical pathways related to amino acid, carbohydrate, energy, lipid, membrane transport, nucleotide, genetic information processing, and cancer-related metabolism. A total of 115 metabolites differed significantly in relative abundance between colonic-cecal contents and feces. These data comprise the first characterization of relationships among metabolites present in the colonic-cecal contents and feces in a healthy mouse model, and show that feces can be a useful proxy for assessing the pattern of metabolites to which the colonic mucosa is exposed.

  6. Integrating Multiple Analytical Datasets to Compare Metabolite Profiles of Mouse Colonic-Cecal Contents and Feces

    Directory of Open Access Journals (Sweden)

    Huawei Zeng

    2015-09-01

    The pattern of metabolites produced by the gut microbiome comprises a phenotype indicative of the means by which that microbiome affects the gut. We characterized that phenotype in mice by conducting metabolomic analyses of the colonic-cecal contents and comparing them to the metabolite patterns of feces, in order to determine the suitability of fecal specimens as proxies for assessing the metabolic impact of the gut microbiome. We detected a total of 270 low-molecular-weight metabolites in colonic-cecal contents and feces by gas chromatography time-of-flight mass spectrometry (GC-TOF) and ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UPLC-Q-TOF). Of that number, 251 (93%) were present in both types of specimen, representing almost all known biochemical pathways related to amino acid, carbohydrate, energy, lipid, membrane transport, nucleotide, genetic information processing, and cancer-related metabolism. A total of 115 metabolites differed significantly in relative abundance between colonic-cecal contents and feces. These data comprise the first characterization of relationships among metabolites present in the colonic-cecal contents and feces in a healthy mouse model, and show that feces can be a useful proxy for assessing the pattern of metabolites to which the colonic mucosa is exposed.

  7. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset metadata, usage metrics, and user feedback to objectively extract relevance for improved...

  8. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
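    The ERD/ERS validation mentioned above measures band power during motor imagery relative to a pre-task baseline, with negative values indicating desynchronization. A minimal single-channel sketch, assuming an illustrative epoch layout, sampling rate and mu band rather than the paper's exact parameters:

```python
import numpy as np

# Minimal sketch of an ERD/ERS computation as used to validate MI datasets:
# band power during motor imagery relative to a pre-cue baseline. The epoch
# layout, sampling rate and mu band below are illustrative assumptions.

def band_power(x, fs, lo, hi):
    """Power of x restricted to the [lo, hi] Hz band via an FFT mask."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.sum(np.abs(spec[mask]) ** 2) / len(x)

def erd_percent(epoch, fs=512, baseline_s=(0.0, 2.0), task_s=(3.0, 5.0),
                band=(8.0, 12.0)):
    """ERD/ERS in percent; negative values indicate desynchronization."""
    b0, b1 = (int(t * fs) for t in baseline_s)
    t0, t1 = (int(t * fs) for t in task_s)
    p_base = band_power(epoch[b0:b1], fs, *band)
    p_task = band_power(epoch[t0:t1], fs, *band)
    return 100.0 * (p_task - p_base) / p_base

rng = np.random.default_rng(0)
epoch = rng.standard_normal(512 * 5)  # one 5 s single-channel epoch
print(f"ERD/ERS: {erd_percent(epoch):+.1f}%")
```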

  9. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
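    The contrast between the two treatments can be sketched in a few lines: complete case analysis drops every row with a missing lab value, while imputation estimates the missing values from the remaining columns and keeps the full sample. The toy data below are illustrative, and the single IterativeImputer pass stands in for a full multiple-imputation procedure, which would pool estimates across several imputed datasets:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression

# Sketch contrasting complete case analysis with imputation-based handling of
# missing labs, in the spirit of the study above. Toy data; column names and
# the single imputation pass are illustrative only.

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "albumin": rng.normal(4.0, 0.5, n),
    "hematocrit": rng.normal(40, 4, n),
})
outcome = (rng.random(n) < 0.1).astype(int)        # any adverse event
df.loc[rng.random(n) < 0.5, "albumin"] = np.nan    # ~50% missing, as in NSQIP

# Complete case analysis: rows with any missing value are dropped.
cc = df.dropna()
model_cc = LogisticRegression().fit(cc, outcome[cc.index])

# Imputation: all rows retained; missing labs estimated from other columns.
imputed = IterativeImputer(random_state=0).fit_transform(df)
model_mi = LogisticRegression().fit(imputed, outcome)

print(f"complete cases kept: {len(cc)} of {n}")
print("CC coefs:", model_cc.coef_.round(3))
print("MI coefs:", model_mi.coef_.round(3))
```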

  10. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods ... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset ...
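    The co-occurrence idea above can be illustrated compactly: cluster the data many times with different random initializations, count how often each pair of items co-clusters, and cut a hierarchical tree built on the resulting consensus distances. A minimal sketch on synthetic data (k and the number of runs are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Minimal sketch of consensus clustering via a co-occurrence matrix: cluster
# repeatedly with random restarts, count pairwise co-memberships, then cut a
# hierarchical tree of the consensus distances. Data and k are illustrative.

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(30, 10)) for m in (-2, 0, 2)])
n, k, runs = len(X), 3, 50

cooc = np.zeros((n, n))
for seed in range(runs):
    labels = KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
    cooc += (labels[:, None] == labels[None, :])
consensus_dist = 1.0 - cooc / runs   # pairs always together -> distance 0

Z = linkage(squareform(consensus_dist, checks=False), method="average")
final = fcluster(Z, t=k, criterion="maxclust")
print(np.bincount(final))            # sizes of the consensus clusters
```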

  11. Concatenated image completion via tensor augmentation and completion

    OpenAIRE

    Bengua, Johann A.; Tuan, Hoang D.; Phien, Ho N.; Do, Minh N.

    2016-01-01

    This paper proposes a novel framework called concatenated image completion via tensor augmentation and completion (ICTAC), which recovers missing entries of color images with high accuracy. Typical images are second- or third-order tensors (2D/3D) depending on whether they are grayscale or color, hence tensor completion algorithms are ideal for their recovery. The proposed framework performs image completion by concatenating copies of a single image that has missing entries into a third-order tensor,...
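    To illustrate the kind of recovery such completion methods perform, the sketch below runs plain soft-thresholded SVD completion on a low-rank matrix with missing entries. This is a generic soft-impute-style baseline, not the ICTAC algorithm itself:

```python
import numpy as np

# Minimal sketch of low-rank completion on a single matricized image, to
# illustrate the recovery problem tensor completion methods solve. Plain
# soft-impute-style SVD thresholding; tau and iters are illustrative.

def svt_complete(M, observed, tau=5.0, iters=200):
    """Fill missing entries of M (observed: boolean mask) by repeatedly
    soft-thresholding singular values and re-imposing the known entries."""
    X = np.where(observed, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink toward low rank
        X[observed] = M[observed]                 # keep known pixels fixed
    return X

rng = np.random.default_rng(1)
low_rank = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 50))  # rank-3 "image"
mask = rng.random(low_rank.shape) < 0.6                          # 60% observed
recovered = svt_complete(low_rank, mask)
err = np.linalg.norm((recovered - low_rank)[~mask]) / np.linalg.norm(low_rank[~mask])
print(f"relative error on missing entries: {err:.3f}")
```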

  12. Using large hydrological datasets to create a robust, physically based, spatially distributed model for Great Britain

    Science.gov (United States)

    Lewis, Elizabeth; Kilsby, Chris; Fowler, Hayley

    2014-05-01

    The impact of climate change on hydrological systems requires further quantification in order to inform water management. This study intends to conduct such analysis using hydrological models. Such models are of varying forms, of which conceptual, lumped parameter models and physically-based models are two important types. The majority of hydrological studies use conceptual models calibrated against measured river flow time series in order to represent catchment behaviour. This method often shows impressive results for specific problems in gauged catchments. However, the results may not be robust under non-stationary conditions such as climate change, as physical processes and relationships amenable to change are not accounted for explicitly. Moreover, conceptual models are less readily applicable to ungauged catchments, in which hydrological predictions are also required. As such, the physically based, spatially distributed model SHETRAN is used in this study to develop a robust and reliable framework for modelling historic and future behaviour of gauged and ungauged catchments across the whole of Great Britain. In order to achieve this, a large array of data completely covering Great Britain for the period 1960-2006 has been collated and efficiently stored ready for model input. The data processed include a DEM, rainfall, PE and maps of geology, soil and land cover. A desire to make the modelling system easy for others to work with led to the development of a user-friendly graphical interface. This allows non-experts to set up and run a catchment model in a few seconds, a process that can normally take weeks or months. The quality and reliability of the extensive dataset for modelling hydrological processes has also been evaluated. One aspect of this has been an assessment of error and uncertainty in rainfall input data, as well as the effects of temporal resolution in precipitation inputs on model calibration. SHETRAN has been updated to accept gridded rainfall

  13. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global delayed-time-mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, as well as other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until a year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study was started that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  14. Multiple sclerosis

    Science.gov (United States)

    Consumer health page on multiple sclerosis, covering complications (e.g. osteoporosis or thinning of the bones, pressure sores), side effects of medicines used to treat the disease, daily bowel care, discharge guidance, preventing pressure ulcers, swallowing problems, and related images (multiple sclerosis, MRI).

  15. A conceptual prototype for the next-generation national elevation dataset

    Science.gov (United States)

    Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

    2013-01-01

    In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

  16. Validating a continental-scale groundwater diffuse pollution model using regional datasets.

    Science.gov (United States)

    Ouedraogo, Issoufou; Defourny, Pierre; Vanclooster, Marnik

    2017-12-11

    In this study, we assess the validity of an African-scale groundwater pollution model for nitrates. In a previous study, we identified a statistical continental-scale groundwater pollution model for nitrate. The model was identified using a pan-African meta-analysis of available nitrate groundwater pollution studies. The model was implemented in both Random Forest (RF) and multiple regression formats. For both approaches, we collected as predictors a comprehensive GIS database of 13 spatial attributes related to land use, soil type, hydrogeology, topography, climatology, region typology, nitrogen fertiliser application rate, and population density. In this paper, we validate the continental-scale model of groundwater contamination by using a nitrate measurement dataset from three African countries. We discuss the issue of data availability, and quality and scale issues, as challenges in validation. Notwithstanding that the modelling procedure exhibited very good success using a continental-scale dataset (e.g. R² = 0.97 in the RF format using a cross-validation approach), the continental-scale model could not be used without recalibration to predict nitrate pollution at the country scale using regional data. In addition, when recalibrating the model using country-scale datasets, the order of model exploratory factors changes. This suggests that the structure and the parameters of a statistical spatially distributed groundwater degradation model for the African continent are strongly scale dependent.
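    The model structure described above, a Random Forest over spatial predictors scored by cross-validation, can be sketched as follows. The synthetic data and five stand-in features are illustrative; the actual model used 13 pan-African GIS attributes:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Minimal sketch of the model structure described above: a Random Forest
# predicting groundwater nitrate from spatial attributes, scored by
# cross-validation. Features and the synthetic response are illustrative.

rng = np.random.default_rng(7)
n = 300
X = rng.random((n, 5))           # e.g. land use, fertiliser rate, depth, ...
nitrate = 40 * X[:, 0] + 25 * X[:, 1] ** 2 + rng.normal(0, 3, n)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
r2_scores = cross_val_score(rf, X, nitrate, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2_scores.mean():.2f} +/- {r2_scores.std():.2f}")
```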

  17. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset as comprehensive and complete information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  18. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset as comprehensive and complete information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  19. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them has used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed, along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
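    A basic process-mining building block of the kind such a study relies on is the directly-follows graph, counting how often one activity immediately follows another within a case. A minimal sketch on a toy oncology-flavoured event log (column names and events are illustrative; dedicated tools such as pm4py implement the full discovery algorithms):

```python
from collections import Counter

import pandas as pd

# Minimal sketch of deriving a directly-follows graph from an event log, the
# kind of table one could extract from the MIMIC-III event tables. The toy
# log and column names are illustrative, not the MIMIC-III schema.

log = pd.DataFrame({
    "case_id":  [1, 1, 1, 2, 2, 2, 2],
    "activity": ["admit", "chemo", "discharge",
                 "admit", "surgery", "chemo", "discharge"],
    "timestamp": pd.to_datetime([
        "2012-01-01", "2012-01-03", "2012-01-09",
        "2012-02-01", "2012-02-02", "2012-02-20", "2012-03-01"]),
})

dfg = Counter()
for _, trace in log.sort_values("timestamp").groupby("case_id"):
    acts = trace["activity"].tolist()
    dfg.update(zip(acts, acts[1:]))   # count each directly-follows pair

for (a, b), count in dfg.most_common():
    print(f"{a} -> {b}: {count}")
```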

  20. Ensuring America's Future by Increasing Latino College Completion: Latino College Completion in 50 States. Executive Summary

    Science.gov (United States)

    Santiago, Deborah; Soliz, Megan

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. This initiative included the release of a benchmarking guide for projections of degree attainment disaggregated by race/ethnicity that offered multiple metrics to track…

  1. Use of country of birth as an indicator of refugee background in health datasets

    Science.gov (United States)

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  2. p-topological Cauchy completions

    Directory of Open Access Journals (Sweden)

    J. Wig

    1999-01-01

    Full Text Available The duality between “regular” and “topological” as convergence space properties extends in a natural way to the more general properties “p-regular” and “p-topological.” Since earlier papers have investigated regular, p-regular, and topological Cauchy completions, we hereby initiate a study of p-topological Cauchy completions. A p-topological Cauchy space has a p-topological completion if and only if it is “cushioned,” meaning that each equivalence class of nonconvergent Cauchy filters contains a smallest filter. For a Cauchy space allowing a p-topological completion, it is shown that a certain class of Reed completions preserve the p-topological property, including the Wyler and Kowalsky completions, which are, respectively, the finest and the coarsest p-topological completions. However, not all p-topological completions are Reed completions. Several extension theorems for p-topological completions are obtained. The most interesting of these states that any Cauchy-continuous map between Cauchy spaces allowing p-topological and p′-topological completions, respectively, can always be extended to a θ-continuous map between any p-topological completion of the first space and any p′-topological completion of the second.

  3. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). The NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  4. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
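    The simulation idea described above can be sketched by fitting the means and covariance matrix of a cohort and drawing new participants from the corresponding multivariate normal, so that the synthetic data preserve those moments without containing any original record. The variables and stand-in "real" cohort below are illustrative:

```python
import numpy as np
import pandas as pd

# Minimal sketch of simulating a synthetic cohort that preserves the means
# and covariance matrix of the original data. The variables and stand-in
# "real" cohort below are illustrative, not ALSPAC values.

rng = np.random.default_rng(0)
real = pd.DataFrame(rng.multivariate_normal(
    mean=[120.0, 70.0, 22.5],                     # SBP, heart rate, BMI
    cov=[[90, 20, 5], [20, 60, 4], [5, 4, 9]],
    size=2000), columns=["sbp", "hr", "bmi"])

# Fit the moments of the "real" data, then simulate a larger synthetic cohort.
synthetic = pd.DataFrame(
    rng.multivariate_normal(real.mean().to_numpy(), real.cov().to_numpy(),
                            size=15000),
    columns=real.columns)

print(real.cov().round(1))
print(synthetic.cov().round(1))   # close to the real covariance matrix
```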

  5. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  6. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  7. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.

  8. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
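    Given the definitions above, EMP for a month can be sketched as the observed percentage change in the exchange rate plus the change the estimated intervention would have produced, via the conversion factor ρ. The sign convention and numbers below are illustrative, not values from the dataset:

```python
# Minimal sketch of the EMP construction described above: the observed
# percentage change in the exchange rate plus the change that the estimated
# intervention would have produced, via the country-specific conversion
# factor rho (percentage-point change in the exchange rate per $1 bn of
# intervention). All numbers below are illustrative, not dataset values.

def emp(pct_change_exchange_rate, intervention_bn_usd, rho):
    """EMP in percent: counterfactual exchange-rate change absent intervention."""
    return pct_change_exchange_rate + rho * intervention_bn_usd

# A month where the currency moved -0.8% while the central bank bought
# $2.5 bn of foreign currency, with rho = 0.4 (% per $ bn):
print(f"EMP = {emp(-0.8, 2.5, 0.4):+.2f}%")
```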

  9. Structure completion for facade layouts

    KAUST Repository

    Fan, Lubin; Musialski, Przemyslaw; Liu, Ligang; Wonka, Peter

    2014-01-01

    completion with large missing parts is an ill-posed problem. Therefore, we combine two sources of information to derive our solution: the observed shapes and a database of complete layouts. The problem is also very difficult, because shape positions

  10. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us

  11. SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    KAUST Repository

    Giancola, Silvio; Amine, Mohieddine; Dghaily, Tarek; Ghanem, Bernard

    2018-01-01

    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances δ ranging from 5 to 60 seconds.
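    The spotting task described above admits a simple evaluation sketch: a predicted timestamp counts as a hit if it falls within a tolerance δ of an unmatched ground-truth anchor. The greedy matching and toy timestamps below are illustrative; the published Average-mAP integrates precision over δ from 5 to 60 s:

```python
# Minimal sketch of the spotting criterion described above: a prediction is
# correct if it lands within delta seconds of a ground-truth anchor, each
# anchor being matched at most once. Timestamps are illustrative.

def spotting_hits(predictions, anchors, delta):
    """Greedily match predicted timestamps to unmatched anchors within delta."""
    hits, free = 0, set(range(len(anchors)))
    for p in sorted(predictions):
        match = min((i for i in free if abs(anchors[i] - p) <= delta),
                    key=lambda i: abs(anchors[i] - p), default=None)
        if match is not None:
            free.remove(match)
            hits += 1
    return hits

goals = [412.0, 1890.0, 2701.0]          # ground-truth event anchors (s)
preds = [405.0, 1902.0, 2500.0]          # model outputs (s)
for delta in (5, 30, 60):
    print(f"delta={delta:>2}s -> {spotting_hits(preds, goals, delta)}/3 hits")
```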

  12. SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    KAUST Repository

    Giancola, Silvio

    2018-04-12

    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances δ ranging from 5 to 60 seconds.

  13. Making of a solar spectral irradiance dataset I: observations, uncertainties, and methods

    Directory of Open Access Journals (Sweden)

    Schöll Micha

    2016-01-01

    Full Text Available Context. Changes in the spectral solar irradiance (SSI) are a key driver of the variability of the Earth's environment, strongly affecting the upper atmosphere but also impacting climate. However, its measurements have been sparse and of differing quality. The "First European Comprehensive Solar Irradiance Data Exploitation project" (SOLID) aims at merging the complete set of European irradiance data, complemented by archive data that include data from non-European missions. Aims. As part of SOLID, we present all available space-based SSI measurements, reference spectra, and relevant proxies in a unified format with regular temporal re-gridding, interpolation, and gap-filling, as well as associated uncertainty estimations. Methods. We apply a coherent methodology to all available SSI datasets. Our pipeline approach consists of the pre-processing of the data, the interpolation of missing data by utilizing the spectral coherency of SSI, the temporal re-gridding of the data, an instrumental outlier detection routine, and a proxy-based interpolation for missing and flagged values. In particular, to detect instrumental outliers, we combine an autoregressive model with proxy data. We independently estimate the precision and stability of each individual dataset and flag all changes due to processing in an accompanying quality mask. Results. We present a unified database of solar activity records with accompanying meta-data and uncertainties. Conclusions. This dataset can be used for further investigations of the long-term trend of solar activity and the construction of a homogeneous SSI record.

  14. A Novel Technique for Time-Centric Analysis of Massive Remotely-Sensed Datasets

    Directory of Open Access Journals (Sweden)

    Glenn E. Grant

    2015-04-01

    Full Text Available Analyzing massive remotely-sensed datasets presents formidable challenges. The volume of satellite imagery collected often outpaces analytical capabilities; however, thorough analyses of complete datasets may provide new insights into processes that would otherwise go unseen. In this study we present a novel, object-oriented approach to storing, retrieving, and analyzing large remotely-sensed datasets. The objective is to provide a new structure for scalable storage and rapid, Internet-based analysis of climatology data. The concept of a “data rod” is introduced, a conceptual data object that organizes time-series information into a temporally-oriented vertical column at any given location. To demonstrate one possible use, we ingest 25 years of Greenland imagery into a series of pure-object databases, then retrieve and analyze the data. The results provide a basis for evaluating the database performance and scientific analysis capabilities. The project succeeds in demonstrating the effectiveness of the prototype database architecture and analysis approach, not because new scientific information is discovered, but because quality control issues are revealed in the source data that had gone undetected for years.
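    The "data rod" concept lends itself to a compact sketch: a per-location object holding a time-ordered column of observations, so a time-series query touches one rod instead of scanning whole scenes. The fields and query below are illustrative, not the prototype's actual schema:

```python
from dataclasses import dataclass, field

import numpy as np

# Minimal sketch of the "data rod" idea described above: a per-location
# object holding a temporally ordered column of observations. The fields
# and the example query are illustrative assumptions.

@dataclass
class DataRod:
    lat: float
    lon: float
    times: list = field(default_factory=list)    # e.g. decimal years
    values: list = field(default_factory=list)   # e.g. brightness temperature

    def insert(self, t, v):
        """Keep the rod sorted by time as scenes are ingested."""
        i = np.searchsorted(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, v)

    def window(self, t0, t1):
        """All observations at this location within [t0, t1]."""
        i0, i1 = np.searchsorted(self.times, [t0, t1])
        return self.times[i0:i1], self.values[i0:i1]

rod = DataRod(lat=72.6, lon=-38.5)
for year, value in [(1990.5, 241.0), (1989.2, 239.5), (1991.7, 240.2)]:
    rod.insert(year, value)
print(rod.window(1989.0, 1991.0))   # the 1989.2 and 1990.5 observations
```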

  15. Publishing datasets with eSciDoc and panMetaDocs

    Science.gov (United States)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    Currently several research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambiguously in written publications. GFZ Potsdam established a publishing workflow for file-based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced dataset. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP-based web application that allows data to be described with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and to make use of contextual information in a project setting. Access rights can be applied to set the visibility of datasets to other project members, allowing collaboration on and notification about datasets (RSS) and interaction with the internal messaging system inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows the publication status of the eSciDoc item to be changed from "private" to "submitted" and the dataset to be prepared for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy to use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  16. A collection of annotated and harmonized human breast cancer transcriptome datasets, including immunologic classification [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Jessica Roelands

    2018-02-01

    Full Text Available The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune-based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables, with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.

  17. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes a large diversity of autochthonous plants, with 199 taxa assigned IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidences the role of botanical collections as an essential source of occurrence data.

  18. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups: concave, convex, cylindrical and flat 3D object models. The object models have varying amounts of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows, as expected, that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat ...

  19. Traffic sign classification with dataset augmentation and convolutional neural network

    Science.gov (United States)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, we first convert a color image to grayscale, and then normalize it to the range (-1,1) as the preprocessing step. To increase the robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
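    The preprocessing and augmentation pipeline described above is easy to sketch: grayscale conversion, scaling to (-1, 1), and augmented copies of each training image. Shapes follow the 32x32 GTSRB convention; the pixel-shift augmentation is one illustrative choice among many:

```python
import numpy as np

# Minimal sketch of the preprocessing and augmentation steps described above:
# grayscale conversion, normalization to (-1, 1), and simple augmented
# copies. The shift augmentation is an illustrative choice, not necessarily
# the paper's augmentation algorithm.

def preprocess(rgb):
    """RGB uint8 image (H, W, 3) -> grayscale float array in (-1, 1)."""
    gray = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114], np.float32)
    return gray / 127.5 - 1.0

def augment(img, shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Create shifted copies of a preprocessed image to enlarge the dataset."""
    return [np.roll(img, s, axis=(0, 1)) for s in shifts]

rng = np.random.default_rng(0)
sign = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
x = preprocess(sign)
batch = [x] + augment(x)
print(x.min(), x.max(), len(batch))   # values in (-1, 1), 5 training images
```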

  20. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  1. MULTIPLE OBJECTS

    Directory of Open Access Journals (Sweden)

    A. A. Bosov

    2015-04-01

    Full Text Available Purpose. The development of complicated techniques for production and management processes, information systems, computer science, applied objects of systems theory and others requires improved mathematical methods and new approaches for the research of application systems. The variety and diversity of subject systems makes it necessary to develop a model that generalizes classical sets and their development, sets of sets. Multiple objects, unlike sets, are constructed from multiple structures and are represented by structure and content. The aim of the work is the analysis of the multiple structures generating multiple objects, and the further development of operations on these objects in application systems. Methodology. To achieve the objectives of the research, the structure of a multiple object is represented as a constructive trio consisting of a medium, a signature and axiomatics. A multiple object is determined by its structure and content, and is represented by a hybrid superposition composed of sets, multisets, ordered sets (lists) and heterogeneous sets (sequences, tuples). Findings. In this paper we study the properties and characteristics of the components of hybrid multiple objects of complex systems, propose assessments of their complexity, and show the rules of internal and external operations on the objects of implementation. We introduce a relation of arbitrary order over multiple objects, and we define the description of functions and mappings on objects of multiple structures. Originality. In this paper we consider the development of the multiple structures generating multiple objects. Practical value. The transition from abstract to subject-specific multiple structures requires the transformation of the system and of the multiple objects. This transformation involves three successive stages: specification (binding to the domain), interpretation (multiple sites) and particularization (goals). The proposed approach describes systems based on hybrid sets

  2. Quality-control of an hourly rainfall dataset and climatology of extremes for the UK.

    Science.gov (United States)

    Blenkinsop, Stephen; Lewis, Elizabeth; Chan, Steven C; Fowler, Hayley J

    2017-02-01

    Sub-daily rainfall extremes may be associated with flash flooding, particularly in urban areas but, compared with extremes on daily timescales, have been relatively little studied in many regions. This paper describes a new hourly rainfall dataset for the UK based on ∼1600 rain gauges from three different data sources. This includes tipping-bucket rain gauge data from the UK Environment Agency (EA), which has been collected for operational purposes, principally flood forecasting. Significant problems in the use of such data for the analysis of extreme events include the recording of accumulated totals, high-frequency bucket tips, rain gauge recording errors and the non-operation of gauges. Given the prospect of an intensification of short-duration rainfall in a warming climate, the identification of such errors is essential if sub-daily datasets are to be used to better understand extreme events. We therefore first describe a series of procedures developed to quality-control this new dataset. We then analyse ∼380 gauges with near-complete hourly records for 1992-2011 and map the seasonal climatology of intense rainfall based on UK hourly extremes using annual maxima, n-largest events and fixed threshold approaches. We find that the highest frequencies and intensities of hourly extreme rainfall occur during summer, when the usual orographically defined pattern of extreme rainfall is replaced by a weaker, north-south pattern. A strong diurnal cycle in hourly extremes, peaking in late afternoon to early evening, is also identified in summer and, for some areas, in spring. This likely reflects the different mechanisms that generate sub-daily rainfall, with convection dominating during summer. The resulting quality-controlled hourly rainfall dataset will provide considerable value in several contexts, including the development of standard, globally applicable quality-control procedures for sub-daily data, the validation of the new generation of very high
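    Two of the tipping-bucket problems named above, accumulated totals dumped into a single timestamp and high-frequency repeated tips, can be screened with simple rule-based flags. A minimal sketch with illustrative thresholds (not the values used for the UK dataset):

```python
import pandas as pd

# Minimal sketch of two QC screens for tipping-bucket records: implausibly
# high hourly totals (often accumulated values dumped into one timestamp)
# and long runs of identical non-zero values suggesting stuck or
# high-frequency tips. Thresholds below are illustrative assumptions.

def qc_flags(hourly, max_hourly_mm=50.0, max_repeats=6):
    """Return a DataFrame of boolean QC flags for an hourly rainfall series."""
    flags = pd.DataFrame(index=hourly.index)
    flags["accumulated_total"] = hourly > max_hourly_mm
    # Identify runs of identical consecutive values and their lengths.
    run_id = (hourly != hourly.shift()).cumsum()
    run_len = hourly.groupby(run_id).transform("size")
    flags["repeated_tips"] = (hourly > 0) & (run_len >= max_repeats)
    return flags

idx = pd.date_range("2005-06-01", periods=12, freq="h")
series = pd.Series([0, 0, 2, 2, 2, 2, 2, 2, 80, 0, 1, 0], index=idx, dtype=float)
print(qc_flags(series).sum())   # counts of flagged hours per check
```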

  3. One tree to link them all: a phylogenetic dataset for the European tetrapoda.

    Science.gov (United States)

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-08-08

    Given the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies are still lacking for many large groups and regions, and these are necessary for comprehensive large-scale eco-phylogenetic analyses. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of supermatrices, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) levels based on consensus knowledge. For each group, we inferred 100 ML trees to be able to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized likelihood and fossil calibration. The trees obtained were well supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for the purpose of comparative analyses, macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty.

  4. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the

  5. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Science.gov (United States)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this, uncertainty estimates are typically not included, or are specific to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher-resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high-resolution regional climate model simulations. We will also present results on the new high-resolution products for Alaska and Hawaii (2 km and 250 m, respectively), completing the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  6. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  7. A NEARLY VOLUME-COMPLETE SPECTROSCOPIC SURVEY OF THE CLOSEST MID-TO-LATE M DWARFS

    Science.gov (United States)

    Winters, Jennifer; Irwin, Jonathan; Newton, Elisabeth; Charbonneau, David; Latham, David W.; Mink, Jessica; Esquerdo, Gil; Berlind, Perry; Calkins, Mike

    2018-01-01

    Recent results from Kepler estimate that M dwarfs harbor 2.5 planets per star. Yet, we will understand our exoplanet discoveries only as well as we understand their host stars, and much remains unknown about our low-mass stellar neighbors, such as their kinematics, ages, and multiplicity. A nearly volume-complete sample of M dwarfs lies within 15 pc of the Sun, and it is only for planets orbiting these nearest and smallest stars that thorough follow-up work for characterization will be possible. Unfortunately, more than half of this sample has only low-resolution spectra, which motivates our high-resolution spectroscopic survey of these stars. We present here results from year one of our TRES survey. We have measured radial velocities, rotational broadening, and H-alpha equivalent widths for 305 mid-to-late M dwarfs. We have discovered five new spectroscopic binaries, one of which is a rare M dwarf - (likely) brown dwarf binary within 10 pc, for which we have determined the orbit. Our survey more than doubles the number of mid-M dwarfs within 15 pc with complete high-resolution spectroscopic and trigonometric characterization. We hope to provide a legacy dataset for the use of future generations of astronomers. This work is being supported by grants from the National Science Foundation and the John Templeton Foundation.

  8. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  9. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  10. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  11. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below-ground); green forest floor (above- and below-ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014, with overlap between the plot and tree records. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass expansion factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  12. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
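
    For orientation, a three-dimensional variational (3D-Var) analysis of the kind mentioned above minimizes, in its standard incremental textbook form (not necessarily the exact multi-scale formulation used for this reanalysis), the cost function

        J(\delta\mathbf{x}) = \tfrac{1}{2}\,\delta\mathbf{x}^{\mathsf{T}}\mathbf{B}^{-1}\,\delta\mathbf{x} + \tfrac{1}{2}\,(\mathbf{H}\,\delta\mathbf{x}-\mathbf{d})^{\mathsf{T}}\mathbf{R}^{-1}(\mathbf{H}\,\delta\mathbf{x}-\mathbf{d}),

    where \delta\mathbf{x} is the increment to the model background, \mathbf{B} and \mathbf{R} are the background- and observation-error covariances, \mathbf{H} is the linearized observation operator, and \mathbf{d} is the innovation vector (observations minus the background equivalent).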

  13. Comparison of analyses of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  14. The LAMBADA dataset: Word prediction requiring a broad discourse context

    NARCIS (Netherlands)

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  15. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Energy Technology Data Exchange (ETDEWEB)

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  16. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  17. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  18. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  19. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Science.gov (United States)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation is key for regional water budget analysis and hydrological operations, which themselves are valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and a lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  20. Dataset: Multi Sensor-Orientation Movement Data of Goats

    NARCIS (Netherlands)

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  1. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  2. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  3. Participatory development of a minimum dataset for the Khayelitsha ...

    African Journals Online (AJOL)

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  4. Comparison of analyses of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  5. The DIRAC Data Management System and the Gaudi dataset federation

    CERN Document Server

    Haen, Christophe; Frank, Markus; Tsaregorodtsev, Andrei

    2015-01-01

    The DIRAC Interware provides a development framework and a complete set of components for building distributed computing systems. The DIRAC Data Management System (DMS) offers all the necessary tools to ensure data handling operations for small and large user communities. It supports transparent access to storage resources based on multiple technologies, and is easily expandable. The information on data files and replicas is kept in a File Catalog of which DIRAC offers a powerful and versatile implementation (DFC). Data movement can be performed using third party services including FTS3. Bulk data operations are resilient with respect to failures due to the use of the Request Management System (RMS) that keeps track of ongoing tasks.In this contribution we will present an overview of the DIRAC DMS capabilities and its connection with other DIRAC subsystems such as the Transformation System. This paper also focuses on the DIRAC File Catalog, for which a lot of new developments have been carried out, so that LH...

  6. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even when those results contain the query terms. They may also return highly specific results that depend more on how well the metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
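
    A minimal sketch of the item co-occurrence ingredient named above (an assumption about one component of the hybrid approach, not CSIRO's implementation; the session data and dataset ids are hypothetical):

        # Recommend datasets that are frequently accessed together.
        from collections import Counter
        from itertools import combinations

        sessions = [  # hypothetical user sessions, each a set of dataset ids
            {"soils-au", "rainfall-au", "dem-au"},
            {"rainfall-au", "dem-au"},
            {"soils-au", "rainfall-au"},
        ]

        cooc = Counter()
        for s in sessions:
            for a, b in combinations(sorted(s), 2):
                cooc[(a, b)] += 1
                cooc[(b, a)] += 1

        def recommend(item, k=3):
            # rank other datasets by co-occurrence count with `item`
            scores = {b: n for (a, b), n in cooc.items() if a == item}
            return sorted(scores, key=scores.get, reverse=True)[:k]

        print(recommend("soils-au"))  # ['rainfall-au', 'dem-au']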

  7. ROBO-AO M DWARF MULTIPLICITY SURVEY

    Science.gov (United States)

    Lamman, Claire; Berta-Thompson, Zachory; Baranec, Christoph; Law, Nicholas; Schonhut, Jessica

    2018-01-01

    We analyzed over 7,000 observations from Robo-AO’s field M dwarf survey taken on the 2.1m Kitt Peak telescope. Results will help determine the multiplicity fraction of M dwarfs as a function of primary mass, which is a crucial step towards understanding their evolution and formation mechanics. Through its robotic, laser-guided, and automated system, the Robo-AO instrument has yielded the largest adaptive-optics M dwarf survey to date. I developed a graphical user interface to quickly analyze this data. Initial data analysis included assessing data quality, checking the result from Robo-AO’s automatic reduction pipeline, and determining existence as well as the relative position of companions through a visual inspection. This program can be applied to other datasets and was successfully tested by re-analyzing observations from a separate Robo-AO survey. Following the preliminary results from this data analysis tool, further observations were done with the Keck II telescope by using its NIRC2 imager to follow up on ten select targets for the existence and physical association of companions. After a conservative initial cut for quality, 356 companions were found within 4” of a primary star out of 2,746 high quality Robo-AO M dwarf observations, including four triple systems. We will present a preliminary estimate for the multiplicity rate of wide M dwarf companions after accounting for observation limitations and the completeness of our search. Future research will yield insights into low-mass stellar formation and provide a database of nearby M dwarf multiples that will potentially assist ongoing and future surveys for planets around these stars, such as the NASA TESS mission.

  8. Curability of Multiple Myeloma

    Directory of Open Access Journals (Sweden)

    Raymond Alexanian

    2012-01-01

    Full Text Available Among 792 patients with multiple myeloma treated from 1987 to 2010 and assessed after 18 months, there were 167 patients with complete remission. For the 60 patients treated between 1987 and 1998, for whom there is long follow-up, the latest relapse occurred after 11.8 years, so that 13 patients have remained in sustained complete remission for longer than 12 years (range 12-22 years). These results suggest that 3% of all patients treated during that period may be cured of multiple myeloma. In addition to immunofixation, more sensitive techniques for the detection of residual disease should be applied more consistently in patients with apparent complete remission in order to identify those with potential cure.

  9. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution

  10. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and their mature and star sequences, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
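
    The tile-and-coverage idea can be sketched as follows (a naive stand-in: Geoseq uses suffix arrays over pre-computed public libraries for speed, and the sequences here are made up):

        # Reduce a reference sequence to overlapping tiles and count how often
        # each tile occurs in a set of reads (plain substring search instead
        # of a suffix array, for illustration only).
        def tile_coverage(reference, reads, tile=8, step=4):
            blob = "\n".join(reads)
            tiles = [reference[i:i + tile]
                     for i in range(0, len(reference) - tile + 1, step)]
            return [(i * step, blob.count(t)) for i, t in enumerate(tiles)]

        ref = "ACGTACGTTAGCAGCTTAGCAGGTACCAGT"
        reads = ["TTAGCAGCTTAGC", "ACGTACGTTAGCA", "GGTACCAGT"]
        for pos, cov in tile_coverage(ref, reads):
            print(pos, cov)  # tile start position, read coverage count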

  11. Completeness theorems in transport theory

    International Nuclear Information System (INIS)

    Zweifel, P.F.

    1984-01-01

    Ever since K. M. Case's famous 1960 paper, transport theorists have been studying the questions of full- and half-range completeness for various transport-type equations. The purpose of this note is to try to define exactly what is meant by completeness as it is needed, and used, in solving transport equations, and to discuss some of the various techniques which have been, or might be, used to verify completeness. Attention is restricted to the question of full-range completeness. As a paradigm, the generalized form of the transport equation first introduced by Beals is adopted.

  12. Total ozone trends from 1979 to 2016 derived from five merged observational datasets - the emergence into ozone recovery

    Science.gov (United States)

    Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Long, Craig S.; Loyola, Diego

    2018-02-01

    We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978-present) as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at World Ozone and UV Data Center (WOUDC). The addition of four more years of data since the last World Meteorological Organization (WMO) ozone assessment (2013-2016) shows that for most datasets and regions the trends since the stratospheric halogen reached its maximum (˜ 1996 globally and ˜ 2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1 % decade-1 that are barely statistically significant at the 2σ uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8 % decade-1, while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets and related to the low sampling of ground-based data, are not accounted for in the trend
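
    As a toy illustration of the regression step, the sketch below fits an ordinary least-squares trend to annual mean ozone after a turnaround year. The data are synthetic, and the single linear regressor stands in for the full MLR, which also includes proxies such as solar and QBO terms.

        import numpy as np

        years = np.arange(1996, 2017)
        rng = np.random.default_rng(1)
        ozone = 290 + 0.2 * (years - 1996) + rng.normal(0, 1.5, years.size)  # synthetic DU

        # design matrix: intercept + years since the 1996 turnaround
        X = np.column_stack([np.ones(years.size), years - 1996])
        coef, *_ = np.linalg.lstsq(X, ozone, rcond=None)
        # express the fitted slope as a percentage change per decade
        print(f"trend: {100 * 10 * coef[1] / coef[0]:.2f} % per decade")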

  13. Multiple sclerosis

    International Nuclear Information System (INIS)

    Grunwald, I.Q.; Kuehn, A.L.; Backens, M.; Papanagiotou, P.; Shariat, K.; Kostopoulos, P.

    2008-01-01

    Multiple sclerosis is the most common chronic inflammatory disease of myelin with interspersed lesions in the white matter of the central nervous system. Magnetic resonance imaging (MRI) plays a key role in the diagnosis and monitoring of white matter diseases. This article focuses on key findings in multiple sclerosis as detected by MRI. (orig.)

  14. Multiplicative properties of quantum channels

    Science.gov (United States)

    Rahaman, Mizanur

    2017-08-01

    In this paper, we study the multiplicative behaviour of quantum channels, mathematically described by trace preserving, completely positive maps on matrix algebras. It turns out that the multiplicative domain of a unital quantum channel has a close connection to its spectral properties. A structure theorem (theorem 2.5), which reveals the automorphic property of an arbitrary unital quantum channel on a subalgebra, is presented. Various classes of quantum channels (irreducible, primitive, etc) are then analysed in terms of this stabilising subalgebra. The notion of the multiplicative index of a unital quantum channel is introduced, which measures the number of times a unital channel needs to be composed with itself for the multiplicative algebra to stabilise. We show that the maps that have trivial multiplicative domains are dense in completely bounded norm topology in the set of all unital completely positive maps. Some applications in quantum information theory are discussed.
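
    For reference, the multiplicative domain in question has a standard definition (due to Choi): for a unital completely positive map \Phi on an algebra \mathcal{A},

        M(\Phi) = \{ a \in \mathcal{A} : \Phi(a)\Phi(b) = \Phi(ab) \ \text{and}\ \Phi(b)\Phi(a) = \Phi(ba) \ \text{for all}\ b \in \mathcal{A} \},

    and \Phi restricted to M(\Phi) acts as a homomorphism. The multiplicative index introduced in the paper counts how many self-compositions \Phi^n are needed before the associated multiplicative algebra stabilises.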

  15. Latino College Completion: New York

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  16. Latino College Completion: United States

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  17. Latino College Completion: South Dakota

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  18. Latino College Completion: North Dakota

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  19. Latino College Completion: New Mexico

    Science.gov (United States)

    Excelencia in Education (NJ1), 2012

    2012-01-01

    In 2009, Excelencia in Education launched the Ensuring America's Future initiative to inform, organize, and engage leaders in a tactical plan to increase Latino college completion. An executive summary of Latino College Completion in 50 states synthesizes information on 50 state factsheets and builds on the national benchmarking guide. Each…

  20. Neutron-multiplication measurement instrument

    Energy Technology Data Exchange (ETDEWEB)

    Nixon, K.V.; Dowdy, E.J.; France, S.W.; Millegan, D.R.; Robba, A.A.

    1982-01-01

    The Advanced Nuclear Technology Group of the Los Alamos National Laboratory is now using intelligent data-acquisition and analysis instrumentation for determining the multiplication of nuclear material. Earlier instrumentation, such as the large NIM-crate systems, depended on house power and required additional computation to determine multiplication or to estimate error. The portable, battery-powered multiplication measurement unit, with advanced computational power, acquires data, calculates multiplication, and completes error analysis automatically. Thus, the multiplication is determined easily and an available error estimate enables the user to judge the significance of results.

  1. Neutron-multiplication measurement instrument

    International Nuclear Information System (INIS)

    Nixon, K.V.; Dowdy, E.J.; France, S.W.; Millegan, D.R.; Robba, A.A.

    1982-01-01

    The Advanced Nuclear Technology Group of the Los Alamos National Laboratory is now using intelligent data-acquisition and analysis instrumentation for determining the multiplication of nuclear material. Earlier instrumentation, such as the large NIM-crate systems, depended on house power and required additional computation to determine multiplication or to estimate error. The portable, battery-powered multiplication measurement unit, with advanced computational power, acquires data, calculates multiplication, and completes error analysis automatically. Thus, the multiplication is determined easily and an available error estimate enables the user to judge the significance of results

  2. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

    Directory of Open Access Journals (Sweden)

    Carroll Adam J

    2010-07-01

    Full Text Available Abstract Background Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas chromatography/mass spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Description Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g. metabolite, species, organ/biofluid, etc.). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline. Conclusions MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to

  3. SatelliteDL: a Toolkit for Analysis of Heterogeneous Satellite Datasets

    Science.gov (United States)

    Galloy, M. D.; Fillmore, D.

    2014-12-01

    SatelliteDL is an IDL toolkit for the analysis of satellite Earth observations from a diverse set of platforms and sensors. The core function of the toolkit is the spatial and temporal alignment of satellite swath and geostationary data. The design features an abstraction layer that allows for easy inclusion of new datasets in a modular way. Our overarching objective is to create utilities that automate the mundane aspects of satellite data analysis, are extensible and maintainable, and do not place limitations on the analysis itself. IDL has a powerful suite of statistical and visualization tools that can be used in conjunction with SatelliteDL. Toward this end we have constructed SatelliteDL to include (1) HTML and LaTeX API document generation, (2) a unit test framework, (3) automatic message and error logs, (4) HTML and LaTeX plot and table generation, and (5) several real world examples with bundled datasets available for download. For ease of use, datasets, variables and optional workflows may be specified in a flexible format configuration file. Configuration statements may specify, for example, a region and date range, and the creation of images, plots and statistical summary tables for a long list of variables. SatelliteDL enforces data provenance; all data should be traceable and reproducible. The output NetCDF file metadata holds a complete history of the original datasets and their transformations, and a method exists to reconstruct a configuration file from this information. Release 0.1.0 distributes with ingest methods for GOES, MODIS, VIIRS and CERES radiance data (L1) as well as select 2D atmosphere products (L2) such as aerosol and cloud (MODIS and VIIRS) and radiant flux (CERES). Future releases will provide ingest methods for ocean and land surface products, gridded and time averaged datasets (L3 Daily, Monthly and Yearly), and support for 3D products such as temperature and water vapor profiles. Emphasis will be on NPP Sensor, Environmental and

  4. Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

    Directory of Open Access Journals (Sweden)

    Cameron Christopher JF

    2010-10-01

    Full Text Available Abstract This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets as well as previously studied datasets. This methodology is rigorous and statistically grounded, and ultimately culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. Background Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches both from a methodological and also a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. Results Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. Conclusions We conclude that our success is due to the ability of our system to: (1) encode molecules losslessly before presentation to the learning system, and (2)

  5. Complexity of Products of Some Complete and Complete Bipartite Graphs

    Directory of Open Access Journals (Sweden)

    S. N. Daoud

    2013-01-01

    Full Text Available The number of spanning trees in graphs (networks) is an important invariant; it is also an important measure of the reliability of a network. In this paper, we derive simple formulas for the complexity (number of spanning trees) of products of some complete and complete bipartite graphs, such as the cartesian product, normal product, composition product, tensor product, and symmetric product, using linear algebra and matrix analysis techniques.
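
    Two classical special cases give the flavour of such complexity formulas (standard results, not the paper's new derivations): for the complete graph and the complete bipartite graph,

        \tau(K_n) = n^{\,n-2}, \qquad \tau(K_{m,n}) = m^{\,n-1}\, n^{\,m-1},

    the first being Cayley's formula. The paper derives analogous closed forms for the various graph products via linear algebra and matrix analysis.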

  6. Multiple homicides.

    Science.gov (United States)

    Copeland, A R

    1989-09-01

    A study of multiple homicides or multiple deaths involving a solitary incident of violence by another individual was performed on the case files of the Office of the Medical Examiner of Metropolitan Dade County in Miami, Florida, during 1983-1987. A total of 107 multiple homicides were studied: 88 double, 17 triple, one quadruple, and one quintuple. The 236 victims were analyzed regarding age, race, sex, cause of death, toxicologic data, perpetrator, locale of the incident, and reason for the incident. This article compares this type of slaying with other types of homicide including those perpetrated by serial killers. Suggestions for future research in this field are offered.

  7. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), what structural and functional alterations arise from long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  8. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.

  9. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addresses the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. The necessary architectural system support and the interface requirements for such visualizations are also presented.
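
    The lead-attribute selection described above can be sketched with a standard information-gain computation (the attribute names and data below are hypothetical, and the paper's exact scheme may differ):

        import math
        from collections import Counter

        def entropy(labels):
            n = len(labels)
            return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

        def information_gain(rows, labels, attr):
            # partition the labels by the value of `attr`, then compare
            # the label entropy before and after the split
            by_value = {}
            for row, y in zip(rows, labels):
                by_value.setdefault(row[attr], []).append(y)
            remainder = sum(len(ys) / len(labels) * entropy(ys)
                            for ys in by_value.values())
            return entropy(labels) - remainder

        rows = [{"pressure": "high", "flow": "low"},
                {"pressure": "high", "flow": "high"},
                {"pressure": "low", "flow": "low"},
                {"pressure": "low", "flow": "high"}]
        labels = ["leak", "leak", "normal", "normal"]
        gains = {a: information_gain(rows, labels, a) for a in rows[0]}
        print(max(gains, key=gains.get), gains)  # "pressure" is the lead attribute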

  10. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  11. Application of Density Estimation Methods to Datasets from a Glider

    Science.gov (United States)

    2014-09-30

    humpback and sperm whales as well as different dolphin species. OBJECTIVES The objective of this research is to extend existing methods for cetacean density estimation from single sensor datasets. Required steps for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011), to

  12. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change, the ability to assess the spatial and seasonal variations in groundwater availability on a large scale (catchment and continent) is becoming increasingly important (Bates et al., 2007; MacDonald et al., 2009). The scarcity of observed hydrological data, or the difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  13. Dataset of mitochondrial genome variants in oncocytic tumors

    Directory of Open Access Journals (Sweden)

    Lihua Lyu

    2018-04-01

    Full Text Available This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants were identified by comparison with the revised Cambridge Reference Sequence, excluding those defining the haplogroups of our patients. Pathogenicity prediction for the novel missense variants found in this study was performed with the MitImpact 2 program.

  14. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have been developed to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, an online data portal to make these data available to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  15. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
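
    The quartz-free transformation amounts to a simple renormalisation (our reading of the procedure; the study may apply further corrections): with f_q the quartz weight fraction of a sample, each element concentration C_i is rescaled as

        C_i^{\mathrm{qf}} = \frac{C_i}{1 - f_q},

    so that concentrations are expressed relative to the non-quartz fraction of the soil and the dilution effect of variable quartz content drops out.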

  16. Dataset on records of Hericium erinaceus in Slovakia

    OpenAIRE

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [2]. The dataset includes all available and unpublished data from Slovakia, apart from repeat records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  17. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    A common approach for the analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  18. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  19. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well-known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so, the dataset is separated into a set used

  20. Trend of Complete Hydatidiform Mole

    Directory of Open Access Journals (Sweden)

    K Thapa

    2010-03-01

    Full Text Available INTRODUCTION: Complete hydatidiform mole is one of the most frequent abnormal pregnancies. This review studies the trend of complete mole in Paropakar Maternity and Women's Hospital and the clinical ability to detect it. METHODS: This is a retrospective study of 504 cases of complete hydatidiform mole recorded at Paropakar Maternity and Women's Hospital, Kathmandu, during 2058-2065 B.S. Medical records were reviewed and the incidence, clinical presentation and method of diagnosis were studied. RESULTS: During the study period, there were 139,117 births and 504 complete moles, 12 partial moles, 48 persistent gestational tumours, six choriocarcinomas and four invasive moles recorded in the hospital. The incidence of complete mole was one per 276 births. It was prevalent among women younger than 29 years (80%) and among the primigravidae (36.7%). More than 90% of women presented in the first half of their pregnancy, and vaginal bleeding was the main complaint (68.3%). Suction evacuation, dilation and evacuation followed by sharp curettage, and abdominal hysterectomy were performed in 80.6%, 17.6% and 1.2% of the women, respectively. Persistent mole and choriocarcinoma developed in 9.5% and 0.4%, respectively. CONCLUSIONS: Complete mole has the highest incidence among the gestational trophoblastic diseases recorded. It affects mostly younger women and presents with vaginal bleeding most of the time, usually in the first half of their pregnancy. Keywords: complete hydatidiform mole, gestational trophoblastic disease, persistent gestational tumours.

  1. Parton Distributions based on a Maximally Consistent Dataset

    Science.gov (United States)

    Rojo, Juan

    2016-04-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, also finding good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not substantially affect the global fit PDFs.

  2. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.

  3. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified by both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  4. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  5. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Full Text Available Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and prevents the benefits of integration with the evolving range of data sharing platforms from being exploited. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015), with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  6. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize a flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273,780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable when using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
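
    The rigidity density mentioned above can be made concrete with a small sketch. The kernel form and symbols below are illustrative assumptions in the spirit of flexibility-rigidity index (FRI) constructions, not the paper's exact definitions:

        \mu(\mathbf{r}) \;=\; \sum_{j=1}^{N} w_j \, \Phi\big(\lVert \mathbf{r} - \mathbf{r}_j \rVert; \eta_j\big),
        \qquad \text{e.g.}\quad \Phi(d;\eta) = e^{-(d/\eta)^{\kappa}},

    where the r_j are atom positions, the w_j are weights, and the resolution parameter \eta sets the scale at which the filtration of the density resolves topological features; tuning \eta is what focuses the "topological lens" on a chosen scale.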

  7. Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

    Directory of Open Access Journals (Sweden)

    Nicolas Robitaille

    2012-01-01

    Full Text Available Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted of 3 subjects each scanned in 7 different sites. The ADNI dataset consisted of 795 subjects scanned in more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.

  8. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  9. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  10. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
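
    GenometriCorr itself is an R package, so the following is only a conceptual illustration in Python of one of the statistics it computes: whether query intervals lie closer to reference features than expected by chance, assessed by permutation. All names, and the uniform-shuffling null model, are simplifying assumptions:

    import numpy as np

    def nearest_distance(queries, references):
        """Distance from each query midpoint to the nearest reference midpoint.

        Both arguments are 1-D arrays of genomic positions; at least two
        reference positions are assumed."""
        refs = np.sort(references)
        idx = np.clip(np.searchsorted(refs, queries), 1, len(refs) - 1)
        return np.minimum(np.abs(queries - refs[idx - 1]),
                          np.abs(queries - refs[idx]))

    def spatial_correlation_test(queries, references, chrom_len,
                                 n_perm=1000, seed=0):
        """Permutation p-value for 'queries are closer to references than chance'."""
        rng = np.random.default_rng(seed)
        observed = nearest_distance(queries, references).mean()
        null = np.array([
            nearest_distance(rng.uniform(0, chrom_len, len(queries)),
                             references).mean()
            for _ in range(n_perm)
        ])
        # A small p-value suggests a spatial association between the two sets.
        return observed, (null <= observed).mean()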

  11. A cross-country Exchange Market Pressure (EMP dataset

    Directory of Open Access Journals (Sweden)

    Mohit Desai

    2017-06-01

    Full Text Available The data presented in this article are related to the research article titled “An exchange market pressure measure for cross country analysis” (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
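
    Schematically, the measure combines the observed exchange rate change with the exchange-rate-equivalent of intervention; this is a hedged reconstruction of the definition rather than a quotation of the authors' formula:

        \mathrm{EMP}_t \;=\; \Delta e_t\,(\%) \;+\; \rho \, I_t,

    where \Delta e_t is the percentage change in the exchange rate in month t, I_t is central bank intervention in billions of US dollars, and \rho converts $1 billion of intervention into its equivalent percentage change in the exchange rate.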

  12. Multiple Sclerosis

    Science.gov (United States)

    Multiple sclerosis (MS) is a nervous system disease that affects your brain and spinal cord. It damages the myelin sheath, the material that surrounds and protects your nerve cells. This damage slows down ...

  13. Multiple myeloma.

    LENUS (Irish Health Repository)

    Collins, Conor D

    2012-02-01

    Advances in the imaging and treatment of multiple myeloma have occurred over the past decade. This article summarises the current status and highlights how an understanding of both is necessary for optimum management.

  14. Multiple mononeuropathy

    Science.gov (United States)

    ... with multiple mononeuropathy are prone to new nerve injuries at pressure points such as the knees and elbows. They should avoid putting pressure on these areas, for example, by not leaning on the elbows, crossing the knees, ...

  15. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.

    Science.gov (United States)

    Ahmidi, Narges; Tao, Lingling; Sefati, Shahin; Gao, Yixin; Lea, Colin; Haro, Benjamin Bejar; Zappella, Luca; Khudanpur, Sanjeev; Vidal, Rene; Hager, Gregory D

    2017-09-01

    State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. In this paper, we address two major problems for surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model (HMM), sparse HMM, Markov/semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.

  16. Large Hadron Collider nears completion

    CERN Multimedia

    2008-01-01

    Installation of the final component of the Large Hadron Collider particle accelerator is under way along the Franco-Swiss border near Geneva, Switzerland. When completed this summer, the LHC will be the world's largest and most complex scientific instrument.

  17. Complete Blood Count (For Parents)

    Science.gov (United States)


  18. Monitoring Completed Navigation Projects Program

    National Research Council Canada - National Science Library

    Bottin, Jr., Robert R

    2001-01-01

    ... (MCNP) Program. The program was formerly known as the Monitoring Completed Coastal Projects Program, but was modified in the late 1990s to include all navigation projects, inland as well as coastal...

  19. A curated transcriptome dataset collection to investigate the functional programming of human hematopoietic cells in early life.

    Science.gov (United States)

    Rahman, Mahbuba; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Cugno, Chiara; Chaussabel, Damien; Marr, Nico

    2016-01-01

    Compendia of large-scale datasets made available in public repositories provide an opportunity to identify and fill gaps in biomedical knowledge. But first, these data need to be made readily accessible to research investigators for interpretation. Here we make available a collection of transcriptome datasets to investigate the functional programming of human hematopoietic cells in early life. Thirty-two datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and loaded in a custom web application called the Gene Expression Browser (GXB), which was designed for interactive query and visualization of integrated large-scale data. Quality control checks were performed. Multiple sample groupings and gene rank lists were created allowing users to reveal age-related differences in transcriptome profiles, changes in the gene expression of neonatal hematopoietic cells to a variety of immune stimulators and modulators, as well as during cell differentiation. Available demographic, clinical, and cell phenotypic information can be overlaid with the gene expression data and used to sort samples. Web links to customized graphical views can be generated and subsequently inserted in manuscripts to report novel findings. GXB also enables browsing of a single gene across projects, thereby providing new perspectives on age- and developmental stage-specific expression of a given gene across the human hematopoietic system. This dataset collection is available at: http://developmentalimmunology.gxbsidra.org/dm3/geneBrowser/list.

  20. Complete colonic duplication in children.

    Science.gov (United States)

    Khaleghnejad Tabari, Ahmad; Mirshemirani, Alireza; Khaleghnejad Tabari, Nasibeh

    2012-01-01

    Complete colonic duplication is a very rare congenital anomaly that may have different presentations according to its location and size. Complete colonic duplication accounts for about 15% of gastrointestinal duplications. We report two cases of complete colonic duplication and their characteristics, in two patients with different types and presentations. Case 1: A 2-year-old boy presented to the clinic with abdominal protrusion, difficulty defecating, chronic constipation and a mucosal-prolapse-covered bulge (rectocele) since he was 6 months old. The patient had a palpable pelvic mass with doughy consistency. Rectal examination confirmed a perirectal mass with soft consistency. The patient underwent surgery, which revealed total tubular colorectal duplication with one blind end; he was treated with simple fenestration of the distal end and was discharged without complication. After two years of follow-up, he had normal defecation and good weight gain. Case 2: A 2-day-old infant was referred with imperforate anus, complete duplication of the recto-sigmoid colon, diphallus, double bladder, and hypospadias. After clinical and paraclinical investigations, he underwent operations in several stages over different periods and was discharged without complications. After four years of follow-up, he led a normal life. Patients with complete duplication have to be examined carefully because of the high incidence of other systemic anomalies. Treatment includes simple resection of the distal common wall, fenestration, and repair of other associated anomalies.

  1. Unsupervised multiple kernel learning for heterogeneous data integration.

    Science.gov (United States)

    Mariette, Jérôme; Villa-Vialaneix, Nathalie

    2018-03-15

    Recent high-throughput sequencing advances have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has allowed important insights to be gained in a wide range of applications. However, the integration of various sources of information remains a challenge for systems biology, since the produced datasets are often of heterogeneous types, and generic methods are needed to take their different specificities into account. We propose a multiple kernel framework that allows multiple datasets of various types to be integrated into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with respect to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas were analysed using a kernel Self-Organizing Map with both single- and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method in improving the representation of the studied biological system. The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package and a tutorial describing the approach can be found on the mixOmics web site http://mixomics.org/mixkernel/. jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.
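
    mixKernel is distributed as an R package; purely as an illustration of the consensus meta-kernel idea (combine per-dataset kernels computed on the same samples, then run a kernel PCA on the combination), a self-contained Python sketch follows. The uniform kernel weights and the Gaussian kernel choice are assumptions made for the example, not the package's defaults:

    import numpy as np

    def gaussian_kernel(X, gamma=1.0):
        """Gram matrix of a Gaussian (RBF) kernel for one dataset."""
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
        return np.exp(-gamma * d2)

    def consensus_meta_kernel(kernels):
        """Average of trace-normalised kernels (equal weights assumed)."""
        return sum(K / np.trace(K) for K in kernels) / len(kernels)

    def kernel_pca(K, n_components=2):
        """Project samples onto the leading components of the centred kernel."""
        n = K.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        vals, vecs = np.linalg.eigh(J @ K @ J)      # double-centred Gram matrix
        order = np.argsort(vals)[::-1][:n_components]
        return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

    # Toy usage: two 'omics' tables measured on the same 50 samples.
    rng = np.random.default_rng(0)
    omics1, omics2 = rng.normal(size=(50, 100)), rng.normal(size=(50, 30))
    K = consensus_meta_kernel([gaussian_kernel(omics1), gaussian_kernel(omics2)])
    scores = kernel_pca(K)                          # (50, 2) exploratory coordinates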

  2. A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

    KAUST Repository

    Tagle, Felipe

    2017-12-06

    Large, non-Gaussian spatial datasets pose a considerable modeling challenge as the dependence structure implied by the model needs to be captured at different scales, while retaining feasible inference. Skew-normal and skew-t distributions have only recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions and simultaneously achieve feasible inference for increasingly large datasets. This article presents the first multi-resolution spatial model inspired by the skew-t distribution, where a large-scale effect follows a multivariate normal distribution and the fine-scale effects follow multivariate skew-normal distributions. The resulting marginal distribution for each region is skew-t, thereby allowing for greater flexibility in capturing the skewness and heavy tails characterizing many environmental datasets. Likelihood-based inference is performed using a Monte Carlo EM algorithm. The model is applied as a stochastic generator of daily wind speeds over Saudi Arabia.
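
    A schematic of the hierarchy described in the abstract, with all symbols chosen for illustration rather than taken from the paper, is:

        Y_r(\mathbf{s}) \;=\; W(\mathbf{s}) + V_r(\mathbf{s}), \qquad
        W \sim \mathcal{N}(\mathbf{0}, \Sigma_W), \qquad
        V_r \mid \sigma_r \sim \mathcal{SN}(\mathbf{0}, \sigma_r^2 \Omega_r, \boldsymbol{\alpha}_r),

    where W is the large-scale (normal) effect shared across regions and V_r is the fine-scale skew-normal effect in region r; mixing the fine-scale variance \sigma_r^2 over an inverse-gamma law (an assumption in this sketch) is the standard route to a skew-t marginal in each region.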

  3. A dataset mapping the potential biophysical effects of vegetation cover change

    Science.gov (United States)

    Duveiller, Gregory; Hooker, Josh; Cescatti, Alessandro

    2018-02-01

    Changing the vegetation cover of the Earth has impacts on the biophysical properties of the surface and ultimately on the local climate. Depending on the specific type of vegetation change and on the background climate, the resulting competing biophysical processes can have a net warming or cooling effect, which can further vary both spatially and seasonally. Due to uncertain climate impacts and the lack of robust observations, biophysical effects are not yet considered in land-based climate policies. Here we present a dataset based on satellite remote sensing observations that provides the potential changes i) of the full surface energy balance, ii) at global scale, and iii) for multiple vegetation transitions, as would now be required for the comprehensive evaluation of land based mitigation plans. We anticipate that this dataset will provide valuable information to benchmark Earth system models, to assess future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.

  4. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from a heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  5. Statistical and population genetics issues of two Hungarian datasets from the aspect of DNA evidence interpretation.

    Science.gov (United States)

    Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma

    2015-11-01

    When the DNA profile from a crime scene matches that of a suspect, the weight of the DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is necessary to establish and expand databases that reflect the actual allele frequencies in the population concerned. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16), including five new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the estimated frequency of profiles. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published one. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of the minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of the inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
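
    The role of the inbreeding coefficient in such match probability calculations can be illustrated with the standard single-locus adjustment (textbook population genetics, not a formula quoted from the article):

        P(A_iA_i) = p_i^2 + p_i(1-p_i)\,F_{IS}, \qquad
        P(A_iA_j) = 2\,p_i\,p_j\,(1-F_{IS}) \quad (i \neq j),

    so with F_{IS} = 0.0106 the homozygote frequencies are inflated only slightly relative to Hardy-Weinberg expectations, consistent with the finding that F_{IS} has little effect on the estimated profile frequencies.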

  6. Blood vessel-based liver segmentation through the portal phase of a CT dataset

    Science.gov (United States)

    Maklad, Ahmed S.; Matsuhiro, Mikio; Suzuki, Hidenobu; Kawata, Yoshiki; Niki, Noboru; Moriyama, Noriyuki; Utsunomiya, Toru; Shimada, Mitsuo

    2013-02-01

    Blood vessels are dispersed throughout the human body organs and carry unique information for each person. This information can be used to delineate organ boundaries. The proposed method relies on abdominal blood vessels (ABV) to segment the liver, considering the potential presence of tumors, through the portal phase of a CT dataset. ABV are extracted and classified into hepatic (HBV) and nonhepatic (non-HBV) with a small number of interactions. HBV and non-HBV are used to guide an automatic segmentation of the liver. HBV are used to individually segment the core region of the liver. This region and non-HBV are used to construct a boundary surface between the liver and other organs to separate them. The core region is classified, based on extracted posterior distributions of its histogram, into low intensity tumor (LIT) and non-LIT core regions. The non-LIT case includes the normal part of the liver, HBV, and high intensity tumors if they exist. Each core region is extended based on its corresponding posterior distribution. Extension is completed when it reaches either a variation in intensity or the constructed boundary surface. The method was applied to 80 datasets (30 Medical Image Computing and Computer Assisted Intervention (MICCAI) and 50 non-MICCAI data) including 60 datasets with tumors. Our results for the MICCAI-test data were evaluated by sliver07 [1] with an overall score of 79.7, which ranks seventh best on the site (December 2013). This approach seems to be a promising method for extraction of liver volumetry of various shapes and sizes and of low intensity hepatic tumors.

  7. Acquiring a four-dimensional computed tomography dataset using an external respiratory signal

    International Nuclear Information System (INIS)

    Vedam, S S; Keall, P J; Kini, V R; Mostafavi, H; Shukla, H P; Mohan, R

    2003-01-01

    Four-dimensional (4D) methods strive to achieve highly conformal radiotherapy, particularly for lung and breast tumours, in the presence of respiratory-induced motion of tumours and normal tissues. Four-dimensional radiotherapy accounts for respiratory motion during imaging, planning and radiation delivery, and requires a 4D CT image in which the internal anatomy motion as a function of the respiratory cycle can be quantified. The aims of our research were (a) to develop a method to acquire 4D CT images from a spiral CT scan using an external respiratory signal and (b) to examine the potential utility of 4D CT imaging. A commercially available respiratory motion monitoring system provided an 'external' tracking signal of the patient's breathing. Simultaneous recording of a TTL 'X-Ray ON' signal from the CT scanner indicated the start time of CT image acquisition, thus facilitating time stamping of all subsequent images. An over-sampled spiral CT scan was acquired using a pitch of 0.5 and a scanner rotation time of 1.5 s. Each image from such a scan was sorted into an image bin that corresponded with the phase of the respiratory cycle in which the image was acquired. The complete set of such image bins accumulated over a respiratory cycle constitutes a 4D CT dataset. Four-dimensional CT datasets of a mechanical oscillator phantom and a patient undergoing lung radiotherapy were acquired. Motion artefacts were significantly reduced in the images in the 4D CT dataset compared to the three-dimensional (3D) images, for which respiratory motion was not accounted for. Accounting for respiratory motion using 4D CT imaging is feasible and yields images with less distortion than 3D images. 4D images also contain respiratory motion information not available in a 3D CT image.
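
    The sorting step, assigning each time-stamped CT image to a respiratory-phase bin, can be sketched as follows. The phase definition (fraction of the breathing cycle elapsed since the last inhale peak) and all names are illustrative assumptions, not the paper's implementation:

    import numpy as np

    def phase_at(t, peak_times):
        """Respiratory phase in [0, 1): fraction of the cycle elapsed at time t."""
        peaks = np.asarray(peak_times)
        i = np.searchsorted(peaks, t, side="right") - 1
        if i < 0 or i + 1 >= len(peaks):
            return None                     # t lies outside the recorded cycles
        return (t - peaks[i]) / (peaks[i + 1] - peaks[i])

    def bin_images(image_times, peak_times, n_bins=10):
        """Sort image acquisition times into phase bins -> a 4D CT dataset."""
        bins = {b: [] for b in range(n_bins)}
        for idx, t in enumerate(image_times):
            phase = phase_at(t, peak_times)
            if phase is not None:
                bins[int(phase * n_bins) % n_bins].append(idx)
        return bins  # bins[b] lists the images forming the 3D volume at phase b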

  8. P-MartCancer–Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Webb-Robertson, Bobbie-Jo M.; Bramer, Lisa M.; Jensen, Jeffrey L.; Kobold, Markus A.; Stratton, Kelly G.; White, Amanda M.; Rodland, Karin D.

    2017-10-31

    P-MartCancer is a new interactive web-based software environment that enables biomedical and biological scientists to perform in-depth analyses of global proteomics data without requiring direct interaction with the data or with statistical software. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium (CPTAC) at the peptide, gene and protein levels. P-MartCancer is deployed using Azure technologies (http://pmart.labworks.org/cptac.html); the web service is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/), and many statistical functions can be utilized directly from an R package available on GitHub (https://github.com/pmartR).

  9. P-MartCancer-Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets.

    Science.gov (United States)

    Webb-Robertson, Bobbie-Jo M; Bramer, Lisa M; Jensen, Jeffrey L; Kobold, Markus A; Stratton, Kelly G; White, Amanda M; Rodland, Karin D

    2017-11-01

    P-MartCancer is an interactive web-based software environment that enables statistical analyses of peptide or protein data, quantitated from mass spectrometry-based global proteomics experiments, without requiring in-depth knowledge of statistical programming. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification, and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access and the capability to analyze multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium at the peptide, gene, and protein levels. P-MartCancer is deployed as a web service (https://pmart.labworks.org/cptac.html), alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/). Cancer Res; 77(21); e47-50. ©2017 American Association for Cancer Research (AACR).

  10. Dataset of curcumin derivatives for QSAR modeling of anti cancer against P388 cell line

    Directory of Open Access Journals (Sweden)

    Yum Eryanti

    2016-12-01

    Full Text Available The dataset of curcumin derivatives consists of 45 compounds (Table 1) with their anticancer biological activity (IC50) against the P388 cell line. The 45 curcumin derivatives were used in the model development, with 30 of these compounds in the training set and the remaining 15 compounds in the test set. The development of the QSAR model involved the use of the multiple linear regression analysis (MLRA) method. Based on this method, r2 and r2(CV) values of 0.81 and 0.67 were obtained. The QSAR model was also employed to predict the biological activity of compounds in the test set. A predictive correlation coefficient r2 value of 0.88 was obtained for the test set.
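
    As a hedged illustration of the workflow described (multiple linear regression on molecular descriptors with a 30/15 train/test split), the sketch below uses placeholder random descriptors and activities in place of the article's real data; scikit-learn availability is assumed:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(45, 5))             # placeholder descriptors, 45 compounds
    y = X @ rng.normal(size=5) + rng.normal(scale=0.3, size=45)  # placeholder activity

    train, test = np.arange(30), np.arange(30, 45)  # 30 training, 15 test compounds
    model = LinearRegression().fit(X[train], y[train])
    print("r2 (train):", r2_score(y[train], model.predict(X[train])))
    print("r2 (test): ", r2_score(y[test], model.predict(X[test])))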

  11. The fate of completed intentions.

    Science.gov (United States)

    Anderson, Francis T; Einstein, Gilles O

    2017-04-01

    The goal of this research was to determine whether and how people deactivate prospective memory (PM) intentions after they have been completed. One view proposes that PM intentions can be deactivated after completion, such that they no longer come to mind and interfere with current tasks. Another view is that now irrelevant completed PM intentions exhibit persisting activation, and continue to be retrieved. In Experiment 1, participants were given a PM intention embedded within the ongoing task during Phase 1, after which participants were told either that the PM task had been completed or suspended until later. During Phase 2, participants were instructed to perform only the ongoing task and were periodically prompted to report their thoughts. Critically, the PM targets from Phase 1 reappeared in Phase 2. All of our measures, including thoughts reported about the PM task, supported the existence of persisting activation. In Experiment 2, we varied conditions that were expected to mitigate persisting activation. Despite our best attempts to promote deactivation, we found evidence for the persistence of spontaneous retrieval in all groups after intentions were completed. The theoretical and practical implications of this potential dark side to spontaneous retrieval are discussed.

  12. A complete generalized adjustment criterion

    NARCIS (Netherlands)

    Perković, Emilija; Textor, Johannes; Kalisch, Markus; Maathuis, Marloes H.

    2015-01-01

    Covariate adjustment is a widely used approach to estimate total causal effects from observational data. Several graphical criteria have been developed in recent years to identify valid covariates for adjustment from graphical causal models. These criteria can handle multiple causes, latent

  13. Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets

    Science.gov (United States)

    Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram

    2004-01-01

    Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
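
    The balance underlying the method can be written compactly; the sign convention is an assumption of this sketch:

        \nabla \cdot \mathbf{Q} = E - P, \qquad
        \mathbf{Q}_{\mathrm{div}} = \nabla \chi
        \quad\Longrightarrow\quad
        \nabla^{2} \chi = E - P,

    where \mathbf{Q} is the vertically integrated water vapor transport and \chi its potential function; solving this Poisson equation with satellite-retrieved E and P fields recovers the divergent component of the transport over the oceans without recourse to sparse sounding data.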

  14. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem, discussed for over a decade, that has attracted several researchers around the world. Most classifiers are designed so as to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. This paradigm integrates the successful exploration mechanism called the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, the ELM is a single-hidden-layer feedforward neural network (FFNN), proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons, and it further improves the network generalization performance. The proposed method is experimented on five benchmark datasets of the UCI Machine Learning Repository for handling medical dataset classification. Simulation results show that the proposed approach is able to achieve good generalization performance, compared to the results of other classifiers.
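
    To make the component being tuned concrete, here is a minimal extreme learning machine: random hidden-layer weights, output weights solved by least squares. In the paper PSO would search over the hidden-layer parameters, which are simply drawn at random in this sketch; all names are illustrative:

    import numpy as np

    class ELM:
        """Single-hidden-layer feedforward net trained by least squares."""
        def __init__(self, n_hidden, seed=0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            # Hidden-layer weights/biases: random here; PSO would optimize these.
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = self.rng.normal(size=self.n_hidden)
            H = np.tanh(X @ self.W + self.b)        # hidden-layer activations
            # Output weights via least squares; y may be (n,) or one-hot (n, k).
            self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
            return self

        def predict(self, X):
            return np.tanh(X @ self.W + self.b) @ self.beta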

  15. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    Science.gov (United States)

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept model and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations. This study has shown that the number of covariates or the level two variance does not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  16. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

    Directory of Open Access Journals (Sweden)

    Odile Sauzet

    2017-07-01

    Full Text Available Abstract Background The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Methods Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept model and generalised estimating equations were compared. Results The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations. Conclusions This study has shown that the number of covariates or the level two variance does not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
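
    For readers wanting to try the recommended approach, a generalised estimating equation for a binary outcome with infants clustered by mother can be fitted with statsmodels along the following lines. The simulated data, variable names and effect sizes are all placeholder assumptions:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    rows = []
    for m in range(200):                         # 200 mothers, ~10% twin pairs
        u = rng.normal(0, 0.7)                   # shared mother-level effect
        for _ in range(2 if rng.random() < 0.1 else 1):
            ga = rng.normal(30, 2)               # gestational age (weeks)
            p = 1 / (1 + np.exp(-(-8 + 0.25 * ga + u)))
            rows.append({"mother_id": m, "gest_age": ga,
                         "outcome": int(rng.random() < p)})
    df = pd.DataFrame(rows)

    # Exchangeable working correlation: a common within-pair correlation.
    model = sm.GEE.from_formula("outcome ~ gest_age", groups="mother_id",
                                data=df, family=sm.families.Binomial(),
                                cov_struct=sm.cov_struct.Exchangeable())
    print(model.fit().summary())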

  17. Multiple sclerosis

    DEFF Research Database (Denmark)

    Stenager, Egon; Stenager, E N; Knudsen, Lone

    1994-01-01

    In a cross-sectional study of 117 randomly selected patients (52 men, 65 women) with definite multiple sclerosis, it was found that 76 percent were married or cohabiting and 8 percent divorced. Social contacts remained unchanged for 70 percent, but outgoing social contacts were reduced for 45 percent. The need for structural changes in the home and the need for a pension became greater with increasing physical handicap. No significant differences between genders were found. It is concluded that patients and relatives come under increased social strain as multiple sclerosis progresses to a moderate handicap...
  18. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geoscience and other data.

  19. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: The importance of collecting, recording and analyzing disease information in any health organization needs no explanation. In this regard, the systematic design of standard datasets can help to record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core dataset, a combination of literature review and expert consensus was used. In the first phase, a draft of the dataset was designed based on a review of the colorectal cancer literature and comparative studies. In the second phase, this dataset was evaluated by experts from different disciplines, such as medical informatics, oncology and surgery, and their comments and opinions were collected. In the third phase the refined dataset was evaluated again by experts and the final dataset was proposed. Results: In the first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this dataset was evaluated by experts and supplementary information was offered by professionals in subgroups, especially in the treatment part; at this point the total number of elements reached 93. In the third phase, evaluation was conducted by experts and the dataset was finalized in five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study a comprehensive core dataset for colorectal cancer was designed. This dataset can be useful for collecting colorectal cancer information and facilitating the exchange of health information. Designing such datasets for similar diseases can help providers to collect standard data from patients and can accelerate retrieval from storage systems.

  20. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log N); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, events, and interval annotations lag-free within < 20 ms. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes (1 TiB), or 1.3 x 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
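
    The hierarchic level-of-detail idea behind such constant-time browsing can be sketched: precompute a pyramid of min/max summaries so that any zoom level is rendered from a bounded number of blocks. This simplified pyramid is built in linear time (the paper reports O(n log N) preprocessing for its on-disk format), and all names below are illustrative:

    import numpy as np

    def build_minmax_pyramid(samples, factor=2):
        """Precompute (min, max) summaries at successively coarser resolutions."""
        levels = [np.stack([samples, samples])]   # level 0: min == max == data
        while levels[-1].shape[1] > 1:
            lo, hi = levels[-1]
            n = (lo.size // factor) * factor      # drop a ragged tail, if any
            lo = lo[:n].reshape(-1, factor).min(axis=1)
            hi = hi[:n].reshape(-1, factor).max(axis=1)
            levels.append(np.stack([lo, hi]))
        return levels

    def render(levels, start, stop, width=1000, factor=2):
        """Fetch ~width (min, max) pairs for [start, stop), whatever the span."""
        level = 0
        while (stop - start) / factor**level > width and level + 1 < len(levels):
            level += 1                            # coarsen until it fits the screen
        lo, hi = levels[level]
        a, b = start // factor**level, stop // factor**level
        return lo[a:b], hi[a:b]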

  1. A synthetic dataset for evaluating soft and hard fusion algorithms

    Science.gov (United States)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  2. Complete Normal Ordering 1: Foundations

    CERN Document Server

    Ellis, John; Skliros, Dimitri P.

    2016-01-01

    We introduce a new prescription for quantising scalar field theories perturbatively around a true minimum of the full quantum effective action, which is to `complete normal order' the bare action of interest. When the true vacuum of the theory is located at zero field value, the key property of this prescription is the automatic cancellation, to any finite order in perturbation theory, of all tadpole and, more generally, all `cephalopod' Feynman diagrams. The latter are connected diagrams that can be disconnected into two pieces by cutting one internal vertex, with either one or both pieces free from external lines. In addition, this procedure of `complete normal ordering' (which is an extension of the standard field theory definition of normal ordering) reduces by a substantial factor the number of Feynman diagrams to be calculated at any given loop order. We illustrate explicitly the complete normal ordering procedure and the cancellation of cephalopod diagrams in scalar field theories with non-derivative i...

  3. Identifying frauds and anomalies in Medicare-B dataset.

    Science.gov (United States)

    Seo, Jiwon; Mendelevitch, Ofer

    2017-07-01

    The healthcare industry is growing at a rapid rate, approaching a market value of $7 trillion worldwide. At the same time, fraud in healthcare is becoming a serious problem, amounting to 5% of total healthcare spending, or $100 billion each year in the US. Manually detecting healthcare fraud requires much effort. Recently, machine learning and data mining techniques have been applied to automatically detect healthcare fraud. This paper proposes a novel PageRank-based algorithm to detect healthcare frauds and anomalies. We apply the algorithm to the Medicare-B dataset, a real-life dataset with 10 million healthcare insurance claims. The algorithm successfully identifies tens of previously unreported anomalies.
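
    As background for the record above: the core of any PageRank-style detector is the power-iteration computation of stationary scores over a graph built from the claims data (for instance, providers linked by shared patients or procedures). The sketch below shows only that generic PageRank core with NumPy; the paper's specific graph construction and anomaly criterion are not reproduced here, and all names are illustrative.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Basic PageRank by power iteration on a dense adjacency matrix."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1.0         # guard dangling nodes
    transition = adj / out_degree             # row-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:   # L1 convergence check
            return new_rank
        rank = new_rank
    return rank
```

    Scores that deviate strongly from those of structurally similar nodes (e.g., providers with comparable claim volumes) become candidates for manual review; that deviation test is where the fraud-specific logic would live.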

  4. Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

    Directory of Open Access Journals (Sweden)

    Ana M. Palacios

    2012-04-01

    Full Text Available Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data means that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval-valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed to minimize the fraction of misclassifications are able to produce knowledge bases that are also adequate under common metrics for imbalanced classification.
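
    To make "extending resampling to interval-valued data" concrete, here is a minimal, hypothetical sketch of SMOTE-style oversampling in which every feature is an interval and both endpoints are interpolated between two minority-class examples. It is not the authors' algorithm, merely an illustration of the idea.

```python
import random

def oversample_intervals(minority, ratio=1.0, seed=0):
    """SMOTE-like oversampling for interval-valued features.

    minority: list of samples, each a list of (lo, hi) interval features;
    needs at least two samples. Returns ratio * len(minority) synthetic
    samples interpolated feature-wise between random minority pairs.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(int(ratio * len(minority))):
        a, b = rng.sample(minority, 2)       # two distinct minority samples
        t = rng.random()                     # interpolation coefficient
        synthetic.append([(lo1 + t * (lo2 - lo1), hi1 + t * (hi2 - hi1))
                          for (lo1, hi1), (lo2, hi2) in zip(a, b)])
    return synthetic
```

    Interpolating both endpoints keeps every synthetic feature a valid interval (lo <= hi) whenever the input intervals are valid, since a convex combination preserves the ordering of endpoints.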

  5. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values of the polynomial coefficients for the major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full-length article entitled “The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel”, 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016) [1].
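
    For readers unfamiliar with this type of data article: the deliverable is essentially a set of per-element polynomial coefficients that let the Ae3 temperature be evaluated analytically from steel composition. Below is a hypothetical sketch of such an evaluation; the functional form is assumed for illustration and the coefficient values are invented placeholders, not the published ones (those are in the dataset itself).

```python
def ae3_estimate(composition, coefficients, base=912.0):
    """Estimate Ae3 (deg C) as a base temperature plus polynomial terms.

    composition: {"C": wt%, "Mn": wt%, ...}
    coefficients: {"C": [a1, a2, ...]} meaning a1*x + a2*x**2 + ...
    base: 912 deg C, the alpha->gamma transition temperature of pure iron.
    """
    t = base
    for element, coeffs in coefficients.items():
        x = composition.get(element, 0.0)
        t += sum(a * x ** (k + 1) for k, a in enumerate(coeffs))
    return t

# Placeholder coefficients, for illustration only:
print(ae3_estimate({"C": 0.2, "Mn": 1.5}, {"C": [-250.0, 40.0], "Mn": [-25.0]}))
```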

  6. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different rankings of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...

  7. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Gaudelli, Julia; Baker, Steve; Haas, Christian

    Since its launch in April 2010, CryoSat has been collecting valuable sea ice data over the Arctic region. Over the same period, ESA's CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation....

  8. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs) working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI), climate policy integration (CPI) and/or energy policy integration (EnPI).

  9. Dataset on the energy performance of atrium type hotel buildings.

    Science.gov (United States)

    Vujosevic, Milica; Krstic-Furundzic, Aleksandra

    2018-04-01

    The data presented in this article are related to the research article entitled "The Influence of Atrium on Energy Performance of Hotel Building" (Vujosevic and Krstic-Furundzic, 2017) [1], which describes the annual energy performance of an atrium-type hotel building in Belgrade climate conditions, with the objective of presenting the impact of the atrium on the hotel building's energy demands for space heating and cooling. This dataset is made publicly available to show the energy performance of the selected hotel design alternatives, in order to enable extended analyses of these data by other researchers.

  10. Dataset on records of Hericium erinaceus in Slovakia.

    Science.gov (United States)

    Kunca, Vladimír; Čiliak, Marek

    2017-06-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset includes all available and unpublished data from Slovakia, except for repeat records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence, and some resulted from targeted searches.

  11. Dataset on records of Hericium erinaceus in Slovakia

    Directory of Open Access Journals (Sweden)

    Vladimír Kunca

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Habitat preferences of Hericium erinaceus in Slovakia” (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset includes all available and unpublished data from Slovakia, except for repeat records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence, and some resulted from targeted searches.

  12. A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome [version 1; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Rafah Mackeh

    2017-02-01

    Full Text Available The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB), a web application specifically designed for interactive query and visualization of integrated large-scale data. Once the datasets were annotated in GXB, multiple sample groupings were made in order to create rank lists that allow easy data interpretation and comparison. The GXB tool also allows users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in web-based customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp.

  13. annot8r: GO, EC and KEGG annotation of EST datasets

    Directory of Open Access Journals (Sweden)

    Schmid Ralf

    2008-04-01

    Full Text Available Abstract Background The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO terms, EC numbers and KEGG pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation, and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated, and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r is a tool that assigns GO, EC and KEGG annotations to datasets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non...
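
    annot8r itself is a Perl tool integrated with the PartiGene suite, so the sketch below is not its code; it is a hypothetical Python rendering of the central step the abstract describes: parse BLAST hits against a UniProt-derived subset and transfer the stored annotations of significant hits. It assumes the standard tabular BLAST output (-outfmt 6, e-value in column 11) and a uniprot2go lookup dict built beforehand from the reference database.

```python
import csv

def annotate_hits(blast_tsv, uniprot2go, evalue_cutoff=1e-10):
    """Map BLAST tabular hits to GO terms via a precomputed lookup table.

    blast_tsv: path to BLAST -outfmt 6 output (12 tab-separated columns).
    uniprot2go: {uniprot_accession: iterable of GO terms}.
    Returns {query_id: set of GO terms from significant hits}.
    """
    annotations = {}
    with open(blast_tsv) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            query, subject, evalue = row[0], row[1], float(row[10])
            if evalue <= evalue_cutoff:
                annotations.setdefault(query, set()).update(
                    uniprot2go.get(subject, ()))
    return annotations
```

    The same pattern, with different lookup tables, covers the EC and KEGG subsets.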

  14. Multiple myeloma

    International Nuclear Information System (INIS)

    Sohn, Jeong Ick; Ha, Choon Ho; Choi, Karp Shik

    1994-01-01

    Multiple myeloma is a malignant plasma cell tumor thought to originate from the proliferation of a single clone of abnormal plasma cells, resulting in the production of a whole monoclonal paraprotein. The authors experienced a case of multiple myeloma with severe mandibular osteolytic lesions in a 46-year-old female. As a result of careful analysis of the clinical, radiological and histopathological features and laboratory findings, we diagnosed it as multiple myeloma and obtained the following results. 1. The main clinical symptoms were intermittent dull pain in the mandibular body area, abnormal sensation of the lip, and pain due to a fracture of the right clavicle. 2. Laboratory findings revealed an M-spike, a reversed serum albumin-globulin ratio, markedly elevated ESR and hypercalcemia. 3. Radiographically, multiple osteolytic punched-out radiolucencies were evident on the skull, zygoma, jaw bones, ribs, clavicle and upper extremities. An enlarged liver and increased uptake at the lesional sites on the RN scan were also observed. 4. Histopathologically, markedly hypercellular marrow with sheets of plasmablasts and megakaryocytes was observed.

  15. Multiple sclerosis

    DEFF Research Database (Denmark)

    Stenager, E; Jensen, K

    1988-01-01

    Forty-two (12%) of a total of 366 patients with multiple sclerosis (MS) had psychiatric admissions. Of these, 34 (81%) had their first psychiatric admission in conjunction with or after the onset of MS. Classification by psychiatric diagnosis showed that there was a significant positive correlation...

  16. Multiple sclerosis

    DEFF Research Database (Denmark)

    Stenager, E; Knudsen, L; Jensen, K

    1991-01-01

    In a cross-sectional investigation of 116 patients with multiple sclerosis, the social and spare-time activities of the patients were assessed by both the patient and his/her family. The assessments were correlated with physical disability, which showed that particularly those who were moderately disabled...

  17. Multiple sclerosis

    DEFF Research Database (Denmark)

    Stenager, E; Jensen, K

    1990-01-01

    An investigation of the correlation between the ability to read TV subtitles and visual evoked potential (VEP) latency in 14 patients with definite multiple sclerosis (MS) indicated that VEP latency in patients unable to read the TV subtitles was significantly delayed in comparison...

  18. Multiple sclerosis

    DEFF Research Database (Denmark)

    Stenager, E; Knudsen, L; Jensen, K

    1994-01-01

    In a cross-sectional study of 94 patients (42 males, 52 females) with definite multiple sclerosis (MS) in the age range 25-55 years, the correlation of neuropsychological tests with the ability to read TV-subtitles and with the use of sedatives is examined. A logistic regression analysis reveals...

  19. Multiple Sclerosis.

    Science.gov (United States)

    Plummer, Nancy; Michael, Nancy, Ed.

    This module on multiple sclerosis is intended for use in inservice or continuing education programs for persons who administer medications in long-term care facilities. Instructor information, including teaching suggestions, and a listing of recommended audiovisual materials and their sources appear first. The module goal and objectives are then…

  20. Parenting Multiples

    Science.gov (United States)

    ... when your babies do. Though it can be hard to let go of the thousand other things you need to do, remember that your well-being is key to your ability to take care of your babies. What Problems Can Happen? It may be hard to tell multiple babies apart when they first ...

  1. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide.

    Science.gov (United States)

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-07-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species' evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals ("MammalDIET"). Diet information was digitized from two global and cladewide data sources, and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family), and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continent-wide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external...
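
    The species -> genus -> family fill-in order described above can be pictured with a small sketch. This is a schematic Python illustration under simplifying assumptions (one categorical diet label per species, consensus by simple majority); it is not the published extrapolation procedure.

```python
from collections import Counter

def extrapolate_diet(diets, taxonomy):
    """Fill missing species-level diets from genus, then family consensus.

    diets: {species: diet_label or None}
    taxonomy: {species: (genus, family)}
    """
    def consensus(labels):
        known = [d for d in labels if d is not None]
        return Counter(known).most_common(1)[0][0] if known else None

    by_genus, by_family = {}, {}
    for species, diet in diets.items():
        genus, family = taxonomy[species]
        by_genus.setdefault(genus, []).append(diet)
        by_family.setdefault(family, []).append(diet)

    return {species: diet
            or consensus(by_genus[taxonomy[species][0]])
            or consensus(by_family[taxonomy[species][1]])
            for species, diet in diets.items()}
```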

  2. A Comprehensive Dataset of Genes with a Loss-of-Function Mutant Phenotype in Arabidopsis

    Science.gov (United States)

    Lloyd, Johnny; Meinke, David

    2012-01-01

    Despite the widespread use of Arabidopsis (Arabidopsis thaliana) as a model plant, a curated dataset of Arabidopsis genes with mutant phenotypes remains to be established. A preliminary list published nine years ago in Plant Physiology is outdated, and genome-wide phenotype information remains difficult to obtain. We describe here a comprehensive dataset of 2,400 genes with a loss-of-function mutant phenotype in Arabidopsis. Phenotype descriptions were gathered primarily from manual curation of the scientific literature. Genes were placed into prioritized groups (essential, morphological, cellular-biochemical, and conditional) based on the documented phenotypes of putative knockout alleles. Phenotype classes (e.g. vegetative, reproductive, and timing, for the morphological group) and subsets (e.g. flowering time, senescence, circadian rhythms, and miscellaneous, for the timing class) were also established. Gene identities were classified as confirmed (through molecular complementation or multiple alleles) or not confirmed. Relationships between mutant phenotype and protein function, genetic redundancy, protein connectivity, and subcellular protein localization were explored. A complementary dataset of 401 genes that exhibit a mutant phenotype only when disrupted in combination with a putative paralog was also compiled. The importance of these genes in confirming functional redundancy and enhancing the value of single gene datasets is discussed. With further input and curation from the Arabidopsis community, these datasets should help to address a variety of important biological questions, provide a foundation for exploring the relationship between genotype and phenotype in angiosperms, enhance the utility of Arabidopsis as a reference plant, and facilitate comparative studies with model genetic organisms. PMID:22247268

  3. Multivendor Spectral-Domain Optical Coherence Tomography Dataset, Observer Annotation Performance Evaluation, and Standardized Evaluation Framework for Intraretinal Cystoid Fluid Segmentation

    Directory of Open Access Journals (Sweden)

    Jing Wu

    2016-01-01

    Full Text Available Development of image analysis and machine learning methods for segmentation of clinically significant pathology in retinal spectral-domain optical coherence tomography (SD-OCT), used in disease detection and prediction, is limited due to the availability of expertly annotated reference data. Retinal segmentation methods use datasets that either are not publicly available, come from only one device, or use different evaluation methodologies, making them difficult to compare. Thus we present and evaluate a multiple-expert annotated reference dataset for the problem of intraretinal cystoid fluid (IRF) segmentation, a key indicator in exudative macular disease. In addition, a standardized framework for segmentation accuracy evaluation, applicable to other pathological structures, is presented. Integral to this work is the dataset used, which must be fit for purpose for IRF segmentation algorithm training and testing. We describe here a multivendor dataset comprised of 30 scans. Each OCT scan for system training has been annotated by multiple graders using a proprietary system. Evaluation of the intergrader annotations shows a good correlation, thus making the reproducibly annotated scans suitable for the training and validation of image processing and machine learning based segmentation methods. The dataset will be made publicly available in the form of a segmentation Grand Challenge.
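
    Evaluation frameworks of this kind typically rest on overlap metrics between a candidate segmentation and the reference annotation. As a minimal illustration (not the paper's specific protocol), the Dice coefficient between two binary masks is computed below; the same function measures intergrader agreement when both masks come from human graders.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two same-shaped binary segmentation masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denominator = a.sum() + b.sum()
    if denominator == 0:          # both masks empty: define as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denominator
```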

  4. Structural Completeness in Fuzzy Logics

    Czech Academy of Sciences Publication Activity Database

    Cintula, Petr; Metcalfe, G.

    2009-01-01

    Roč. 50, č. 2 (2009), s. 153-183 ISSN 0029-4527 R&D Projects: GA MŠk(CZ) 1M0545 Institutional research plan: CEZ:AV0Z10300504 Keywords: structural logics * fuzzy logics * structural completeness * admissible rules * primitive variety * residuated lattices Subject RIV: BA - General Mathematics

  5. Quantum space and quantum completeness

    Science.gov (United States)

    Jurić, Tajron

    2018-05-01

    Motivated by the question of whether quantum gravity can "smear out" the classical singularity, we analyze a certain quantum space and its quantum-mechanical completeness. Classical singularity is understood as geodesic incompleteness, while quantum completeness requires a unique unitary time evolution for test fields propagating on an underlying background. Here the crucial point is that quantum completeness requires the Hamiltonian (or the spatial part of the wave operator) to be essentially self-adjoint in order to generate a unique time evolution. We examine a model of quantum space consisting of a noncommutative BTZ black hole probed by a test scalar field. We show that the quantum gravity (noncommutative) effect is to enlarge the domain of BTZ parameters for which the relevant wave operator is essentially self-adjoint. This means that the corresponding quantum space is quantum complete for a larger range of BTZ parameters, leading to the conclusion that in the quantum space one observes the effect of "smearing out" the singularity.

  6. Program Costs and Student Completion

    Science.gov (United States)

    Manning, Terri M.; Crosta, Peter M.

    2014-01-01

    Community colleges are under pressure to increase completion rates, prepare students for the workplace, and contain costs. Colleges need to know the financial implications of what are often perceived as routine decisions: course scheduling, program offerings, and the provision of support services. This chapter presents a methodology for estimating…

  7. Completely integrable operator evolutionary equations

    International Nuclear Information System (INIS)

    Chudnovsky, D.V.

    1979-01-01

    The authors present natural generalizations of classical completely integrable equations where the functions are replaced by arbitrary operators. Among these equations are the non-linear Schroedinger, the Korteweg-de Vries, and the modified KdV equations. The Lax representation and the Baecklund transformations are presented. (Auth.)

  8. Globals of Completely Regular Monoids

    Institute of Scientific and Technical Information of China (English)

    Wu Qian-qian; Gan Ai-ping; Du Xian-kun

    2015-01-01

    An element of a semigroup S is called irreducible if it cannot be expressed as a product of two elements in S both distinct from itself. In this paper we show that the class C of all completely regular monoids with irreducible identity elements satisfies the strong isomorphism property and so it is globally determined.

  9. Complete nitrification by Nitrospira bacteria

    DEFF Research Database (Denmark)

    Daims, Holger; Lebedeva, Elena V.; Pjevac, Petra

    2015-01-01

    Nitrification, the oxidation of ammonia via nitrite to nitrate, has always been considered to be a two-step process catalysed by chemolithoautotrophic microorganisms oxidizing either ammonia or nitrite. No known nitrifier carries out both steps, although complete nitrification should be energetic...

  10. The Completeness Theorem of Godel

    Indian Academy of Sciences (India)

    The Completeness Theorem of Gödel. 2. Henkin's Proof for First Order Logic. S M Srivastava is with the Indian Statistical Institute, Calcutta. He received his PhD from the Indian Statistical Institute in 1980. His research interests are in descriptive set theory. Part 1: An Introduction to Mathematical ...

  11. Spring valve for well completion

    Energy Technology Data Exchange (ETDEWEB)

    Gorbatov, P T

    1966-07-22

    A spring-loaded valve for well completion consists of a housing with a spring-loaded closing element. In order to protect the closing element from corrosion which might lower the pressure drop, the closing element is made in the form of a piston. It is tightly connected with sealing elements. The housing has orifices, overlapping the piston in the initial position.

  12. Largest particle detector nearing completion

    CERN Multimedia

    2006-01-01

    "Construction of another part of the Large Hadron Collider (LHC), the worl's largest particle accelerator at CERN in Switzerland, is nearing completion. The Compact Muon Solenoid (CMS) is oner of the LHC project's four large particle detectors. (1/2 page)

  13. YB0 SERVICES INSTALLATION COMPLETED

    CERN Document Server

    The beauty of the completed YB0 was briefly visible at P5 as preparations continue for Tracker installation. A tremendous effort, lasting 7 months and involving more than 100 workers on the busiest days, resulted in 5700 electrical cables, 780 optical cables with 65k fibre channels, and 550 pipes laid on YB0 for HB, EB and Tracker.

  14. Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes

    Energy Technology Data Exchange (ETDEWEB)

    Mueller, Rachel Lockridge; Macey, J. Robert; Jaekel, Martin; Wake, David B.; Boore, Jeffrey L.

    2004-08-01

    The evolutionary history of the largest salamander family (Plethodontidae) is characterized by extreme morphological homoplasy. Analysis of the mechanisms generating such homoplasy requires an independent, molecular phylogeny. To this end, we sequenced 24 complete mitochondrial genomes (22 plethodontids and two outgroup taxa), added data for three species from GenBank, and performed partitioned and unpartitioned Bayesian, ML, and MP phylogenetic analyses. We explored four dataset partitioning strategies to account for evolutionary process heterogeneity among genes and codon positions, all of which yielded increased model likelihoods and decreased numbers of supported nodes in the topologies (PP > 0.95) relative to the unpartitioned analysis. Our phylogenetic analyses yielded congruent trees that contrast with the traditional morphology-based taxonomy; the monophyly of three out of four major groups is rejected. Reanalysis of current hypotheses in light of these new evolutionary relationships suggests that (1) a larval life history stage re-evolved from a direct-developing ancestor multiple times, (2) there is no phylogenetic support for the "Out of Appalachia" hypothesis of plethodontid origins, and (3) novel scenarios must be reconstructed for the convergent evolution of projectile tongues, reduction in toe number, and specialization for defensive tail loss. Some of these novel scenarios imply morphological transformation series that proceed in the opposite direction than was previously thought. In addition, they suggest surprising evolutionary lability in traits previously interpreted to be conservative.

  15. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    Science.gov (United States)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  16. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
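
    As background on what is being parallelized: spectral dimensionality-reduction methods share a computational core, an eigendecomposition of a dense n x n similarity matrix. The serial classical-MDS reference below (plain NumPy, suitable for small inputs only) shows that core and its O(n^2) memory footprint, which is exactly what makes a parallel, distributed implementation necessary at the paper's scales; it is a baseline sketch, not the authors' framework.

```python
import numpy as np

def classical_mds(points, k=2):
    """Embed points into k dimensions via classical multidimensional scaling."""
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    n = sq_dists.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dists @ centering   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:k]              # k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```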

  17. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of the major Science Mission Directorate (SMD) facilities for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), as well as the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information, and addresses the technical and scientific issues, governance, and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally, it demonstrates some of the most used data and product visualization and analysis tools developed and maintained by the GES DISC.

  18. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  19. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving the large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products, together with ground observations, provide another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin-plate spline smoothing algorithm (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria at various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.
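
    The verification criteria behind such comparisons reduce, at each temporal and spatial scale, to a few co-location statistics. The sketch below is a generic illustration (bias, RMSE, correlation against gauges), not the study's full multiresolution protocol.

```python
import numpy as np

def verification_scores(product, gauges):
    """Score a gridded product against co-located gauge observations.

    product, gauges: 1-D arrays of precipitation values paired in space/time.
    """
    product = np.asarray(product, dtype=float)
    gauges = np.asarray(gauges, dtype=float)
    return {
        "bias": float(product.mean() - gauges.mean()),
        "rmse": float(np.sqrt(((product - gauges) ** 2).mean())),
        "correlation": float(np.corrcoef(product, gauges)[0, 1]),
    }
```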

  20. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields, including computer vision, natural language processing and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
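
    A minimal sketch of the benchmarking pattern the paper follows — several models, one prediction task, one cross-validated metric — is given below. MIMIC-III requires credentialed access, so a synthetic stand-in dataset is used here, and the two scikit-learn models merely stand in for the paper's deep learning and Super Learner pipelines.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a mortality-prediction feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=1000)) > 0

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    auroc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC = {auroc:.3f}")
```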