WorldWideScience

Sample records for link datasets generated

  1. KinLinks: Software Toolkit for Kinship Analysis and Pedigree Generation from NGS Datasets

    2015-04-21

combined into multi-generation pedigrees via a set of heuristics to resolve relationship type (half-sibling vs. avuncular vs. grandparent)...this metric is able to differentiate parent-child relationships from siblings. • Kinship coefficient as calculated by the KING algorithm [22]. The second classifier predicts the exact relationship among a pair of samples (i.e. parent/child, sibling, grandparent, avuncular, cousin, unrelated). Both

  2. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    E. Hietanen

    2016-06-01

    Full Text Available In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to by URIs. Furthermore, the needed information content of the objects can easily be extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
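
    The pipeline the abstract describes (mint a persistent URI per spatial object, then translate WFS feature content into RDF triples on the fly) can be sketched minimally. The base namespace, feature fields, and use of the GeoSPARQL `hasGeometry`/`asWKT` terms below are illustrative assumptions, not the study's actual vocabulary:

    ```python
    # Mint a persistent URI per feature and emit (subject, predicate, object)
    # triples; a real service would serialize these via content negotiation.
    BASE = "http://example.org/dataset/feature/"   # hypothetical namespace
    GEO = "http://www.opengis.net/ont/geosparql#"

    def feature_to_triples(feature_id, wkt_geometry, properties):
        """Translate one WFS feature into a list of RDF-style triples."""
        subject = BASE + feature_id                # persistent, unique URI
        triples = [(subject, GEO + "hasGeometry", subject + "/geom"),
                   (subject + "/geom", GEO + "asWKT", wkt_geometry)]
        for key, value in properties.items():      # attach thematic attributes
            triples.append((subject, BASE + "property/" + key, value))
        return triples

    triples = feature_to_triples("42", "POINT(24.94 60.17)", {"name": "Helsinki"})
    for s, p, o in triples:
        print(s, p, o)
    ```

    A real implementation would parse the GML response with an XML parser and serialize the triples as Turtle or JSON-LD depending on the requested media type.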

  3. The SAIL databank: linking multiple health and social care datasets

    Ford David V

    2009-01-01

    Full Text Available Abstract Background Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. Results The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold. Conclusion With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  4. The SAIL databank: linking multiple health and social care datasets.

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold. The SAIL databank represents a research-ready platform for record-linkage studies.
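
    The two-stage idea behind this kind of matching (a deterministic match on a unique identifier where one is valid, falling back to an agreement-weighted probabilistic score against a threshold) can be sketched as follows. The field names, weights, and threshold semantics are illustrative assumptions, not MACRAL's actual rules:

    ```python
    # Deterministic-then-probabilistic record linkage sketch.
    DEFAULT_WEIGHTS = {"surname": 0.4, "dob": 0.4, "postcode": 0.2}

    def match_score(rec_a, rec_b, weights=DEFAULT_WEIGHTS):
        """Agreement-weighted score in [0, 1] over shared demographic fields."""
        return sum(w for field, w in weights.items()
                   if rec_a.get(field) == rec_b.get(field))

    def link(rec_a, rec_b, threshold=0.5):
        # Stage 1: deterministic match on a valid unique identifier.
        if rec_a.get("nhs_number") and rec_a["nhs_number"] == rec_b.get("nhs_number"):
            return True
        # Stage 2: probabilistic match at the given threshold.
        return match_score(rec_a, rec_b) >= threshold

    a = {"nhs_number": None, "surname": "Jones", "dob": "1970-01-01", "postcode": "CF10"}
    b = {"nhs_number": None, "surname": "Jones", "dob": "1970-01-01", "postcode": "SA1"}
    print(link(a, b))  # surname and dob agree: score 0.8 >= 0.5, so True
    ```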

  5. Associating uncertainty with datasets using Linked Data and allowing propagation via provenance chains

    Car, Nicholas; Cox, Simon; Fitch, Peter

    2015-04-01

    With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty for multi-part datasets, and provides no direct way of associating the uncertainty information (metadata in relation to a dataset) with dataset objects. We present a method to address both these issues by combining UncertML with PROV-O, and delivering the resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model, and integrates UncertML through links to documents. The Linked Data API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again. The Linked Data API's 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follows the UncertML model it can be automatically interpreted and may also support automatic uncertainty propagation.
Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty
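
    The linking pattern described above, attaching an uncertainty description to a dataset entity through PROV-O-style triples and navigating back from the dataset to that metadata, can be sketched with plain triples. The UncertProv namespace and property names below are hypothetical stand-ins, not the actual ontology terms:

    ```python
    # Attach an uncertainty node to a PROV Entity and navigate to it, in the
    # spirit of the UncertProv approach (namespaces are assumptions).
    PROV = "http://www.w3.org/ns/prov#"
    UP = "http://example.org/uncertprov#"   # hypothetical UncertProv namespace

    dataset = "http://example.org/dataset/rainfall-2014"
    triples = [
        (dataset, "rdf:type", PROV + "Entity"),
        (dataset, UP + "hasUncertainty", dataset + "#u1"),
        (dataset + "#u1", UP + "statistic", "standardDeviation"),
        (dataset + "#u1", UP + "value", "0.3"),
    ]

    # A SPARQL-style lookup: from the dataset object to its uncertainty metadata.
    uncertainty = [o for s, p, o in triples
                   if s == dataset and p.endswith("hasUncertainty")]
    print(uncertainty)
    ```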

  6. Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator

    Seyed, P.; Chastain, K.; McGuinness, D. L.

    2013-12-01

    Use of Semantic Web technologies for data management in the Earth sciences (and beyond) has great potential but is still in its early stages, since the challenge of translating data into a more explicit or semantic form for immediate use within applications has not been fully addressed. In this abstract we help address this challenge by introducing the SemantEco Annotator, which enables anyone, regardless of expertise, to semantically annotate tabular Earth science data and translate it into linked data format, while applying the logic inherent in community-standard vocabularies to guide the process. The Annotator was conceived out of a desire to unify dataset content from a variety of sources under common vocabularies, for use in semantically enabled web applications. Our current use case employs linked data generated by the Annotator for use in the SemantEco environment, which utilizes semantics to help users explore, search, and visualize water or air quality measurements and species occurrence data through a map-based interface. The generated data can also be used immediately to facilitate discovery and search capabilities within 'big data' environments. The Annotator provides a method for taking information about a dataset that may only be known to its maintainers and making it explicit, in a uniform and machine-readable fashion, such that a person or information system can more easily interpret the underlying structure and meaning. Its primary mechanism is to enable a user to formally describe how the columns of a tabular dataset relate to and/or describe entities. For example, if a user identifies columns for latitude and longitude coordinates, we can infer the data refer to a point that can be plotted on a map. Further, it can be made explicit that measurements of 'nitrate' and 'NO3-' are of the same entity through vocabulary assignments, thus more easily utilizing datasets that use different nomenclatures. The Annotator provides an extensive and searchable
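
    The two core ideas in the abstract, inferring structure from annotated columns (latitude plus longitude imply a plottable point) and unifying nomenclature through vocabulary assignment ('nitrate' and 'NO3-' map to one entity), can be sketched as below. The vocabulary identifiers and column names are illustrative assumptions, not the Annotator's actual terms:

    ```python
    # Column-level semantic annotation sketch: map column names to shared
    # vocabulary entities and infer a geographic point where possible.
    VOCAB = {"nitrate": "chem:Nitrate", "NO3-": "chem:Nitrate"}  # same entity

    def annotate_row(columns, row):
        record = dict(zip(columns, row))
        annotations = {}
        # Structural inference: lat + lon columns imply a plottable point.
        if "latitude" in record and "longitude" in record:
            annotations["geo:point"] = (float(record["latitude"]),
                                        float(record["longitude"]))
        # Nomenclature unification via vocabulary assignment.
        for col, value in record.items():
            if col in VOCAB:
                annotations[VOCAB[col]] = value
        return annotations

    print(annotate_row(["latitude", "longitude", "NO3-"], ["42.7", "-73.7", "1.2"]))
    ```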

  7. Validity and reliability of stillbirth data using linked self-reported and administrative datasets.

    Hure, Alexis J; Chojenta, Catherine L; Powers, Jennifer R; Byles, Julie E; Loxton, Deborah

    2015-01-01

    A high rate of stillbirth was previously observed in the Australian Longitudinal Study of Women's Health (ALSWH). Our primary objective was to test the validity and reliability of self-reported stillbirth data linked to state-based administrative datasets. Self-reported data, collected as part of the ALSWH cohort born in 1973-1978, were linked to three administrative datasets for women in New South Wales, Australia (n = 4374): the Midwives Data Collection; Admitted Patient Data Collection; and Perinatal Death Review Database. Linkages were obtained from the Centre for Health Record Linkage for the period 1996-2009. True cases of stillbirth were defined by being consistently recorded in two or more independent data sources. Sensitivity, specificity, positive predictive value, negative predictive value, percent agreement, and kappa statistics were calculated for each dataset. Forty-nine women reported 53 stillbirths. No dataset was 100% accurate. The administrative datasets performed better than self-reported data, with high accuracy and agreement. Self-reported data showed high sensitivity (100%) but low specificity (30%), meaning women who had a stillbirth always reported it, but there was also over-reporting of stillbirths. About half of the misreported cases in the ALSWH were able to be removed by identifying inconsistencies in longitudinal data. Data linkage provides great opportunity to assess the validity and reliability of self-reported study data. Conversely, self-reported study data can help to resolve inconsistencies in administrative datasets. Quantifying the strengths and limitations of both self-reported and administrative data can improve epidemiological research, especially by guiding methods and interpretation of findings.
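
    The validation metrics this study reports reduce to a simple confusion-matrix calculation; the sketch below uses made-up counts chosen to mirror the pattern described (perfect sensitivity, low specificity), not the study's actual numbers:

    ```python
    # Validation metrics for self-reported outcomes against linked records.
    def metrics(tp, fp, fn, tn):
        sensitivity = tp / (tp + fn)   # true cases that were self-reported
        specificity = tn / (tn + fp)   # non-cases correctly not reported
        ppv = tp / (tp + fp)           # positive predictive value
        return sensitivity, specificity, ppv

    # Hypothetical counts: every true stillbirth was reported (fn = 0),
    # but many reports were not confirmed by administrative sources (fp high).
    sens, spec, ppv = metrics(tp=30, fp=23, fn=0, tn=10)
    print(round(sens, 2), round(spec, 2))  # 1.0 0.3
    ```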

  8. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of these data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and generate a unified dataset with a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we adopted a well-known rough set theory based rule creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time and effort of experts and knowledge engineers by 94.1% when creating unified datasets.
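
    A priority-based merge of overlapping attributes, in the spirit of the user-centric approach described, can be sketched as follows. The source names, attributes, and priority ordering are assumptions for illustration, not GUDM's actual schema:

    ```python
    # Unify per-source records: for overlapping attributes, the value from the
    # highest-priority source wins.
    def unify(records_by_source, priority):
        """Merge records so higher-priority sources overwrite lower ones."""
        unified = {}
        # Apply lowest priority first, highest last, so later updates win.
        for source in sorted(records_by_source, key=priority.index, reverse=True):
            unified.update(records_by_source[source])
        return unified

    sources = {
        "clinical_trial": {"glucose": 7.1, "age": 54},
        "sensor":         {"steps": 8200, "glucose": 6.9},
    }
    # Clinical data takes precedence over sensor data for 'glucose'.
    print(unify(sources, priority=["clinical_trial", "sensor"]))
    ```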

  9. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated on the basis of prior domain knowledge as well as a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, public sources of grey literature are important, such as informal papers (e.g. arXiv deposits), reports and studies. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This helps users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary, or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals.

  10. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

    Shcherbina, Anna

    2014-08-15

    High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms: unlike sequenced mixtures of organisms, in which background contaminating organisms are observed, in silico samples provide exact ground truth. To be of greatest value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim can convert target sequences into in silico reads with the specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge
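
    The characterize-then-simulate idea can be sketched minimally: estimate an error profile from real reads, then apply it while cutting reads from a target sequence. The substitution rate and read length below are illustrative assumptions, not FASTQSim's measured profiles:

    ```python
    # In silico read generation with a per-base substitution error profile.
    import random

    def simulate_read(reference, start, length, sub_rate, rng):
        """Cut a read from `reference` and inject point mutations at `sub_rate`."""
        bases = "ACGT"
        read = []
        for base in reference[start:start + length]:
            if rng.random() < sub_rate:
                # Substitute with a different base (single point mutation).
                base = rng.choice([b for b in bases if b != base])
            read.append(base)
        return "".join(read)

    rng = random.Random(0)            # seeded for reproducible test data
    reference = "ACGT" * 50           # toy reference sequence
    read = simulate_read(reference, 10, 36, sub_rate=0.02, rng=rng)
    print(len(read))  # 36
    ```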

  11. Evolving hard problems: Generating human genetics datasets with a complex etiology

    Himmelstein Daniel S

    2011-07-01

    Full Text Available Abstract Background A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Results Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants has been minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. Conclusions This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  12. ProDaMa: an open source Python library to generate protein structure datasets.

    Armano, Giuliano; Manconi, Andrea

    2009-10-02

    The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.

  13. ProDaMa: an open source Python library to generate protein structure datasets

    Manconi Andrea

    2009-10-01

    Full Text Available Abstract Background The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. Findings To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. Conclusion ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.

  14. A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

    Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

    2014-01-01

    With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align them into a common coordinate frame and select one of the training surfaces as the reference surface. A statistical shape model of organs is constructed, based on the establishment of point correspondence between surfaces and non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, is obtained, and a statistical analysis of the coefficients is performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative.
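
    The generation step reduces to: fit principal components to training shapes, then sample new coefficient vectors that are statistically equivalent to the training coefficients and apply them to the mean shape. The sketch below uses a seeded Gaussian per component; the component count, standard deviations, and toy shape are illustrative assumptions:

    ```python
    # Sample a new organ shape from a statistical shape model:
    # new_shape = mean + sum_k c_k * component_k, with c_k ~ N(0, std_k).
    import random

    def sample_new_shape(mean_shape, components, coeff_std, rng):
        coeffs = [rng.gauss(0.0, s) for s in coeff_std]
        new_shape = list(mean_shape)
        for c, comp in zip(coeffs, components):
            new_shape = [x + c * v for x, v in zip(new_shape, comp)]
        return new_shape

    rng = random.Random(1)                       # seeded for reproducibility
    mean = [0.0, 0.0, 1.0]                       # toy 3-coordinate "surface"
    components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    shape = sample_new_shape(mean, components, coeff_std=[0.5, 0.2], rng=rng)
    print(len(shape))  # 3
    ```

    Repeating the sampling step yields an arbitrarily large set of statistically representative geometries from a small training set.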

  15. Macroeconomic dataset for generating macroeconomic volatility among selected countries in the Asia Pacific region

    Chow, Yee Peng; Muhammad, Junaina; Amin Noordin, Bany Ariffin; Cheng, Fan Fah

    2017-01-01

    This data article provides macroeconomic data that can be used to generate macroeconomic volatility. The data cover a sample of seven selected countries in the Asia Pacific region for the period 2004–2014, including both developing and developed countries. This dataset was generated to enhance our understanding of the sources of macroeconomic volatility affecting the countries in this region. Although the Asia Pacific region continues to remain as the most dynamic part of the world's economy,...

  16. A conceptual prototype for the next-generation national elevation dataset

    Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

    2013-01-01

    In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

  17. One tree to link them all: a phylogenetic dataset for the European tetrapoda.

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-08-08

    With the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies, which are necessary for comprehensive large-scale eco-phylogenetic analyses, are still lacking for many large groups and regions. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of supermatrices, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) level based on consensus knowledge. For each group, we inferred 100 ML trees in order to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized likelihood and fossil calibration. The trees obtained were well supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for the purposes of comparative analyses, macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty.

  18. Datasets linking ethnic perceptions to undergraduate students learning outcomes in a Nigerian Tertiary Institution

    Joke A. Badejo

    2018-06-01

    Full Text Available This data article represents the academic performance of undergraduate students at a selected Nigerian private tertiary institution from 2008 to 2013. The dataset of 2413 records categorizes students with respect to their origins (ethnicity), pre-university admission scores and Cumulative Grade Point Averages earned at the end of their study at the university. We present descriptive statistics showing the mean, median, mode, maximum, minimum, range, standard deviation and variance in the performance of these students, and a boxplot representation of their performance with respect to their origins. Keywords: Learning analytics, Cultural impact, Ethnicity, Undergraduates, Education data mining, Smart campus, Nigerian university

  19. A Linked and Open Dataset from a Network of Learning Repositories on Organic Agriculture

    Rajabi, Enayat; Sanchez-Alonso, Salvador; Sicilia, Miguel-Angel; Manouselis, Nikos

    2017-01-01

    Exposing eLearning objects on the Web of Data leads to sharing and reusing of educational resources and improves the interoperability of data on the Web. Furthermore, it enriches e-learning content, as it is connected to other valuable resources using the Linked Data principles. This paper describes a study performed on the Organic.Edunet…

  20. Proposed inverter to link generators at PPPL

    Lawn, F.; Huttar, D.E.

    1983-01-01

    Large magnetic confinement experiments and the associated energy conversion systems which power the coils of such machines are a challenge to the energy supply engineer. Characteristically, the energy required per pulse is measured in gigajoules, the peak power level is hundreds of megawatts, and the power supplies present a power factor of 0.6 or less to the AC supply system during the actual experiment. Direct coil supply systems such as homopolar generators and commutated DC generators are used in some cases but are not as flexible as AC power distribution systems feeding solid state rectifiers. When PPPL required high pulsed energy for the PLT and PDX machines, stored energy was obtained from existing MG sets at C-Site. These consist of three shafts, each driven by a 7,000 horsepower AC motor; each shaft drives four DC generators and a flywheel. These generators, combined with utility-fed static power supplies, provided the energy for these machines.

  1. An approach for generating synthetic fine temporal resolution solar radiation time series from hourly gridded datasets

    Matthew Perry

    2017-06-01

    Full Text Available A tool has been developed to statistically increase the temporal resolution of solar irradiance time series. Fine temporal resolution time series are an important input into the planning process for solar power plants, and lead to increased understanding of the likely short-term variability of solar energy. The approach makes use of the spatial variability of hourly gridded datasets around a location of interest to make inferences about the temporal variability within the hour. The unique characteristics of solar irradiance data are modelled by classifying each hour into a typical weather situation. Low variability situations are modelled using an autoregressive process which is applied to ramps of clear-sky index. High variability situations are modelled as a transition between states of clear sky conditions and different levels of cloud opacity. The methods have been calibrated to Australian conditions using 1 min data from four ground stations for a 10 year period. These stations, together with an independent dataset, have also been used to verify the quality of the results using a number of relevant metrics. The results show that the method generates realistic fine resolution synthetic time series. The synthetic time series correlate well with observed data on monthly and annual timescales as they are constrained to the nearest grid-point value on each hour. The probability distributions of the synthetic and observed global irradiance data are similar, with Kolmogorov-Smirnov test statistic less than 0.04 at each station. The tool could be useful for the estimation of solar power output for integration studies.
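
    The low-variability case the abstract describes, an autoregressive process applied to ramps of clear-sky index with the fine series constrained to the hourly grid value, can be sketched as below. The AR coefficient, noise level, and one-minute step are illustrative assumptions, not the tool's calibrated parameters:

    ```python
    # Downscale one hourly clear-sky index value to a synthetic 1-minute series
    # using an AR(1) process on the ramps, preserving the hourly mean.
    import random

    def downscale_hour(hourly_kc, n_steps=60, phi=0.8, sigma=0.02, seed=0):
        rng = random.Random(seed)
        ramp = 0.0
        series = []
        for _ in range(n_steps):
            ramp = phi * ramp + rng.gauss(0.0, sigma)   # AR(1) on the ramp
            series.append(hourly_kc + ramp)
        # Constrain the fine series so it averages back to the hourly value.
        offset = hourly_kc - sum(series) / n_steps
        return [v + offset for v in series]

    minutes = downscale_hour(0.75)
    print(len(minutes), round(sum(minutes) / 60, 2))  # 60 0.75
    ```

    The mean-preserving offset mirrors the constraint that the synthetic series match the nearest grid-point value on each hour.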

  2. Validation of a Meteosat Second Generation solar radiation dataset over the northeastern Iberian Peninsula

    J. Cristóbal

    2013-01-01

    Full Text Available Solar radiation plays a key role in the Earth's energy balance and is used as an essential input data in radiation-based evapotranspiration (ET models. Accurate gridded solar radiation data at high spatial and temporal resolution are needed to retrieve ET over large domains. In this work we present an evaluation at hourly, daily and monthly time steps and regional scale (Catalonia, NE Iberian Peninsula of a satellite-based solar radiation product developed by the Land Surface Analysis Satellite Application Facility (LSA SAF using data from the Meteosat Second Generation (MSG Spinning Enhanced Visible and Infrared Imager (SEVIRI. Product performance and accuracy were evaluated for datasets segmented into two terrain classes (flat and hilly areas and two atmospheric conditions (clear and cloudy sky, as well as for the full dataset as a whole. Evaluation against measurements made with ground-based pyranometers yielded good results in flat areas with an averaged model RMSE (root mean square error) of 65 W m−2 (19%, 34 W m−2 (9.7% and 21 W m−2 (5.6%, for hourly, daily and monthly-averaged solar radiation and including clear and cloudy sky conditions and snow or ice cover. Hilly areas yielded intermediate results with an averaged model RMSE of 89 W m−2 (27%, 48 W m−2 (14.5% and 32 W m−2 (9.3%, for hourly, daily and monthly time steps, suggesting the need for further improvements (e.g., terrain corrections) required for retrieving localized variability in solar radiation in these areas. According to the literature, the LSA SAF solar radiation product appears to have sufficient accuracy to serve as a useful and operative input to evaporative flux retrieval models.

  3. Macroeconomic dataset for generating macroeconomic volatility among selected countries in the Asia Pacific region.

    Chow, Yee Peng; Muhammad, Junaina; Amin Noordin, Bany Ariffin; Cheng, Fan Fah

    2018-02-01

    This data article provides macroeconomic data that can be used to generate macroeconomic volatility. The data cover a sample of seven selected countries in the Asia Pacific region for the period 2004-2014, including both developing and developed countries. This dataset was generated to enhance our understanding of the sources of macroeconomic volatility affecting the countries in this region. Although the Asia Pacific region remains the most dynamic part of the world's economy, it is not spared from various sources of macroeconomic volatility through the decades. The reported data cover 15 types of macroeconomic data series, representing three broad categories of indicators that can be used to proxy macroeconomic volatility. They are indicators that account for macroeconomic volatility (i.e. volatility as a macroeconomic outcome), domestic sources of macroeconomic volatility and external sources of macroeconomic volatility. In particular, the selected countries are Malaysia, Thailand, Indonesia and Philippines, which are regarded as developing countries, while Singapore, Japan and Australia are developed countries. Despite the differences in level of economic development, these countries were affected by similar sources of macroeconomic volatility such as the Asian Financial Crisis and the Global Financial Crisis. These countries were also affected by other similar external turbulence arising from factors such as the global economic slowdown, geopolitical risks in the Middle East and volatile commodity prices. Nonetheless, there were also sources of macroeconomic volatility which were peculiar to certain countries only. These were generally domestic sources of volatility such as political instability (for Thailand, Indonesia and Philippines), natural disasters and anomalous weather conditions (for Thailand, Indonesia, Philippines, Japan and Australia) and over-dependence on the electronic sector (for Singapore).
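
    One common convention for turning such macroeconomic series into a volatility measure is the rolling standard deviation of period-on-period growth rates; a minimal sketch (the article's exact definitions may differ, and the GDP figures below are invented):

```python
import numpy as np

def growth_volatility(series, window=5):
    """Rolling sample standard deviation of percentage growth rates,
    a common proxy for macroeconomic volatility."""
    x = np.asarray(series, dtype=float)
    growth = np.diff(x) / x[:-1] * 100.0   # % growth per period
    return np.array([growth[i - window + 1 : i + 1].std(ddof=1)
                     for i in range(window - 1, len(growth))])

gdp = [100.0, 105.0, 103.0, 110.0, 118.0, 117.0, 125.0, 131.0]
vol = growth_volatility(gdp)   # one value per full 5-period window
```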

  4. Macroeconomic dataset for generating macroeconomic volatility among selected countries in the Asia Pacific region

    Yee Peng Chow

    2018-02-01

    Full Text Available This data article provides macroeconomic data that can be used to generate macroeconomic volatility. The data cover a sample of seven selected countries in the Asia Pacific region for the period 2004–2014, including both developing and developed countries. This dataset was generated to enhance our understanding of the sources of macroeconomic volatility affecting the countries in this region. Although the Asia Pacific region continues to remain as the most dynamic part of the world's economy, it is not spared from various sources of macroeconomic volatility through the decades. The reported data cover 15 types of macroeconomic data series, representing three broad categories of indicators that can be used to proxy macroeconomic volatility. They are indicators that account for macroeconomic volatility (i.e. volatility as a macroeconomic outcome, domestic sources of macroeconomic volatility and external sources of macroeconomic volatility. In particular, the selected countries are Malaysia, Thailand, Indonesia and Philippines, which are regarded as developing countries, while Singapore, Japan and Australia are developed countries. Despite the differences in level of economic development, these countries were affected by similar sources of macroeconomic volatility such as the Asian Financial Crisis and the Global Financial Crisis. These countries were also affected by other similar external turbulence arising from factors such as the global economic slowdown, geopolitical risks in the Middle East and volatile commodity prices. Nonetheless, there were also sources of macroeconomic volatility which were peculiar to certain countries only. These were generally domestic sources of volatility such as political instability (for Thailand, Indonesia and Philippines, natural disasters and anomalous weather conditions (for Thailand, Indonesia, Philippines, Japan and Australia and over-dependence on the electronic sector (for Singapore. Keywords

  5. Development of a browser application to foster research on linking climate and health datasets: Challenges and opportunities.

    Hajat, Shakoor; Whitmore, Ceri; Sarran, Christophe; Haines, Andy; Golding, Brian; Gordon-Brown, Harriet; Kessel, Anthony; Fleming, Lora E

    2017-01-01

    Improved data linkages between diverse environment and health datasets have the potential to provide new insights into the health impacts of environmental exposures, including complex climate change processes. Initiatives that link and explore big data in the environment and health arenas are now being established. To encourage advances in this nascent field, this article documents the development of a web browser application to facilitate such future research, the challenges encountered to date, and how they were addressed. A 'storyboard approach' was used to aid the initial design and development of the application. The application followed a 3-tier architecture: a spatial database server for storing and querying data, server-side code for processing and running models, and client-side browser code for user interaction and for displaying data and results. The browser was validated by reproducing previously published results from a regression analysis of time-series datasets of daily mortality, air pollution and temperature in London. Data visualisation and analysis options of the application are presented. The main factors that shaped the development of the browser were: accessibility, open-source software, flexibility, efficiency, user-friendliness, licensing restrictions and data confidentiality, visualisation limitations, cost-effectiveness, and sustainability. Creating dedicated data and analysis resources, such as the one described here, will become an increasingly vital step in improving understanding of the complex interconnections between the environment and human health and wellbeing, whilst still ensuring appropriate confidentiality safeguards. The issues raised in this paper can inform the future development of similar tools by other researchers working in this field. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. The future of population registers: linking routine health datasets to assess a population's current glycaemic status for quality improvement.

    Chan, Wing Cheuk; Jackson, Gary; Wright, Craig Shawe; Orr-Walker, Brandon; Drury, Paul L; Boswell, D Ross; Lee, Mildred Ai Wei; Papa, Dean; Jackson, Rod

    2014-04-28

    To determine the diabetes screening levels and known glycaemic status of all individuals by age, gender and ethnicity within a defined geographic location in a timely and consistent way to potentially facilitate systematic disease prevention and management. Retrospective observational study. Auckland region of New Zealand. 1 475 347 people who had utilised publicly funded health services in New Zealand and were domiciled in the Auckland region in 2010. The health service utilisation population was individually linked to a comprehensive regional laboratory repository dating back to 2004. The two outcome measures were glycaemia-related blood testing coverage (glycated haemoglobin (HbA1c), fasting and random glucose and glucose tolerance tests), and the proportions and number of people with known dysglycaemia in 2010 using modified American Diabetes Association (ADA) and WHO criteria. Within the health service utilisation population, 792 560 people had had at least one glucose or HbA1c blood test in the previous 5.5 years. Overall, 81% of males (n=198 086) and 87% of females (n=128 982) in the recommended age groups for diabetes screening had a blood test to assess their glycaemic status. The estimated age-standardised prevalence of dysglycaemia was highest in people of Pacific Island ethnicity at 11.4% (95% CI 11.2% to 11.5%) for males and 11.6% (11.4% to 11.8%) for females, followed closely by people of Indian ethnicity at 10.8% (10.6% to 11.1%) and 9.3% (9.1% to 9.6%), respectively. Among the indigenous Maori population, the prevalence was 8.2% (7.9% to 8.4%) and 7% (6.8% to 7.2%), while for 'Others' (mainly Europeans) it was 3% (3% to 3.1%) and 2.2% (2.1% to 2.2%), respectively. We have demonstrated that the data linkage between a laboratory repository and national administrative datasets has the potential to provide systematic and consistent individual-level clinical information that is relevant to medical auditing for a large, geographically defined population.
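
    The linkage step itself reduces to a join on a unique patient identifier; a minimal sketch (column names and values are invented for illustration, not the study's encrypted identifiers):

```python
import pandas as pd

population = pd.DataFrame({"nhi": [1, 2, 3, 4], "age": [52, 61, 47, 70]})
lab = pd.DataFrame({
    "nhi": [1, 1, 3],
    "test": ["HbA1c", "fasting_glucose", "HbA1c"],
    "result": [41.0, 5.2, 65.0],
})

# Count glycaemia-related tests per person, then left-join onto the
# population so untested people are retained with a missing count.
counts = lab.groupby("nhi").size().reset_index(name="n_tests")
linked = population.merge(counts, on="nhi", how="left")
linked["screened"] = linked["n_tests"].notna()
coverage = linked["screened"].mean()   # proportion ever tested
```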

  7. NGSUtils: a software suite for analyzing and manipulating next-generation sequencing datasets

    Breese, Marcus R.; Liu, Yunlong

    2013-01-01

    Summary: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis.

  8. Generation of Ground Truth Datasets for the Analysis of 3d Point Clouds in Urban Scenes Acquired via Different Sensors

    Xu, Y.; Sun, Z.; Boerner, R.; Koch, T.; Hoegner, L.; Stilla, U.

    2018-04-01

    In this work, we report a novel way of generating ground truth datasets for analyzing point clouds from different sensors and for validating algorithms. Instead of directly labeling a large number of 3D points, which requires time-consuming manual work, a multi-resolution 3D voxel grid of the test site is generated. Then, with the help of a set of labeled points from the reference dataset, a labeled 3D space of the entire test site can be generated at different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, organizing all points into 3D grids at multiple resolutions. When automatically annotating new test point clouds, a voting-based approach is applied to the labeled points within the multi-resolution voxels, in order to assign a semantic label to the 3D space represented by each voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained via various sensors. Benefiting from the labeled 3D spatial information, new annotated 3D point clouds of the same scene can be created directly for different sensors by considering the labels of the 3D space in which the points are located, which is convenient for the validation and evaluation of algorithms for point cloud interpretation and semantic segmentation.
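
    The voting idea can be sketched at a single resolution (the paper uses a multi-resolution octree; function names and the voxel size below are invented):

```python
import numpy as np
from collections import Counter, defaultdict

def label_space(ref_points, ref_labels, voxel=0.5):
    """Build a voxel->label map from an annotated reference cloud by
    majority vote inside each voxel (a flat stand-in for the paper's
    multi-resolution octree)."""
    votes = defaultdict(Counter)
    for p, lab in zip(ref_points, ref_labels):
        key = tuple(np.floor(np.asarray(p) / voxel).astype(int))
        votes[key][lab] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

def transfer_labels(space, points, voxel=0.5, default="unknown"):
    """Annotate a new (already co-registered) cloud by looking up the
    voxel each point falls into."""
    out = []
    for p in points:
        key = tuple(np.floor(np.asarray(p) / voxel).astype(int))
        out.append(space.get(key, default))
    return out

ref = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.1, 0.1, 0.1)]
space = label_space(ref, ["ground", "ground", "wall"])
labels = transfer_labels(space, [(0.15, 0.2, 0.05), (1.3, 0.4, 0.2)])
```

    The registration step described above would run before `transfer_labels`, so that both clouds share one coordinate frame.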

  9. Calcium scoring with prospectively ECG-triggered CT: Using overlapping datasets generated with MPR decreases inter-scan variability

    Rutten, A.; Isgum, I.; Prokop, M.

    2011-01-01

    Objective: To examine the feasibility of reducing the inter-scan variability of prospectively ECG-triggered calcium-scoring scans by using overlapping 3-mm datasets generated from multiplanar reformation (MPR) instead of non-overlapping 3-mm or 1.5-mm datasets. Patients and methods: Seventy-five women (59-79 years old) underwent two sequential prospectively ECG-triggered calcium-scoring scans with 16 mm x 1.5 mm collimation in one session. Between the two scans patients got off and on the table. We performed calcium scoring (Agatston and mass scores) on the following datasets: contiguous 3-mm sections reconstructed from the raw data (A), contiguous 3-mm sections from MPR (B), overlapping 3-mm sections from MPR (C) and contiguous 1.5-mm sections from the raw data (D). To determine the feasibility of the MPR approach, we compared MPR (B) with direct raw data reconstruction (A). Inter-scan variability was calculated for each type of dataset (A-D). Results: Calcium scores ranged from 0 to 1455 (Agatston) and 0 to 279 mg (mass) for overlapping 3-mm sections (C). Calcium scores (both Agatston and mass) were nearly identical for MPR (B) and raw data approaches (A), with inter-quartile ranges of 0-1% for inter-scan variability. Median inter-scan variability with contiguous 3-mm sections (B) was 13% (Agatston) and 11% (mass). Median variability was reduced to 10% (Agatston and mass) with contiguous 1.5-mm sections (D) and to 8% (Agatston) and 7% (mass) with overlapping 3-mm MPR (C). Conclusion: Calcium scoring on MPR yields nearly identical results to calcium scoring on images directly reconstructed from raw data. Overlapping MPR from prospectively ECG-triggered scans improves inter-scan variability of calcium scoring without increasing patient radiation dose.
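
    The per-patient inter-scan variability figures quoted above follow the usual relative-difference definition; a sketch (the authors may use a slightly different formula, and the scores below are invented):

```python
def interscan_variability(score1, score2):
    """Relative difference between two repeated calcium scores, in %:
    absolute difference divided by the mean of the two scans."""
    mean = (score1 + score2) / 2.0
    if mean == 0.0:
        return 0.0   # two zero scores agree perfectly
    return abs(score1 - score2) / mean * 100.0

# e.g. Agatston 110 on scan 1 vs 100 on scan 2:
v = interscan_variability(110.0, 100.0)
```

    The cohort figures reported in the abstract would then be the median of this quantity over all patients.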

  10. Generation and importance of linked and irreducible moment diagrams in the recursive residue generation method

    Schek, I.; Wyatt, R.E.

    1986-01-01

    Molecular multiphoton processes are treated in the Recursive Residue Generation Method (A. Nauts and R.E. Wyatt, Phys. Rev. Lett. 51, 2238 (1983)) by converting the molecular-field Hamiltonian matrix into tridiagonal form, using the Lanczos equations. In this study, the self-energies (diagonal) and linking (off-diagonal) terms in the tridiagonal matrix are obtained by comparing linked moment diagrams in both representations. The dynamics of the source state is introduced and computed in terms of the linked and the irreducible moments
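
    The tridiagonalization step can be sketched with the standard Lanczos recursion (a plain three-term recurrence without the reorthogonalization a production code would need; the small test matrix is illustrative only):

```python
import numpy as np

def lanczos(H, v0, m):
    """Tridiagonalize a Hermitian matrix H with m Lanczos steps,
    starting from the (normalized) source state v0. Returns the
    diagonal terms (self-energies) and off-diagonal terms (linkings)
    of the tridiagonal representation."""
    n = len(v0)
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q_prev = np.zeros(n)
    q = v0 / np.linalg.norm(v0)
    b = 0.0
    for j in range(m):
        w = H @ q - b * q_prev          # three-term recurrence
        alpha[j] = q @ w                # diagonal (self-energy) term
        w -= alpha[j] * q
        if j < m - 1:
            b = np.linalg.norm(w)
            beta[j] = b                 # off-diagonal (linking) term
            q_prev, q = q, w / b
    return alpha, beta

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
alpha, beta = lanczos(H, np.array([1.0, 0.0, 0.0]), 3)
```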

  11. Automatic Description Generation from Images : A Survey of Models, Datasets, and Evaluation Measures

    Bernardi, Raffaella; Cakici, Ruket; Elliott, Desmond; Erdem, Aykut; Erdem, Erkut; Ikizler-Cinbis, Nazli; Keller, Frank; Muscat, Adrian; Plank, Barbara

    2016-01-01

    Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem,

  12. Modeling Boston: A workflow for the efficient generation and maintenance of urban building energy models from existing geospatial datasets

    Cerezo Davila, Carlos; Reinhart, Christoph F.; Bemis, Jamie L.

    2016-01-01

    City governments and energy utilities are increasingly focusing on the development of energy efficiency strategies for buildings as a key component in emission reduction plans and energy supply strategies. To support these diverse needs, a new generation of Urban Building Energy Models (UBEM) is currently being developed and validated to estimate citywide hourly energy demands at the building level. However, in order for cities to rely on UBEMs, effective model generation and maintenance workflows are needed based on existing urban data structures. Within this context, the authors collaborated with the Boston Redevelopment Authority to develop a citywide UBEM based on official GIS datasets and a custom building archetype library. Energy models for 83,541 buildings were generated and assigned one of 52 use/age archetypes, within the CAD modelling environment Rhinoceros3D. The buildings were then simulated using the US DOE EnergyPlus simulation program, and results for buildings of the same archetype were crosschecked against data from the US national energy consumption surveys. A district-level intervention combining photovoltaics with demand side management is presented to demonstrate the ability of UBEM to provide actionable information. Lack of widely available archetype templates and metered energy data, were identified as key barriers within existing workflows that may impede cities from effectively applying UBEM to guide energy policy. - Highlights: • Data requirements for Urban Building Energy Models are reviewed. • A workflow for UBEM generation from available GIS datasets is developed. • A citywide demand simulation model for Boston is generated and tested. • Limitations for UBEM in current urban data systems are identified and discussed. • Model application for energy management policy is shown in an urban PV scenario.
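
    The archetype-assignment step described above reduces to a lookup from GIS attributes to a template; a toy sketch with invented archetype names and age bands (the Boston model uses 52 use/age archetypes):

```python
# Each (use, age band) pair maps to an archetype carrying the
# constructions, schedules and loads used in simulation.
ARCHETYPES = {
    ("residential", "pre1950"):  "RES_A",
    ("residential", "post1950"): "RES_B",
    ("office",      "pre1950"):  "OFF_A",
    ("office",      "post1950"): "OFF_B",
}

def assign_archetype(use, year_built):
    """Map a building record from the GIS dataset to one archetype;
    unknown uses fall back to a default template."""
    band = "pre1950" if year_built < 1950 else "post1950"
    return ARCHETYPES.get((use, band), "DEFAULT")

buildings = [("residential", 1920), ("office", 1995), ("retail", 1980)]
tags = [assign_archetype(u, y) for u, y in buildings]
```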

  13. Project Roadkill: Linking European Hare vehicle collisions with landscape-structure using datasets from citizen scientists and professionals

    Stretz, Carina; Heigl, Florian; Steiner, Wolfgang; Bauer, Thomas; Suppan, Franz; Zaller, Johann G.

    2015-04-01

    Road networks can have many negative effects on wildlife. One of the most important indications of strong landscape fragmentation is roadkill, i.e. collisions between motorised vehicles and wild animals. A species that is often involved in roadkills is the European hare (Lepus europaeus). European hare populations have been in decline throughout Europe since the 1960s, and the species is classified as "potentially endangered" in the Red Data Book of Austria. It is therefore striking that in the hunting year 2013/14, 19,343 hares were killed on Austrian roads, translating to 53 hare roadkills each day, or about two per hour. We hypothesized that (I) hare-vehicle collisions occur as an aggregation of events (hotspots), (II) the surrounding landscape influences the number of roadkilled hares and (III) roadkill data from citizen science projects and data from professionals (e.g. hunters, police) are convergent. Investigations of the landscape surrounding the accident sites will be carried out using land cover data derived from Landsat satellite images. Information on roadkills is based on datasets from two different sources. One dataset stems from the citizen science project "Roadkill" (www.citizen-science.at/roadkill), where participants report roadkill findings via a web application. The second dataset is from a project in which roadkill data were collected by the police and by hunters. Besides answering our research questions, the findings of this project also allow the location of dangerous roadkill hotspots for animals and could be implemented in nature conservation actions.

  14. Development and Evaluation of a Methodology for the Generation of Gridded Isotopic Datasets

    Argiriou, A. A.; Salamalikis, V [University of Patras, Department of Physics, Laboratory of Atmospheric Physics, Patras (Greece); Lykoudis, S. P. [National Observatory of Athens, Institute of Environmental and Sustainable Development, Athens (Greece)

    2013-07-15

    The accurate knowledge of the spatial distribution of stable isotopes in precipitation is necessary for several applications. Since the number of rain sampling stations is small and unevenly distributed around the globe, the global distribution of stable isotopes can be calculated via the generation of gridded isotopic data sets. Several methods have been proposed for this purpose. In this work a methodology is proposed for the development of 10′ × 10′ gridded isotopic data from precipitation in the central and eastern Mediterranean. Statistical models are developed taking into account geographical and meteorological parameters as regressors. The residuals are interpolated onto the grid using ordinary kriging and thin plate splines. The result is added to the model grids, to obtain the final isotopic gridded data sets. Models are evaluated using an independent data set. The overall performance of the procedure is satisfactory and the obtained gridded data reproduce the isotopic parameters successfully. (author)
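
    A sketch of the two-stage scheme under stated assumptions (synthetic stations and invented coefficients; the thin plate spline stands in for both interpolators, and in practice gridded elevation would come from a DEM):

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(1)
lon = rng.uniform(20, 30, 40)
lat = rng.uniform(34, 42, 40)
alt = rng.uniform(0, 2000, 40)
# Synthetic delta-18O observations with an altitude and latitude effect
d18o = -4.0 - 0.0028 * alt - 0.3 * (lat - 34) + rng.normal(0, 0.2, 40)

# Stage 1: linear regression on geographical covariates
X = np.column_stack([np.ones_like(lon), lon, lat, alt])
coef, *_ = np.linalg.lstsq(X, d18o, rcond=None)
residuals = d18o - X @ coef

# Stage 2: interpolate the residuals with a thin plate spline and add
# them back to the model prediction on the target grid
tps = Rbf(lon, lat, residuals, function="thin_plate")
glon, glat = np.meshgrid(np.linspace(20, 30, 5), np.linspace(34, 42, 5))
galt = np.full_like(glon, 500.0)   # stand-in for DEM elevations
Xg = np.column_stack([np.ones(glon.size), glon.ravel(),
                      glat.ravel(), galt.ravel()])
grid = (Xg @ coef + tps(glon.ravel(), glat.ravel())).reshape(glon.shape)
```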

  15. Development and Evaluation of a Methodology for the Generation of Gridded Isotopic Datasets

    Argiriou, A.A.; Salamalikis, V; Lykoudis, S.P.

    2013-01-01

    The accurate knowledge of the spatial distribution of stable isotopes in precipitation is necessary for several applications. Since the number of rain sampling stations is small and unevenly distributed around the globe, the global distribution of stable isotopes can be calculated via the generation of gridded isotopic data sets. Several methods have been proposed for this purpose. In this work a methodology is proposed for the development of 10′ × 10′ gridded isotopic data from precipitation in the central and eastern Mediterranean. Statistical models are developed taking into account geographical and meteorological parameters as regressors. The residuals are interpolated onto the grid using ordinary kriging and thin plate splines. The result is added to the model grids, to obtain the final isotopic gridded data sets. Models are evaluated using an independent data set. The overall performance of the procedure is satisfactory and the obtained gridded data reproduce the isotopic parameters successfully. (author)

  16. Analysis of a comprehensive dataset of diversity generating retroelements generated by the program DiGReF

    Schillinger Thomas

    2012-08-01

    Full Text Available Background: Diversity Generating Retroelements (DGRs) are genetic cassettes that can introduce tremendous diversity into a short, defined region of the genome. They achieve hypermutation through replacement of the variable region with a strongly mutated cDNA copy generated by the element-encoded reverse transcriptase. In contrast to "selfish" retroelements such as group II introns and retrotransposons, DGRs impart an advantage to their host by increasing its adaptive potential. DGRs were discovered in a bacteriophage, but since then additional examples have been identified in some bacterial genomes. Results: Here we present the program DiGReF, which allowed us to comprehensively screen available databases for DGRs. We identified 155 DGRs, which are found in all major classes of bacteria but exhibit a sporadic distribution across species. Phylogenetic analysis and sequence comparison showed that DGRs move between genomes by associating with various mobile elements such as phages, transposons and plasmids. The DGR cassettes exhibit high flexibility in the arrangement of their components and easily acquire additional paralogous target genes. Surprisingly, the genomic data alone provide new insights into the molecular mechanism of DGRs. Most notably, our data suggest that the template RNA is transcribed separately from the rest of the element. Conclusions: DiGReF is a valuable tool to detect DGRs in genome data. Its output allows comprehensive analysis of various aspects of DGR biology, thus deepening our understanding of the role DGRs play in prokaryotic genome plasticity, from the global down to the molecular level.

  17. The incidence of preeclampsia and eclampsia and associated maternal mortality in Australia from population-linked datasets: 2000-2008.

    Thornton, Charlene; Dahlen, Hannah; Korda, Andrew; Hennessy, Annemarie

    2013-06-01

    To determine the incidence of preeclampsia and eclampsia and associated mortality in Australia between 2000 and 2008. Analysis of statutorily collected datasets of singleton births in New South Wales using International Classification of Disease coding. Analyzed using cross tabulation, logistic regression, and means testing, where appropriate. The overall incidence of preeclampsia was 3.3% with a decrease from 4.6% to 2.3%. The overall rate of eclampsia was 8.6/10,000 births or 2.6% of preeclampsia cases, with an increase from 2.3% to 4.2%. The relative risk of eclampsia in preeclamptic women in 2008 was 1.9 (95% confidence interval, 1.28-2.92) when compared with the year 2000. The relative risk of a woman with preeclampsia/eclampsia dying in the first 12 months following birth compared with normotensive women is 5.1 (95% confidence interval, 3.07-8.60). Falling rates of preeclampsia have not equated to a decline in the incidence of eclampsia. An accurate rate of both preeclampsia and eclampsia is vital considering the considerable contribution that these diseases make to maternal mortality. The identification and treatment of eclampsia should remain a priority in the clinical setting. Copyright © 2013 Mosby, Inc. All rights reserved.
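
    The relative risks quoted above are ratios of incidence proportions; a sketch of the standard calculation with a Wald-type confidence interval on the log scale (the counts below are invented, not the study's data):

```python
import math

def relative_risk(a, n1, c, n2, z=1.96):
    """Relative risk of an event in exposed (a/n1) vs unexposed (c/n2)
    groups, with a Wald-type 95% confidence interval computed on the
    log scale."""
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Illustrative counts only: 30 deaths among 10,000 preeclamptic births
# vs 12 deaths among 20,000 normotensive births
rr, lo, hi = relative_risk(30, 10_000, 12, 20_000)
```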

  18. How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-08-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.
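
    The decay statistic reported above (44% of links published in 2001 broken by 2011) is a per-year broken fraction; a sketch over a toy sample (the statuses below are invented):

```python
from collections import defaultdict

def broken_fraction_by_year(links):
    """links: iterable of (publication_year, is_reachable) pairs.
    Returns {year: fraction of that year's links now broken}."""
    totals, broken = defaultdict(int), defaultdict(int)
    for year, ok in links:
        totals[year] += 1
        if not ok:
            broken[year] += 1
    return {y: broken[y] / totals[y] for y in totals}

# Toy sample illustrating the pattern that older links break more often
sample = [(2001, False), (2001, False), (2001, True), (2001, True),
          (2010, True), (2010, True), (2010, False), (2010, True)]
decay = broken_fraction_by_year(sample)
```

    In the study itself, reachability would be determined by issuing HTTP requests to each embedded URL.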

  19. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-01-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  20. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Alberto Pepe

    Full Text Available We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  1. Demonstrating the value of publishing open data by linking DOI-based citations of source datasets to uses in research and policy

    Copas, K.; Legind, J. K.; Hahn, A.; Braak, K.; Høftt, M.; Noesgaard, D.; Robertson, T.; Méndez Hernández, F.; Schigel, D.; Ko, C.

    2017-12-01

    GBIF—the Global Biodiversity Information Facility—has recently demonstrated a system that tracks publications back to individual datasets, giving data providers demonstrable evidence of the benefit and utility of sharing data to support an array of scholarly topics and practical applications. GBIF is an open-data network and research infrastructure funded by the world's governments. Its community consists of more than 90 formal participants and almost 1,000 data-publishing institutions, which currently make tens of thousands of datasets containing nearly 800 million species occurrence records freely and publicly available for discovery, use and reuse across a wide range of biodiversity-related research and policy investigations. Starting in 2015 with the help of DataONE, GBIF introduced DOIs as persistent identifiers for the datasets shared through its network. This enhancement soon extended to the assignment of DOIs to user downloads from GBIF.org, which typically filter the available records with a variety of taxonomic, geographic, temporal and other search terms. Despite the lack of widely accepted standards for citing data among researchers and publications, this technical infrastructure is beginning to take hold and support open, transparent, persistent and repeatable use and reuse of species occurrence data. These 'download DOIs' provide canonical references for the search results researchers process and use in peer-reviewed articles—a practice GBIF encourages by confirming new DOIs with each download and offering guidelines on citation. GBIF has recently started linking these citation results back to dataset and publisher pages, offering more consistent, traceable evidence of the value of sharing data to support others' research. GBIF's experience may be a useful model for other repositories to follow.

  2. Linking the M&Rfi Weather Generator with Agrometeorological Models

    Dubrovsky, Martin; Trnka, Miroslav

    2015-04-01

    team focused on drought" No. CZ.1.07/2.3.00/20.0248. The weather generator is being developed within the frame of WG4VALUE project (LD12029), which is supported by Ministry of Education, Youth and Sports and linked to the COST action ES1102 VALUE.

  3. Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

    Hoefsloot Huub CJ

    2009-05-01

    Full Text Available Abstract Background Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Results Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. Conclusion We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre
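
    The double cross-validation protocol mentioned above (an inner loop selects the pre-processing method, an outer loop estimates unbiased accuracy) can be sketched as follows. The nearest-centroid classifier and the toy pre-processors in the usage example are illustrative stand-ins, not the methods used in the study.

    ```python
    import random
    from statistics import mean

    def _centroids(X, y):
        by_cls = {}
        for row, label in zip(X, y):
            by_cls.setdefault(label, []).append(row)
        return {c: [mean(col) for col in zip(*rows)] for c, rows in by_cls.items()}

    def _accuracy(train, test, preprocess):
        # train/test are lists of (feature_vector, label) pairs
        cents = _centroids([preprocess(x) for x, _ in train], [y for _, y in train])
        def predict(x):
            return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
        return mean(predict(preprocess(x)) == y for x, y in test)

    def _folds(data, k, rng):
        data = data[:]
        rng.shuffle(data)
        return [data[i::k] for i in range(k)]

    def double_cross_val(data, preprocessors, k_outer=5, k_inner=3, seed=0):
        """Outer folds estimate accuracy; inner folds pick the pre-processing method."""
        rng = random.Random(seed)
        outer = _folds(data, k_outer, rng)
        scores = []
        for i, test in enumerate(outer):
            train = [s for j, fold in enumerate(outer) if j != i for s in fold]
            inner = _folds(train, k_inner, rng)
            def inner_score(p):
                return mean(
                    _accuracy([s for j, fold in enumerate(inner) if j != h for s in fold],
                              held_out, p)
                    for h, held_out in enumerate(inner))
            best = max(preprocessors, key=inner_score)   # selection never sees `test`
            scores.append(_accuracy(train, test, best))
        return mean(scores)
    ```

    Because the pre-processing method is chosen inside the outer training fold only, the outer-fold accuracy is not biased by the selection step, which is the point of the double protocol.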

  4. A Self-Organizing Map-Based Approach to Generating Reduced-Size, Statistically Similar Climate Datasets

    Cabell, R.; Delle Monache, L.; Alessandrini, S.; Rodriguez, L.

    2015-12-01

    Climate-based studies require large amounts of data in order to produce accurate and reliable results. Many of these studies have used 30-plus year data sets in order to produce stable and high-quality results, and as a result, many such data sets are available, generally in the form of global reanalyses. While the analysis of these data leads to high-fidelity results, processing them can be very computationally expensive. This computational burden prevents the utilization of these data sets for certain applications, e.g., when rapid response is needed in crisis management and disaster planning scenarios resulting from the release of toxic material in the atmosphere. We have developed a methodology to reduce large climate datasets to more manageable sizes while retaining statistically similar results when used to produce ensembles of possible outcomes. We do this by employing a Self-Organizing Map (SOM) algorithm to analyze general patterns of meteorological fields over a regional domain of interest to produce a small set of "typical days" with which to generate the model ensemble. The SOM algorithm takes as input a set of vectors and generates a 2D map of representative vectors deemed most similar to the input set and to each other. Input predictors are selected that are correlated with the model output, which in our case is an Atmospheric Transport and Dispersion (T&D) model that is highly dependent on surface winds and boundary layer depth. To choose a subset of "typical days," each input day is assigned to its closest SOM map node vector and then ranked by distance. Each node vector is treated as a distribution and days are sampled from them by percentile. Using a 30-node SOM, with sampling every 20th percentile, we have been able to reduce 30 years of the Climate Forecast System Reanalysis (CFSR) data for the month of October to 150 "typical days." To estimate the skill of this approach, the "Measure of Effectiveness" (MOE) metric is used to compare area and overlap
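
    The assign-rank-sample procedure described above can be sketched with a toy one-dimensional SOM. This is a minimal illustration, not the 30-node two-dimensional SOM used in the study; the map size, training schedule, and percentile set are all simplifying assumptions.

    ```python
    import math
    import random

    def _dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def som_typical_days(days, n_nodes=4, epochs=50, percentiles=(0.0, 0.5, 1.0), seed=0):
        """Reduce a set of daily meteorological vectors to a few 'typical days'."""
        rng = random.Random(seed)
        nodes = [list(rng.choice(days)) for _ in range(n_nodes)]  # 1-D map of node vectors
        for epoch in range(epochs):
            lr = 0.5 * (1 - epoch / epochs)                       # decaying learning rate
            radius = max(1.0, (n_nodes / 2) * (1 - epoch / epochs))
            for day in days:
                # best-matching unit, then pull nearby nodes toward this day
                bmu = min(range(n_nodes), key=lambda i: _dist(nodes[i], day))
                for i in range(n_nodes):
                    h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                    nodes[i] = [w + lr * h * (x - w) for w, x in zip(nodes[i], day)]
        # assign each day to its closest node, rank by distance, sample by percentile
        buckets = [[] for _ in range(n_nodes)]
        for idx, day in enumerate(days):
            bmu = min(range(n_nodes), key=lambda i: _dist(nodes[i], day))
            buckets[bmu].append((_dist(nodes[bmu], day), idx))
        picked = set()
        for ranked in (sorted(b) for b in buckets if b):
            for p in percentiles:
                picked.add(ranked[round(p * (len(ranked) - 1))][1])
        return sorted(picked)
    ```

    With a 30-node map and sampling every 20th percentile, the same logic yields the "150 typical days" reduction reported in the abstract (30 nodes × 5 percentile samples).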

  5. LRSim: A Linked-Reads Simulator Generating Insights for Better Genome Partitioning

    Ruibang Luo

    Full Text Available Linked-read sequencing, using highly-multiplexed genome partitioning and barcoding, can span hundreds of kilobases to improve de novo assembly, haplotype phasing, and other applications. Based on our analysis of 14 datasets, we introduce LRSim, which simulates linked-reads by emulating the library preparation and sequencing process with fine control over variants, linked-read characteristics, and the short-read profile. From the phasing and assembly of multiple datasets, we derive recommendations on coverage, fragment length, and partitioning for sequencing genomes of different sizes and complexities. These optimizations improve results by orders of magnitude, and enable the development of novel methods. LRSim is available at https://github.com/aquaskyline/LRSIM. Keywords: Linked-read, Molecular barcoding, Reads partitioning, Phasing, Reads simulation, Genome assembly, 10X Genomics
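
    The core idea of linked-read simulation can be illustrated with a toy sketch: long fragments are drawn from the genome, each is assigned a partition barcode, and every short read cut from a fragment inherits that barcode. This is a drastically simplified stand-in for LRSim (no error model, no variant injection, uniform sampling), with all parameters hypothetical.

    ```python
    import random

    def simulate_linked_reads(genome, n_fragments=8, frag_len=200, read_len=20,
                              reads_per_frag=5, seed=0):
        """Toy linked-read simulator: each long fragment gets a barcode, and all
        short reads cut from it inherit that barcode."""
        rng = random.Random(seed)
        reads = []  # (barcode, absolute_position, sequence)
        for barcode in range(n_fragments):
            start = rng.randrange(len(genome) - frag_len)
            fragment = genome[start:start + frag_len]
            for _ in range(reads_per_frag):
                offset = rng.randrange(frag_len - read_len)
                reads.append((barcode, start + offset,
                              fragment[offset:offset + read_len]))
        return reads
    ```

    Reads sharing a barcode are known to originate from the same long fragment, which is exactly the signal assemblers and phasing tools exploit.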

  6. Linking Substance Use and Problem Behavior across Three Generations

    Bailey, Jennifer A.; Hill, Karl G.; Oesterle, Sabrina; Hawkins, J. David

    2006-01-01

    This study examined patterns of between-generation continuity in substance use from generation 1 (G1) parents to generation 2 (G2) adolescents and from G2 adult substance use and G1 substance use to generation 3 (G3) problem behavior in childhood. Structural equation modeling of prospective, longitudinal data from 808 participants, their parents,…

  7. Automatic generation and simulation of urban building energy models based on city datasets for city-scale building retrofit analysis

    Chen, Yixing; Hong, Tianzhen; Piette, Mary Ann

    2017-01-01

    Highlights: •Developed methods and used data models to integrate city’s public building records. •Shading from neighborhood buildings strongly influences urban building performance. •A case study demonstrated the workflow, simulation and analysis of building retrofits. •CityBES retrofit analysis feature provides actionable information for decision making. •Discussed significance and challenges of urban building energy modeling. -- Abstract: Buildings in cities consume 30–70% of total primary energy, and improving building energy efficiency is one of the key strategies towards sustainable urbanization. Urban building energy models (UBEM) can help city managers evaluate and prioritize energy conservation measures (ECMs) for investment and the design of incentive and rebate programs. This paper presents the retrofit analysis feature of City Building Energy Saver (CityBES) to automatically generate and simulate UBEM using EnergyPlus based on cities’ building datasets and user-selected ECMs. CityBES is a new open web-based tool to support city-scale building energy efficiency strategic plans and programs. The technical details of using CityBES for UBEM generation and simulation are introduced, including the workflow, key assumptions, and major databases. Also presented is a case study that analyzes the potential retrofit energy use and energy cost savings of five individual ECMs and two measure packages for 940 office and retail buildings in six city districts in northeast San Francisco, United States. The results show that: (1) all five measures together can save 23–38% of site energy per building; (2) replacing lighting with light-emitting diode lamps and adding air economizers to existing heating, ventilation and air-conditioning (HVAC) systems are most cost-effective with an average payback of 2.0 and 4.3 years, respectively; and (3) it is not economical to upgrade HVAC systems or replace windows in San Francisco due to the city’s mild
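
    The payback figures quoted for the ECMs are simple paybacks: the retrofit cost divided by the annual energy-cost savings. The numbers in the example below are hypothetical, not taken from the study.

    ```python
    def simple_payback_years(retrofit_cost, annual_energy_cost_savings):
        """Simple payback: years until cumulative savings cover the retrofit cost."""
        return retrofit_cost / annual_energy_cost_savings

    # Hypothetical LED retrofit: $10,000 installed cost, $5,000/year savings -> 2.0 years.
    payback = simple_payback_years(10000.0, 5000.0)
    ```

    Simple payback ignores discounting and energy-price escalation, but it is the metric most often reported for ranking ECMs in screening-level studies like this one.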

  8. The CCSDS Next Generation Space Data Link Protocol (NGSLP)

    Kazz, Greg J.; Greenberg, Edward

    2014-01-01

    The CCSDS space link protocols, i.e., Telemetry (TM), Telecommand (TC), and Advanced Orbiting Systems (AOS), were developed in the early growth period of the space program. They were designed to meet the needs of the early missions, be compatible with the available technology and focused on the specific link environments. Digital technology was in its infancy and spacecraft power and mass issues enforced severe constraints on flight implementations. Therefore the Telecommand protocol was designed around a simple Bose–Chaudhuri–Hocquenghem (BCH) code that provided little coding gain and limited error detection but was relatively simple to decode on board. The infusion of the concatenated Convolutional and Reed-Solomon codes for telemetry was a major milestone and transformed telemetry applications by providing them the ability to more efficiently utilize the telemetry link and its ability to deliver user data. The ability to significantly lower the error rates on the telemetry links enabled the use of packet telemetry and data compression. The infusion of the high performance codes for telemetry was enabled by the advent of digital processing, but it was limited to earth based systems supporting telemetry. The latest CCSDS space link protocol, Proximity-1, was developed in early 2000 to meet the needs of short-range, bi-directional, fixed or mobile radio links characterized by short time delays, moderate but not weak signals, and short independent sessions. Proximity-1 has been successfully deployed on both NASA and ESA missions at Mars and is planned to be utilized by all Mars missions in development. A new age has arisen, one that now provides the means to perform advanced digital processing in spacecraft systems enabling the use of improved transponders, digital correlators, and high performance forward error correcting codes for all communications links.
Flight transponders utilizing digital technology have emerged and can efficiently provide the means to make the

  9. Generating and Executing Complex Natural Language Queries across Linked Data.

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and Natural Language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.
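
    The simplest form of such a translation is template-based: match the question against a pattern, extract the entity slots, and fill a SPARQL template. The sketch below is a hypothetical toy in that style (the `ex:` namespace and the pattern set are invented), far simpler than the NLP-pipeline-based Perl module the paper describes.

    ```python
    import re

    def question_to_sparql(question, patterns):
        """Translate a natural-language question into SPARQL via regex templates."""
        for regex, template in patterns:
            m = re.match(regex, question.strip(), re.I)
            if m:
                # normalize extracted slots into identifier-friendly tokens
                slots = {k: v.strip().replace(" ", "_") for k, v in m.groupdict().items()}
                return template.format(**slots)
        raise ValueError("no pattern matched: " + question)

    # Hypothetical pattern set over an imaginary ex: biomedical namespace.
    PATTERNS = [
        (r"which genes are associated with (?P<disease>.+)\?",
         "SELECT ?gene WHERE {{ ?gene ex:associatedWith ex:{disease} . }}"),
    ]
    ```

    Real systems replace the regexes with linguistic and semantic annotation so that paraphrases of the same question map to the same query graph.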

  10. Proteomics dataset

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  11. Linking the generation of DNA adducts to lung cancer.

    Ceppi, Marcello; Munnia, Armelle; Cellai, Filippo; Bruzzone, Marco; Peluso, Marco E M

    2017-09-01

    Worldwide, lung cancer is the leading cause of cancer death. DNA adducts are considered a reliable biomarker that reflects carcinogen exposure to tobacco smoke, but the central question is: what is the relationship between DNA adducts and cancer? Therefore, we investigated this relationship by a meta-analysis of twenty-two studies with bronchial adducts for a total of 1091 subjects, 887 lung cancer cases and 204 apparently healthy individuals with no evidence of lung cancer. Our study shows that these adducts are significantly associated with increased lung cancer risk. The value of the Mean Ratio lung-cancer (MR) of bronchial adducts resulting from the random effects model was 2.64, 95% C.I. 2.00-3.50, in overall lung cancer cases as compared to controls. The significant difference, with lung cancer patients having significantly higher levels of bronchial adducts than controls, persisted after stratification for smoking habits. The MR lung-cancer value between lung cancer patients and controls for smokers was 2.03, 95% C.I. 1.42-2.91, for ex-smokers 3.27, 95% C.I. 1.49-7.18, and for non-smokers was 3.81, 95% C.I. 1.85-7.85. Next, we found that the generation of bronchial adducts is significantly related to inhalation exposure to tobacco smoke carcinogens, confirming its association with volatile carcinogens. The MR smoking estimate of bronchial adducts resulting from meta-regression was 2.28, 95% Confidence Interval (C.I.) 1.10-4.73, in overall smokers with respect to non-smokers. The present work strengthens the hypothesis that bronchial adducts are not simply related to exposure, but are a cause of chemical-induced lung cancer. Copyright © 2017 Elsevier B.V. All rights reserved.
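
    A random-effects pooled mean ratio of the kind reported above is commonly computed with the DerSimonian–Laird estimator on log-scale effects. The sketch below shows that calculation; the per-study variances in the usage example are invented for illustration, not taken from the meta-analysis.

    ```python
    import math

    def dersimonian_laird(effects, variances):
        """Random-effects pooled estimate (DerSimonian–Laird) of log mean ratios.

        `effects` are per-study log mean ratios; `variances` their within-study
        variances. Returns the pooled log estimate and a 95% confidence interval.
        """
        w = [1 / v for v in variances]                       # fixed-effect weights
        fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # heterogeneity
        k = len(effects)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
        w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
        pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
        se = math.sqrt(1 / sum(w_re))
        return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)
    ```

    Exponentiating the pooled log estimate and its interval endpoints gives the mean ratio and C.I. on the original scale, as quoted in the abstract.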

  12. Modeling the video distribution link in the Next Generation Optical Access Networks

    Amaya, F.; Cárdenas, A.; Tafur Monroy, Idelfonso

    2011-01-01

    In this work we present a model for the design and optimization of the video distribution link in the next generation optical access network. We analyze the video distribution performance in a SCM-WDM link, including the noise, the distortion and the fiber optic nonlinearities. Additionally, we...... consider in the model the effect of distributed Raman amplification, used to extend the capacity and the reach of the optical link. In the model, we use the nonlinear Schrödinger equation with the purpose to obtain capacity limitations and design constraints of the next generation optical access networks....

  13. Modeling the video distribution link in the Next Generation Optical Access Networks

    Amaya, F; Cardenas, A; Tafur, I

    2011-01-01

    In this work we present a model for the design and optimization of the video distribution link in the next generation optical access network. We analyze the video distribution performance in a SCM-WDM link, including the noise, the distortion and the fiber optic nonlinearities. Additionally, we consider in the model the effect of distributed Raman amplification, used to extend the capacity and the reach of the optical link. In the model, we use the nonlinear Schrödinger equation with the purpose of obtaining capacity limitations and design constraints of the next generation optical access networks.

  14. Proteomics dataset

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  15. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Roč. 528, September (2015), s. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords : seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  16. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Roč. 528, September (2015), s. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords : seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  17. Editorial: Datasets for Learning Analytics

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full text publications in the domain of

  18. Linked data management

    Hose, Katja; Schenkel, Ralf

    2014-01-01

    Linked Data Management presents techniques for querying and managing Linked Data that is available on today’s Web. The book shows how the abundance of Linked Data can serve as fertile ground for research and commercial applications. The text focuses on aspects of managing large-scale collections of Linked Data. It offers a detailed introduction to Linked Data and related standards, including the main principles distinguishing Linked Data from standard database technology. Chapters also describe how to generate links between datasets and explain the overall architecture of data integration systems based on Linked Data. A large part of the text is devoted to query processing in different setups. After presenting methods to publish relational data as Linked Data and efficient centralized processing, the book explores lookup-based, distributed, and parallel solutions. It then addresses advanced topics, such as reasoning, and discusses work related to read-write Linked Data for system interoperation. Desp...

  19. Assimilation and Health: Evidence From Linked Birth Records of Second- and Third-Generation Hispanics.

    Giuntella, Osea

    2016-12-01

    This study explores the effects of assimilation on the health of Hispanics in the United States, using ethnic intermarriage as a metric of acculturation. I exploit a unique data set of linked confidential use birth records in California and Florida from 1970-2009. The confidential data allow me to link mothers giving birth in 1989-2009 to their own birth certificate records in 1970-1985 and to identify second-generation siblings. Thus, I can analyze the relationship between the parental exogamy of second-generation Hispanic women and the birth outcomes of their offspring controlling for grandmother fixed effects as well as indicators for second generation's birth weight. Despite their higher socioeconomic status, third-generation children of second-generation intermarried Hispanic women are more likely to have poor health at birth, even after I account for second-generation health at birth and employ only within-family variations in the extent of assimilation. I find that a second-generation Hispanic woman married to a non-Hispanic man is 9 % more likely to have a child with low birth weight relative to a second-generation woman married to another Hispanic. These results largely reflect the higher incidence of risky behaviors (e.g., smoking during pregnancy) among intermarried Hispanic women.

  20. Generating a National Land Cover Dataset for Mexico at 30m Spatial Resolution in the Framework of the NALCMS Project.

    Llamas, R. M.; Colditz, R. R.; Ressl, R.; Jurado Cruz, D. A.; Argumedo, J.; Victoria, A.; Meneses, C.

    2017-12-01

    The North American Land Change Monitoring System (NALCMS) is a tri-national initiative for mapping land cover across Mexico, the United States and Canada, integrating efforts of institutions from the three countries. At the continental scale the group released land cover and change maps derived from MODIS image mosaics at 250m spatial resolution for 2005 and 2010. Current efforts are based on 30m Landsat images for 2010 ± 1 year. Each country uses its own mapping approach and sources for ancillary data, while ensuring that maps are produced in a coherent fashion across the continent. This paper presents the methodology and final land cover map of Mexico for the year 2010 that was later integrated into a continental map. The principal input for Mexico was the Monitoring Activity Data for Mexico (MAD-MEX) land cover map (version 4.3), derived from all available mostly cloud-free images for the year 2010. A total of 35 classes were regrouped to 15 classes of the NALCMS legend present in Mexico. Next, various issues of the automatically generated MAD-MEX land cover mosaic were corrected, such as: filling areas of no data due to no cloud-free observation or gaps in Landsat 7 ETM+ images, filling inland water bodies which were left unclassified due to masking issues, relabeling isolated unclassified or falsely classified pixels, correcting structural mislabeling due to data gaps, reclassifying areas of adjacent scenes with significant class disagreements, and correcting obvious misclassifications, mostly of water and urban areas. In a second step, minor missing areas and the rare snow and ice class were digitized and a road network was added. A product such as the NALCMS land cover map at 30m for North America is an unprecedented effort and will be without doubt an important source of information for many users around the world who need coherent land cover data over a continental domain as an input for a wide variety of environmental studies. The product release to the general public is expected
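
    One of the cleanup steps above, relabeling isolated or no-data pixels, is often done with a neighborhood-majority rule. The sketch below is a hypothetical, simplified version of such a rule on a label grid; it is not the actual MAD-MEX post-processing code.

    ```python
    from collections import Counter

    def relabel_isolated(grid, nodata=0):
        """Fill no-data cells and relabel isolated pixels with the majority
        label of their 8-neighbourhood (a simplified cleanup rule)."""
        rows, cols = len(grid), len(grid[0])
        out = [row[:] for row in grid]
        for r in range(rows):
            for c in range(cols):
                neigh = [grid[rr][cc]
                         for rr in range(max(0, r - 1), min(rows, r + 2))
                         for cc in range(max(0, c - 1), min(cols, c + 2))
                         if (rr, cc) != (r, c) and grid[rr][cc] != nodata]
                # a pixel is rewritten if it is no-data or its label appears
                # nowhere in its neighbourhood (i.e. it is isolated)
                if neigh and (grid[r][c] == nodata or grid[r][c] not in neigh):
                    out[r][c] = Counter(neigh).most_common(1)[0][0]
        return out
    ```

    Writing into a copy (`out`) rather than in place keeps each decision based on the original labels, so the filter's result does not depend on scan order.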

  1. DC Link Current Estimation in Wind-Double Feed Induction Generator Power Conditioning System

    MARIAN GAICEANU

    2010-12-01

    Full Text Available In this paper the implementation of the DC link current estimator in the power conditioning system of a variable speed wind turbine is shown. The wind turbine is connected to a doubly fed induction generator (DFIG). The variable electrical energy parameters delivered by the DFIG are matched to the electrical grid parameters through a back-to-back power converter. The bidirectional AC-AC power converter covers a wide speed range from subsynchronous to supersynchronous speeds. The modern control of the back-to-back power converter involves the power balance concept, therefore its load power should be known at any instant. By using power balance control, the DC link voltage variation under load changes can be reduced. In this paper the load power is estimated from the DC link, indirectly, through a second order DC link current estimator. The load current estimator is based on the DC link voltage and on the DC link input current of the rotor side converter. This method presents certain advantages over direct measurement, which requires a low-pass filter: no time delay, a ripple-free feedforward current component, no additional hardware, and a faster control response. Through numerical simulation the performances of the proposed DC link output current estimator scheme are demonstrated.
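
    A second-order estimator of this kind can be sketched as a disturbance observer on the DC-link capacitor equation C·dV/dt = i_in − i_load, where the load current is treated as an unknown disturbance. The sketch below is a generic observer with illustrative numbers (capacitance, gains, currents), not the estimator tuned in the paper.

    ```python
    def estimate_load_current(C=2e-3, i_in=10.0, i_load=6.0,
                              dt=1e-5, steps=2000, l1=2000.0, l2=2000.0):
        """Second-order disturbance observer for the DC-link load current.

        Plant model: C * dV/dt = i_in - i_load. The observer sees only the
        DC-link voltage V and the converter input current i_in; all parameter
        values here are illustrative. Gains l1, l2 place the error dynamics at
        s^2 + l1*s + l2/C = 0 (critically damped at 1000 rad/s for these values).
        """
        V = 700.0                    # actual DC-link voltage (simulated plant)
        V_hat, i_hat = 700.0, 0.0    # observer states: voltage and load-current estimates
        for _ in range(steps):
            V += dt * (i_in - i_load) / C                # plant (Euler step)
            e = V - V_hat                                # voltage estimation error
            V_hat += dt * ((i_in - i_hat) / C + l1 * e)  # voltage estimate update
            i_hat += dt * (-l2 * e)                      # disturbance (load) estimate
        return i_hat
    ```

    Because the load current enters the feedforward path as an estimate rather than a filtered measurement, the control loop avoids the low-pass-filter delay the abstract mentions.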

  2. Proportional-Type Performance Recovery DC-Link Voltage Tracking Algorithm for Permanent Magnet Synchronous Generators

    Seok-Kyoon Kim

    2017-09-01

    Full Text Available This study proposes a disturbance observer-based proportional-type DC-link voltage tracking algorithm for permanent magnet synchronous generators (PMSGs). The proposed technique feeds back only the proportional term of the tracking errors, and it contains the nominal static and dynamic feed-forward compensators coming from the first-order disturbance observers. It is rigorously proved that the proposed method ensures the performance recovery and offset-free properties without the use of integrators of the tracking errors. A wind power generation system has been simulated to verify the efficacy of the proposed method using the PSIM (PowerSIM) software with the DLL (Dynamic Link Library) block.

  3. Geothermal electric power generation in Iceland for the proposed Iceland/United Kingdom HVDC power link

    Hammons, T.J.; Palmason, G.; Thorhallsson, S.

    1991-01-01

    The paper reviews the geothermal electric power potential in Iceland which could economically be developed to supplement hydro power for the proposed HVDC Power Link to the United Kingdom, and the power intensive industries in Iceland which are envisaged for development at this time. Technically harnessable energy for electricity generation, taking account of geothermal resources down to an assumed base depth, temperature distribution in the crust, the probable geothermal recovery factor, and accessibility of the field, has been assessed. Nineteen known high-temperature fields and 9 probable fields have been identified. Technically harnessable geo-heat for various areas is indicated. Data on high temperature fields suitable for geothermal electric power generation, and on harnessable energy for electric power generation within volcanic zones, are stated, and overall assessments are made. The paper then reviews how the potential might be developed, discussing the preference of possible sites and the cost of the developments at today's prices. Costs of geothermal electric power generation, with comparative costs for hydro generation, are given. Possible transmission system developments to feed the power to the proposed HVDC Link converter stations are also discussed

  4. A BAC-bacterial recombination method to generate physically linked multiple gene reporter DNA constructs

    Gong Shiaochin

    2009-03-01

    Full Text Available Abstract Background Reporter gene mice are valuable animal models for biological research providing a gene expression readout that can contribute to cellular characterization within the context of a developmental process. With the advancement of bacterial recombination techniques to engineer reporter gene constructs from BAC genomic clones and the generation of optically distinguishable fluorescent protein reporter genes, there is an unprecedented capability to engineer more informative transgenic reporter mouse models relative to what has been traditionally available. Results We demonstrate here our first effort on the development of a three stage bacterial recombination strategy to physically link multiple genes together with their respective fluorescent protein (FP) reporters in one DNA fragment. This strategy uses bacterial recombination techniques to: (1) subclone genes of interest into BAC linking vectors, (2) insert desired reporter genes into respective genes and (3) link different gene-reporters together. As proof of concept, we have generated a single DNA fragment containing the genes Trap, Dmp1, and Ibsp driving the expression of ECFP, mCherry, and Topaz FP reporter genes, respectively. Using this DNA construct, we have successfully generated transgenic reporter mice that retain two to three gene readouts. Conclusion The three stage methodology to link multiple genes with their respective fluorescent protein reporter works with reasonable efficiency. Moreover, gene linkage allows for their common chromosomal integration into a single locus. However, the testing of this multi-reporter DNA construct by transgenesis does suggest that the linkage of two different genes together, despite their large size, can still create a positional effect. We believe that gene choice, genomic DNA fragment size and the presence of endogenous insulator elements are critical variables.

  5. A BAC-bacterial recombination method to generate physically linked multiple gene reporter DNA constructs.

    Maye, Peter; Stover, Mary Louise; Liu, Yaling; Rowe, David W; Gong, Shiaochin; Lichtler, Alexander C

    2009-03-13

    Reporter gene mice are valuable animal models for biological research providing a gene expression readout that can contribute to cellular characterization within the context of a developmental process. With the advancement of bacterial recombination techniques to engineer reporter gene constructs from BAC genomic clones and the generation of optically distinguishable fluorescent protein reporter genes, there is an unprecedented capability to engineer more informative transgenic reporter mouse models relative to what has been traditionally available. We demonstrate here our first effort on the development of a three stage bacterial recombination strategy to physically link multiple genes together with their respective fluorescent protein (FP) reporters in one DNA fragment. This strategy uses bacterial recombination techniques to: (1) subclone genes of interest into BAC linking vectors, (2) insert desired reporter genes into respective genes and (3) link different gene-reporters together. As proof of concept, we have generated a single DNA fragment containing the genes Trap, Dmp1, and Ibsp driving the expression of ECFP, mCherry, and Topaz FP reporter genes, respectively. Using this DNA construct, we have successfully generated transgenic reporter mice that retain two to three gene readouts. The three stage methodology to link multiple genes with their respective fluorescent protein reporter works with reasonable efficiency. Moreover, gene linkage allows for their common chromosomal integration into a single locus. However, the testing of this multi-reporter DNA construct by transgenesis does suggest that the linkage of two different genes together, despite their large size, can still create a positional effect. We believe that gene choice, genomic DNA fragment size and the presence of endogenous insulator elements are critical variables.

  6. Generation of induced pluripotent stem cells from a patient with X-linked juvenile retinoschisis

    Chi-Hsien Peng

    2018-05-01

    X-linked juvenile retinoschisis (XLRS) is a hereditary retinal dystrophy manifested as splitting of the anatomical layers of the retina. In this report, we generated a patient-specific induced pluripotent stem cell (iPSC) line, TVGH-iPSC-013-05, from the peripheral blood mononuclear cells of a male patient with XLRS by using the Sendai-virus delivery system. We believe that XLRS patient-specific iPSCs provide a powerful in vitro model for evaluating the pathological phenotypes of the disease.

  7. Distribution System Augmented by DC Links for Increasing the Hosting Capacity of PV Generation

    Chaudhary, Sanjay; Demirok, Erhan; Teodorescu, Remus

    2012-01-01

    This paper presents a concept for enhancing the photovoltaic (PV) power generation hosting capacity of distribution networks. A distribution network serving electrical energy to farm settlements was selected as an example because of the large roof area available for PV installation. Furthermore, such networks are characterized by long radial feeders, which suffer from voltage rise and transformer overloading problems as the total number and capacity of the PV installations increase. The distribution network can be augmented by dc distribution links with power electronic converter interfaces to the traditional ac distribution systems. It is shown here that the dc links can be used to interconnect the different radial feeders, so that excess power can be transferred to nearby industrial load centers.

  8. Probing Rubber Cross-Linking Generation of Industrial Polymer Networks at Nanometer Scale.

    Gabrielle, Brice; Gomez, Emmanuel; Korb, Jean-Pierre

    2016-06-23

    We present improved analyses of rheometric torque measurements as well as (1)H double-quantum (DQ) nuclear magnetic resonance (NMR) buildup data on polymer networks of industrial compounds. The DQ NMR analysis yields the distribution of an orientation order parameter (Dres) resulting from the incomplete averaging of proton dipole-dipole couplings within the cross-linked polymer chains. We investigate the influence of the formulation (filler and vulcanization systems) as well as the process (curing temperature) leading to the final polymer network. We show that DQ NMR follows the generation of the polymer network during the vulcanization process from a heterogeneous network to a very homogeneous one. The time variations of the microscopic Dres and the macroscopic rheometric torque exhibit power-law behaviors above a threshold time scale, with characteristic exponents of percolation theory. We also observe a very good linear correlation between the kinetics of Dres and the rheometric data routinely acquired in industry. All these observations confirm the description of polymer network generation as a critical phenomenon. On the basis of these results, we believe that DQ NMR could become a valuable tool for investigating in situ the cross-linking of industrial polymer networks at the nanometer scale.

  9. The UK waste input-output table: Linking waste generation to the UK economy.

    Salemdeeb, Ramy; Al-Tabbaa, Abir; Reynolds, Christian

    2016-10-01

    In order to achieve a circular economy, there must be a greater understanding of the links between economic activity and waste generation. This study introduces the first version of the UK waste input-output table, which can be used to quantify both direct and indirect waste arisings across the supply chain. The proposed waste input-output table covers 21 industrial sectors and 34 waste types for the year 2010. Using the waste input-output table, the study results quantitatively confirm that sectors with a long supply chain (i.e. manufacturing and services sectors) have higher indirect waste generation rates than industrial primary sectors (e.g. mining and quarrying) and sectors with a shorter supply chain (e.g. construction). Results also reveal that the construction and the mining and quarrying sectors have the highest waste generation rates, at 742 and 694 tonnes per £1m of final demand, respectively. Owing to the aggregated format of this first version of the waste input-output table, the model does not address the relationship between waste generation and recycling activities. An updated version of the waste input-output table is therefore expected to be developed to address this issue. The expanded model would lead to a better understanding of waste and resource flows in the supply chain. © The Author(s) 2016.
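The direct-versus-indirect distinction in input-output analysis rests on standard Leontief algebra: total output requirements per unit of final demand follow from the inverse (I - A)^-1, and multiplying by direct waste coefficients gives total waste intensities. A minimal sketch with a toy 3-sector economy (the coefficients are made up for illustration, not taken from the UK table):

```python
import numpy as np

# Hypothetical technical coefficient matrix: A[i, j] is the input required
# from sector i per unit of output of sector j (illustrative values only).
A = np.array([
    [0.10, 0.20, 0.05],   # primary
    [0.15, 0.10, 0.25],   # manufacturing
    [0.05, 0.30, 0.10],   # services
])

# Direct waste coefficients: tonnes of waste per £1m of each sector's output.
w = np.array([0.50, 0.30, 0.10])

# Leontief inverse: total (direct + indirect) output needed per £1m of final demand.
L = np.linalg.inv(np.eye(3) - A)

# Total and indirect waste intensity per £1m of final demand in each sector.
total_intensity = w @ L
indirect_intensity = total_intensity - w
print(total_intensity, indirect_intensity)
```

Because every sector draws on upstream inputs, the total intensity always exceeds the direct coefficient; the gap is exactly the indirect waste the study quantifies.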

  10. Exploring the roles of cannot-link constraint in community detection via Multi-variance Mixed Gaussian Generative Model

    Ge, Meng; Jin, Di; He, Dongxiao; Fu, Huazhu; Wang, Jing; Cao, Xiaochun

    2017-01-01

    Due to the demand for performance improvement and the availability of prior information, semi-supervised community detection with pairwise constraints has become a hot topic. Most existing methods successfully encode must-link constraints but neglect the opposite ones, i.e., the cannot-link constraints, which enforce exclusion between nodes. In this paper, we are interested in understanding the role of cannot-link constraints and in effectively encoding pairwise constraints. Towards these goals, we define an integral generative process that jointly considers the network topology and the must-link and cannot-link constraints. We propose to characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) Model, to address the diverse degrees of confidence present in the network topology and pairwise constraints, and formulate it as a weighted nonnegative matrix factorization problem. The experiments on artificial and real-world networks not only illustrate the superiority of the proposed MMGG but, most importantly, reveal the roles of pairwise constraints: although the must-link is more important than the cannot-link when only one of them is available, both are equally important when both are available. To the best of our knowledge, this is the first work on discovering and exploring the importance of cannot-link constraints in semi-supervised community detection. PMID:28678864
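The paper's weighted, constraint-aware factorization is not reproduced here, but the underlying idea of recovering communities by factoring a network's adjacency matrix into nonnegative memberships can be sketched with plain Lee-Seung multiplicative updates (an illustrative simplification that omits the pairwise-constraint weighting):

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Factor V ~ W @ H with nonnegative W, H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update H, keeping it nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update W, keeping it nonnegative
    return W, H

# Adjacency matrix of two 3-node cliques joined by a single bridge edge.
V = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

W, H = nmf(V, k=2)
communities = W.argmax(axis=1)  # each node's strongest community membership
```

In the constrained setting of the paper, must-link and cannot-link information would additionally weight the reconstruction objective rather than being derived from topology alone.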

  11. A link between thrifty phenotype and maternal care across two generations of intercrossed mice.

    Bruno Sauce

    Maternal effects are causal influences from mother to offspring beyond genetic information, and have lifelong consequences for multiple traits. Previously, we reported that mice whose mothers did not nurse properly had low birth weight followed by rapid fat accumulation and disturbed development of some organs. That pattern resembles the metabolic syndromes known collectively as the thrifty phenotype, which is believed to be an adaptation to a stressful environment that prepares offspring for a reduced nutrient supply. The potential link between maternal care, stress reactivity, and the thrifty phenotype, however, has been poorly explored in the human and animal literature: only a couple of studies even mention (much less test) these concepts under a cohesive framework. Here, we explored this link using mice of the parental inbred strains SM/J and LG/J, which differ dramatically in their maternal care, and the intercrossed generations F1 and F2. We measured individual differences in 15 phenotypes and used structural equation modeling to test our hypotheses. We found a remarkable relationship between the thrifty phenotype and lower quality of maternal behaviors, including nest building, pup retrieval, grooming/licking, and nursing. To our knowledge, this is the first study to show, in any mammal, a clear connection between the natural variation in thrifty phenotype and maternal care. Both traits in the mother also had a substantial effect on survival rate in the F3 offspring. To our surprise, however, stress reactivity seemed to play no role in our models. Furthermore, the strain of the maternal grandmother, but not of the paternal grandmother, affected the variation of maternal care in F2 mice, and this effect was mediated by the thrifty phenotype in F2. Since F1 animals were all genetically identical, this finding suggests that maternal effects pass down both maternal care and thrifty phenotype in these mice across generations via epigenetic transmission.

  12. EPA Nanorelease Dataset

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  13. Querying Large Biological Network Datasets

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in increasing amounts of genetic interaction data being generated every day, and biological networks are used to store the genetic interaction data gathered. The growing volume of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  14. The Harvard organic photovoltaic dataset.

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  15. The Harvard organic photovoltaic dataset

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  16. Shame and Alienation Related to Child Maltreatment: Links to Symptoms Across Generations.

    Babcock Fenerci, Rebecca L; DePrince, Anne P

    2017-11-20

    The current study investigated associations between appraisals of shame and alienation related to mothers' own experiences of child maltreatment and symptoms across generations-in mothers themselves as well as their toddler/preschool-aged children. Mothers who survived maltreatment (N = 113) with a child between the ages of 2 and 5 were recruited to participate in an online study on Maternal Coping, Attachment and Health. Mother participants completed a series of questionnaires, including those that asked about posttrauma appraisals of their own maltreatment experiences as well as their child's and their own mental health symptoms. When taking into account other posttrauma appraisals (e.g., fear, betrayal, anger, self-blame), maternal shame and alienation were both significantly associated with maternal trauma-related distress (a composite of anxiety, PTSD, dissociation, and depressive symptoms). Maternal shame was also significantly linked to child internalizing symptoms and externalizing symptoms. Lower levels of fear and higher levels of betrayal were associated with externalizing symptoms as well. Maternal trauma-related distress mediated the relationship between maternal shame and child externalizing symptoms, and partially mediated the relationship between shame and internalizing symptoms. This study is the first of its kind to examine the role of posttrauma appraisals among mother survivors of maltreatment as they relate to symptoms in their young children. Although additional research is necessary, findings suggest that mothers' posttrauma appraisals, such as shame, could be a relevant factor in the early social-emotional development of survivors' children. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. An Improved Control Strategy of Limiting the DC-Link Voltage Fluctuation for a Doubly Fed Induction Wind Generator

    Yao, J.; Li, H.; Liao, Y.

    2008-01-01

    This paper develops a new control strategy for limiting the dc-link voltage fluctuation of a back-to-back pulsewidth modulation converter in a doubly fed induction generator (DFIG) for wind turbine systems. The causes of dc-link voltage fluctuation are analyzed. An improved control strategy with instantaneous rotor power feedback is proposed to limit the fluctuation range of the dc-link voltage. An experimental rig is set up to validate the proposed strategy, and the dynamic performance of the DFIG is compared with the traditional control method under a constant grid voltage. Furthermore, the capabilities of keeping the dc-link voltage stable are also compared in the ride-through control of the DFIG during a three-phase grid fault, using a 2 MW DFIG wind power system model. Both the experimental and simulation results show that the proposed control strategy is more effective.

  18. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  19. Synthetic and Empirical Capsicum Annuum Image Dataset

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical (50) images of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models used to generate the synthetic images are included. The aim of the datasets is to …

  20. SIMADL: Simulated Activities of Daily Living Dataset

    Talal Alshammari

    2018-04-01

    With the realisation of the Internet of Things (IoT) paradigm, the analysis of Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement for advancing research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets, for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), a simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs in different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files make up the classification dataset; the other forty-two files, into which simulated anomalous patterns were injected, form the anomaly detection dataset.

  1. Deploying Linked Open Vocabulary (LOV) to Enhance Library Linked Data

    Oh, Sam Gyun

    2015-06-01

    Since the advent of Linked Data (LD) as a method for building webs of data, there have been many attempts to apply and implement LD in various settings. Efforts have been made to convert bibliographic data in libraries into Linked Data, thereby generating Library Linked Data (LLD). However, when memory institutions have tried to link their data with external sources based on the principles suggested by Tim Berners-Lee, identifying appropriate vocabularies for describing their bibliographic data has proved challenging. The objective of this paper is to discuss the potential role of Linked Open Vocabularies (LOV) in providing better access to various open datasets and facilitating effective linking. The paper also examines the ways in which memory institutions can utilize LOV to enhance the quality of LLD and LLD-based ontology design.
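As a minimal illustration of the kind of vocabulary reuse discussed here, the sketch below emits two Turtle statements describing a hypothetical bibliographic record with Dublin Core terms, one of the vocabularies indexed by LOV. The record URI and values are made up; in practice an RDF library such as rdflib would be used rather than string assembly:

```python
# Hypothetical library record described with Dublin Core (a LOV-indexed vocabulary).
record_uri = "http://example.org/library/book/1"  # made-up URI for illustration

triples = [
    (record_uri, "http://purl.org/dc/terms/title",
     '"Linked Data: Evolving the Web into a Global Data Space"'),
    (record_uri, "http://purl.org/dc/terms/creator", '"Tom Heath"'),
]

# Serialize each (subject, predicate, object) triple as one Turtle statement.
turtle = "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)
print(turtle)
```

Choosing predicates from a LOV-indexed vocabulary, instead of minting local ones, is what makes the resulting LLD linkable to external datasets.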

  2. Oxidation of myosin by haem proteins generates myosin radicals and protein cross-links

    Lametsch, Marianne Lund; Luxford, Catherine; Skibsted, Leif Horsfelt

    2008-01-01

    of thiyl and tyrosyl radicals is consistent with the observed consumption of cysteine and tyrosine residues, the detection of di-tyrosine by HPLC and the detection of both reducible (disulfide bond) and non-reducible cross-links between myosin molecules by SDS/PAGE. The time course of radical formation...

  3. The Colibactin Genotoxin Generates DNA Interstrand Cross-Links in Infected Cells

    Nadège Bossuet-Greif

    2018-03-01

    Colibactins are hybrid polyketide-nonribosomal peptides produced by Escherichia coli, Klebsiella pneumoniae, and other Enterobacteriaceae harboring the pks genomic island. These genotoxic metabolites are produced by pks-encoded peptide-polyketide synthases as inactive prodrugs called precolibactins, which are then converted to colibactins by deacylation for DNA-damaging effects. Colibactins are bona fide virulence factors and are suspected of promoting colorectal carcinogenesis when produced by intestinal E. coli. Natural active colibactins have not been isolated, and how they induce DNA damage in the eukaryotic host cell is poorly characterized. Here, we show that DNA strands are cross-linked covalently when exposed to enterobacteria producing colibactins. DNA cross-linking is abrogated in a clbP mutant unable to deacylate precolibactins or by adding the colibactin self-resistance protein ClbS, confirming the involvement of the mature forms of colibactins. A similar DNA-damaging mechanism is observed in cellulo, where interstrand cross-links are detected in the genomic DNA of cultured human cells exposed to colibactin-producing bacteria. The intoxicated cells exhibit replication stress, activation of ataxia-telangiectasia and Rad3-related kinase (ATR), and recruitment of the DNA cross-link repair Fanconi anemia protein D2 (FANCD2). Consistent with this, inhibition of ATR or knockdown of FANCD2 reduces the survival of cells exposed to colibactin-producing bacteria. These findings demonstrate that DNA interstrand cross-linking is the critical mechanism of colibactin-induced DNA damage in infected cells.

  4. Aaron Journal article datasets

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  5. Integrated Surface Dataset (Global)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  6. Control Measure Dataset

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  7. National Hydrography Dataset (NHD)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  8. Market Squid Ecology Dataset

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  9. Tables and figure datasets

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  10. Design of a DC-AC Link Converter for 500W Residential Wind Generator

    Riza Muhida

    2012-12-01

    As one of the alternative sources of renewable energy, wind energy has an excellent prospect in Indonesia, particularly in coastal and hilly areas where the wind potential can be used to generate electricity for residential use. There is an urgent need to locally develop a low-cost inverter for residential wind generator systems. Recent developments in power electronic converters and embedded computing allow improvement of power electronic converter devices by integrating microcontrollers into their design. In this project, an inverter circuit with a suitable control scheme was developed. The circuit was to be used with a selected topology of Wind Energy Conversion System (WECS) to convert electricity generated by a 500 W direct-drive permanent-magnet wind generator, which is typical for residential use. From the single-phase AC output of the generator, a rectifier circuit converts AC to DC voltage. A DC-DC boost converter then steps up the voltage to a nominal DC voltage suitable for domestic use. The proposed inverter then converts the DC voltage to sinusoidal AC. The duty cycle of the sinusoidal Pulse-Width Modulated (SPWM) signal controlling the switches in the inverter was generated by a microcontroller. The lab-scale experimental rig simulates the wind generator by running a geared DC motor coupled with the 500 W wind generator, with the prototype circuit connected at the generator output. The experimental circuit produced single-phase 240 V sinusoidal AC voltage with a frequency of 50 Hz. The measured total harmonic distortion (THD) of the voltage across the load was 4.0%, which is within the 5% limit recommended by IEEE Standard 519-1992.
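The SPWM duty-cycle generation described above can be sketched as follows: the microcontroller samples a 50 Hz sine reference once per switching period and maps it into the 0 to 1 duty-cycle range. The carrier frequency and modulation index below are illustrative assumptions, not values taken from the paper:

```python
import math

F_OUT = 50.0    # desired output frequency in Hz (per the paper)
F_SW = 5000.0   # switching (carrier) frequency in Hz, an assumed value
M = 0.9         # modulation index between 0 and 1, an assumed value

def spwm_duty(k):
    """Duty cycle for the k-th switching period: sample the sine reference
    and shift/scale it from the -M..+M range into the 0..1 PWM range."""
    t = k / F_SW
    ref = M * math.sin(2 * math.pi * F_OUT * t)
    return 0.5 * (1.0 + ref)

# One full 50 Hz output cycle spans F_SW / F_OUT = 100 switching periods.
duties = [spwm_duty(k) for k in range(int(F_SW / F_OUT))]
```

Averaged over each switching period, the PWM output tracks the sine reference; an LC output filter then removes the carrier-frequency components to leave the 50 Hz sinusoid.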

  11. Strontium removal jar test dataset for all figures and tables.

    U.S. Environmental Protection Agency — The datasets where used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  12. Viking Seismometer PDS Archive Dataset

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving; the ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High and Event modes at 20 and 1 Hz, respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise associated with the sampler arm, instrument dumps and other mechanical operations.

  13. A comparison of electrical and photonic pulse generation for IR-UWB on fiber links

    Rodes Lopez, Roberto; Caballero Jambrina, Antonio; Yu, Xianbin

    2010-01-01

    We present and compare experimental results for electrical and photonic generation of 2-Gb/s pulses for impulse radio ultra-wideband on fiber transmission systems based on direct current modulation of a semiconductor laser diode and external optical injection of a semiconductor laser diode......, respectively. We assess the performance of the two generation approaches in terms of bit-error rate after propagation over 20 km of optical fiber followed by wireless transmission....

  14. Design of a DC-AC Link Converter for 500W Residential Wind Generator

    Riza Muhida; Ahmad Firdaus A. Zaidi; Afzeri Tamsir; Rudi Irawan

    2012-01-01

    As one of the alternative sources of renewable energy, wind energy has an excellent prospect in Indonesia, particularly in coastal and hilly areas which have potential wind to generate electricity for residential uses. There is an urgent need to locally develop a low-cost inverter for residential wind generator systems. Recent developments in power electronic converters and embedded computing allow improvement of power electronic converter devices that enable integration of microcontrollers in...

  15. Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.

    Stanley, Clayton; Byrne, Michael D

    2016-12-01

    The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
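The headline finding, that past tag use strongly predicts future tag choice, can be illustrated with a trivial frequency baseline. This is neither the ACT-R Bayesian model nor the random permutation model from the study, just a naive sketch with made-up data:

```python
from collections import Counter

def predict_tags(history, n=3):
    """Rank a user's candidate tags by how often they appeared in past posts
    (a frequency-only baseline: context and word order are ignored)."""
    counts = Counter(tag for post_tags in history for tag in post_tags)
    return [tag for tag, _ in counts.most_common(n)]

# Hypothetical tagging history for one Stack Overflow user.
history = [
    ["python", "pandas"],
    ["python", "numpy"],
    ["python", "pandas", "dataframe"],
]

print(predict_tags(history))  # most frequent past tags ranked first
```

The cognitive models in the study go further by weighting each tag's base-level activation against the current post's context, but even this baseline captures the "past behavior predicts future behavior" effect the authors report.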

  16. Anticipation of the needs linked with new generation reactors: COGEMA Logistics casks developments

    Issard, H.; Grygiel, J.M. [COGEMA LOGISTICS SA, 1 rue des Herons BP 302, Montigny-le-Bretonneux, 78054 Saint Quentin en Yvelines (France)

    2006-07-01

    assigning the level of performance of the cask to be used. To accompany the 'nuclear renaissance' and the related fuel design and fuel management evolution, it is crucial to anticipate the associated needs in terms of cask development notably for the spent fuel management. It is the reason why an ambitious R and D program has been set up at COGEMA Logistics. It aims at proposing innovative solutions oriented by the trends guiding this 'renaissance': the proposed systems have indeed to accommodate Spent Fuel Assemblies (SFAs) characterized by ever increasing burn-ups, fissile isotopes contents, and total inventory. Flexibility may potentially mean quick evacuation of UO{sub 2} or MOX spent fuel with high thermal power to be dealt with. As described in the present paper, these evolutions directly guide the R and D actions on thermal and structural analysis, criticality and containment. The approach shall also include predictable licensing processes in an ever-demanding regulatory environment. The paper has the following structure: I. Evolutionary casks for evolving needs; II COGEMA Logistics R and D on evolutionary packagings; 1- The objectives of COGEMA Logistics R and D on evolutionary packagings; 2- Which R and D actions are currently implemented to anticipate evolving needs?; 2-1 Building an R and D program; 2-2 High performance design solutions for subcriticality; 2-3 Solutions for thermal and structural management; 2-4 Solution for enhanced shielding design; 2-5 Solutions for double containment systems; 2-6 Solutions for shock absorbers; 3- Examples of packaging developments; III Conclusion. To summarize, COGEMA Logistics is actively involved in Research and Development to accompany the improvement brought by Areva in fuel design and management linked either to extensive programs of nuclear plant life extension or to new constructions, such as Areva's European Pressurized Reactor. These evolutions coupled with needs for ever higher flexibility

  17. Anticipation of the needs linked with new generation reactors: COGEMA Logistics casks developments

    Issard, H.; Grygiel, J.M.

    2006-01-01

    to be used. To accompany the 'nuclear renaissance' and the related fuel design and fuel management evolution, it is crucial to anticipate the associated needs in terms of cask development notably for the spent fuel management. It is the reason why an ambitious R and D program has been set up at COGEMA Logistics. It aims at proposing innovative solutions oriented by the trends guiding this 'renaissance': the proposed systems have indeed to accommodate Spent Fuel Assemblies (SFAs) characterized by ever increasing burn-ups, fissile isotopes contents, and total inventory. Flexibility may potentially mean quick evacuation of UO 2 or MOX spent fuel with high thermal power to be dealt with. As described in the present paper, these evolutions directly guide the R and D actions on thermal and structural analysis, criticality and containment. The approach shall also include predictable licensing processes in an ever-demanding regulatory environment. The paper has the following structure: I. Evolutionary casks for evolving needs; II COGEMA Logistics R and D on evolutionary packagings; 1- The objectives of COGEMA Logistics R and D on evolutionary packagings; 2- Which R and D actions are currently implemented to anticipate evolving needs?; 2-1 Building an R and D program; 2-2 High performance design solutions for subcriticality; 2-3 Solutions for thermal and structural management; 2-4 Solution for enhanced shielding design; 2-5 Solutions for double containment systems; 2-6 Solutions for shock absorbers; 3- Examples of packaging developments; III Conclusion. To summarize, COGEMA Logistics is actively involved in Research and Development to accompany the improvement brought by Areva in fuel design and management linked either to extensive programs of nuclear plant life extension or to new constructions, such as Areva's European Pressurized Reactor. These evolutions coupled with needs for ever higher flexibility in terms of spent fuel management clearly guide the packaging

  18. Integration of spectral domain optical coherence tomography with microperimetry generates unique datasets for the simultaneous identification of visual function and retinal structure in ophthalmological applications

    Koulen, Peter; Gallimore, Gary; Vincent, Ryan D.; Sabates, Nelson R.; Sabates, Felix N.

    2011-06-01

    Conventional perimeters are used routinely in various eye disease states to evaluate the central visual field and to quantitatively map sensitivity. However, standard automated perimetry proves difficult for retina and specifically macular disease due to the need for central and steady fixation. Advances in instrumentation have led to microperimetry, which incorporates eye tracking for placement of macular sensitivity values onto an image of the macular fundus thus enabling a precise functional and anatomical mapping of the central visual field. Functional sensitivity of the retina can be compared with the observed structural parameters that are acquired with high-resolution spectral domain optical coherence tomography and by integration of scanning laser ophthalmoscope-driven imaging. Findings of the present study generate a basis for age-matched comparison of sensitivity values in patients with macular pathology. Microperimetry registered with detailed structural data performed before and after intervention treatments provides valuable information about macular function, disease progression and treatment success. This approach also allows for the detection of disease or treatment related changes in retinal sensitivity when visual acuity is not affected and can drive the decision making process in choosing different treatment regimens and guiding visual rehabilitation. This has immediate relevance for applications in central retinal vein occlusion, central serous choroidopathy, age-related macular degeneration, familial macular dystrophy and several other forms of retina related visual disability.

  19. Information Transfer between Generations Linked to Biodiversity in Rock-Paper-Scissors Games

    Ranjan Bose

    2015-01-01

    Ecological processes, such as reproduction, mobility, and interaction between species, play important roles in the maintenance of biodiversity. Classically, the cyclic dominance of species has been modelled using the nonhierarchical interactions among competing species, represented by the “Rock-Paper-Scissors” (RPS) game. Here we propose a cascaded channel model for analyzing the existence of biodiversity in the RPS game. The transition between successive generations is modelled as communication of information over a noisy communication channel. The rate of transfer of information over successive generations is studied using mutual information and it is found that “greedy” information transfer between successive generations may lead to conditions for extinction. This generalized framework can be used to study biodiversity in any number of interacting species, ecosystems with unequal rates for different species, and also competitive networks.
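The cascaded-channel idea above can be reduced to a small numerical sketch: if the species occupying a patch in one generation is treated as the input to a noisy channel whose output is the occupant in the next generation, the per-generation information transfer is the mutual information of that channel. The three-species setup, uniform input distribution, and invasion probability `eps` below are illustrative assumptions, not the authors' model.

```python
import math

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and channel matrix p(y|x)."""
    p_y = [sum(p_x[i] * channel[i][j] for i in range(len(p_x)))
           for j in range(len(channel[0]))]
    mi = 0.0
    for i, px in enumerate(p_x):
        for j, pyx in enumerate(channel[i]):
            if px > 0 and pyx > 0:
                mi += px * pyx * math.log2(pyx / p_y[j])
    return mi

# Rock, Paper, Scissors: each generation a patch keeps its species with
# probability 1 - eps, or is invaded by the species that dominates it.
eps = 0.1
channel = [
    [1 - eps, eps, 0.0],   # R stays R, or is invaded by P
    [0.0, 1 - eps, eps],   # P stays P, or is invaded by S
    [eps, 0.0, 1 - eps],   # S stays S, or is invaded by R
]
p_x = [1 / 3, 1 / 3, 1 / 3]
print(round(mutual_information(p_x, channel), 3))  # ≈ 1.116 bits for eps = 0.1
```

A fully noisy channel (every row uniform) carries zero information between generations, which is the regime the abstract associates with loss of structure.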

  20. Generation Me in the spotlight : Linking reality TV to materialism, entitlement, and narcissism

    Opree, S.J.; Kühne, R.

    2016-01-01

    Today’s youth, the Generation Me, is deemed materialistic, entitled, and narcissistic. Individuality has become an important value in child-rearing and is cultivated in the media—especially within the reality TV genre. The aim of this study was to investigate whether adolescents’ and emerging

  1. P-Link: A method for generating multicomponent cytochrome P450 fusions with variable linker length

    Belsare, Ketaki D.; Ruff, Anna Joelle; Martinez, Ronny

    2014-01-01

    Fusion protein construction is a widely employed biochemical technique, especially when it comes to multi-component enzymes such as cytochrome P450s. Here we describe a novel method for generating fusion proteins with variable linker lengths, protein fusion with variable linker insertion (P...

  2. Isfahan MISP Dataset.

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  3. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions

    Argout, Xavier; Fouet, Olivier; Wincker, Patrick; Gramacho, Karina; Legavre, Thierry; Sabau, Xavier; Risterucci, Ange Marie; Da Silva, Corinne; Cascardo, Julio; Allegre, Mathilde; Kuhn, David; Verica, Joseph; Courtois, Brigitte; Loor, Gaston; Babin, Regis; Sounigo, Olivier; Ducamp, Michel; Guiltinan, Mark J; Ruiz, Manuel; Alemanno, Laurence; Machado, Regina; Phillips, Wilberth; Schnell, Ray; Gilmour, Martin; Rosenquist, Eric; Butler, David; Maximova, Siela; Lanaud, Claire

    2008-01-01

    Background Theobroma cacao L. is a tree originating from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, responsible for more than 40% yield losses, and quality improvement, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Results Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. A large collection of new genetic markers was provided by this EST collection. Conclusion This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs.
It will provide numerous genetic markers that will allow the construction of a high
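The assembly figures reported above are internally consistent, since the unigene count decomposes exactly into contigs plus singletons; a quick arithmetic check:

```python
# Figures as reported in the abstract above
ests, unigenes, contigs, singletons = 149_650, 48_594, 12_692, 35_902
homologous = 29_849  # unigenes with significant homology to public sequences

assert contigs + singletons == unigenes  # 12,692 + 35,902 = 48,594
print(f"{homologous / unigenes:.1%} of unigenes match public sequences")  # 61.4%
```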

  4. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions.

    Argout, Xavier; Fouet, Olivier; Wincker, Patrick; Gramacho, Karina; Legavre, Thierry; Sabau, Xavier; Risterucci, Ange Marie; Da Silva, Corinne; Cascardo, Julio; Allegre, Mathilde; Kuhn, David; Verica, Joseph; Courtois, Brigitte; Loor, Gaston; Babin, Regis; Sounigo, Olivier; Ducamp, Michel; Guiltinan, Mark J; Ruiz, Manuel; Alemanno, Laurence; Machado, Regina; Phillips, Wilberth; Schnell, Ray; Gilmour, Martin; Rosenquist, Eric; Butler, David; Maximova, Siela; Lanaud, Claire

    2008-10-30

    Theobroma cacao L., is a tree originated from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, responsible for more than 40% yield losses and quality improvement, nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species.Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories.A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database.To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection.A large collection of new genetic markers was provided by this ESTs collection. This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao

  5. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions

    Ruiz Manuel

    2008-10-01

    Background Theobroma cacao L. is a tree originating from the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and T. cacao quality improvement are two important challenges for all actors of cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, responsible for more than 40% yield losses, and quality improvement, both nutritional and organoleptic, is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao. Results Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated corresponding to 48,594 unigenes, 12,692 contigs and 35,902 singletons. A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A specific information system (ESTtik) was constructed to process, store and manage this EST collection allowing the user to query a database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two different metabolic pathways extensively studied in other plant species and important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. A large collection of new genetic markers was provided by this EST collection. Conclusion This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs.
It will provide numerous genetic markers that will allow

  6. Dual Cross-Linked Biofunctional and Self-Healing Networks to Generate User-Defined Modular Gradient Hydrogel Constructs.

    Wei, Zhao; Lewis, Daniel M; Xu, Yu; Gerecht, Sharon

    2017-08-01

    Gradient hydrogels have been developed to mimic the spatiotemporal differences of multiple gradient cues in tissues. Current approaches used to generate such hydrogels are restricted to a single gradient shape and distribution. Here, a hydrogel is designed that includes two chemical cross-linking networks, biofunctional and self-healing, enabling the customizable formation of a modular gradient hydrogel construct with various gradient distributions and flexible shapes. The biofunctional networks are formed via Michael addition between the acrylates of oxidized acrylated hyaluronic acid (OAHA) and the dithiol of the matrix metalloproteinase (MMP)-sensitive cross-linker and RGD peptides. The self-healing networks are formed via a dynamic Schiff base reaction between N-carboxyethyl chitosan (CEC) and OAHA, which drives the modular gradient units to self-heal into an integral modular gradient hydrogel. The CEC-OAHA-MMP hydrogel exhibits excellent flowability at 37 °C under shear stress, enabling its injection to generate gradient distributions and shapes. Furthermore, encapsulated sarcoma cells respond to the gradient cues of RGD peptides and MMP-sensitive cross-linkers in the hydrogel. With these superior properties, the dual cross-linked CEC-OAHA-MMP hydrogel holds significant potential for generating customizable gradient hydrogel constructs to study and guide cellular responses to their microenvironment, such as in tumor mimicking, tissue engineering, and stem cell differentiation and morphogenesis. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Generation of a monoclonal antibody against the glycosylphosphatidylinositol-linked protein Rae-1 using genetically engineered tumor cells.

    Hu, Jiemiao; Vien, Long T; Xia, Xueqing; Bover, Laura; Li, Shulin

    2014-02-04

    Although genetically engineered cells have been used to generate monoclonal antibodies (mAbs) against numerous proteins, no study has used them to generate mAbs against glycosylphosphatidylinositol (GPI)-anchored proteins. The GPI-linked protein Rae-1, an NKG2D ligand member, is responsible for interacting with immune surveillance cells. However, very few high-quality mAbs against Rae-1 are available for use in multiple analyses, including Western blotting, immunohistochemistry, and flow cytometry. The lack of high-quality mAbs limits the in-depth analysis of Rae-1 fate, such as shedding and internalization, in murine models. Moreover, currently available screening approaches for identifying high-quality mAbs are excessively time-consuming and costly. We used Rae-1-overexpressing CT26 tumor cells to generate 60 hybridomas that secreted mAbs against Rae-1. We also developed a streamlined screening strategy for selecting the best anti-Rae-1 mAb for use in flow cytometry assay, enzyme-linked immunosorbent assay, Western blotting, and immunostaining. Our cell line-based immunization approach can yield mAbs against GPI-anchored proteins, and our streamlined screening strategy can be used to select the ideal hybridoma for producing such mAbs.

  8. Mridangam stroke dataset

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...

  9. The GTZAN dataset

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge … of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN

  10. Two new fern chloroplasts and decelerated evolution linked to the long generation time in tree ferns.

    Zhong, Bojian; Fong, Richard; Collins, Lesley J; McLenachan, Patricia A; Penny, David

    2014-04-30

    We report the chloroplast genomes of a tree fern (Dicksonia squarrosa) and a "fern ally" (Tmesipteris elongata), and show that the phylogeny of early land plants is basically as expected, and the estimates of divergence time are largely unaffected after removing the fastest evolving sites. The tree fern shows the major reduction in the rate of evolution, and there has been a major slowdown in the rate of mutation in both families of tree ferns. We suggest that this is related to a generation time effect; if there is a long time period between generations, then this is probably incompatible with a high mutation rate because otherwise nearly every propagule would probably have several lethal mutations. This effect will be especially strong in organisms that have large numbers of cell divisions between generations. This shows the necessity of going beyond phylogeny and integrating its study with other properties of organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. The grand unified link between the Peccei-Quinn mechanism and the generation puzzle

    Davidson, A.; Wali, K.C.

    1982-03-01

    The essential ingredients of the Peccei-Quinn mechanism are shown to be dictated by a proper choice of a grand unification scheme. The presence of U(1)sub(PQ) gives rise to the possibility that the same physics which resolves the strong CP-violation problem may decode the generation puzzle with no extra cost. Multigenerational signatures of the invisible axion scenario, such as the canonical fermion mass matrix, are discussed. The uniqueness and the special values of the quantized PQ-assignments, namely 1, -3, 5, -7, ... for successive generations, acquire an automatic explanation once the idea of 'horizontal compositeness' is invoked. A characteristic feature then is that the muon appears to have a less complicated structure than the electron. Furthermore, U(1)sub(PQ) chooses SO(10) to be its only tenable gauge symmetry partner, and at the same time crucially restricts the associated Higgs system. All this finally results in a consistent fermion mass hierarchy with log m, to the crudest estimation, varying linearly with respect to the generation index. (author)

  12. A high space-time resolution dataset linking meteorological forcing and hydro-sedimentary response in a mesoscale Mediterranean catchment (Auzon) of the Ardèche region, France

    Nord, Guillaume; Boudevillain, Brice; Berne, Alexis; Branger, Flora; Braud, Isabelle; Dramais, Guillaume; Gérard, Simon; Le Coz, Jérôme; Legoût, Cédric; Molinié, Gilles; Teuling, Ryan

    2017-01-01

    A comprehensive hydrometeorological dataset is presented spanning the period 1 January 2011-31 December 2014 to improve the understanding of the hydrological processes leading to flash floods and the relation between rainfall, runoff, erosion and sediment transport in a mesoscale catchment

  13. A control strategy for DC-link voltage control containing PV generation and energy storage — An intelligent approach

    Rouzbehi, Kumars; Miranian, Arash; Candela García, José Ignacio; Luna Alloza, Álvaro; Rodríguez Cortés, Pedro

    2014-01-01

    In this paper, DC-link voltage control in DC microgrids with photovoltaic (PV) generation and battery is addressed based on an intelligent approach. The proposed strategy is based on the modeling of the power interface, i.e. the power electronic converter located between the PV array, battery and DC bus, by use of measurement data. For this purpose, a local model network (LMN) is developed to model the converter and then a local linear control (LLC) strategy is designed based on the LMN. Simula...

  14. Generation of nitric oxide from nitrite by carbonic anhydrase: a possible link between metabolic activity and vasodilation

    Aamand, Rasmus; Dalsgaard, Thomas; Jensen, Frank Bo

    2009-01-01

    In catalyzing the reversible hydration of CO2 to bicarbonate and protons, the ubiquitous enzyme carbonic anhydrase (CA) plays a crucial role in CO2 transport, in acid-base balance, and in linking local acidosis to O2 unloading from hemoglobin. Considering the structural similarity between bicarbonate and nitrite, we hypothesized that CA uses nitrite as a substrate to produce the potent vasodilator nitric oxide (NO) to increase local blood flow to metabolically active tissues. Here we show that CA readily reacts with nitrite to generate NO, particularly at low pH, and that the NO produced

  15. Dataset - Adviesregel PPL 2010

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  16. Simulation of Smart Home Activity Datasets

    Jonathan Synnott

    2015-06-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  17. Simulation of Smart Home Activity Datasets.

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.
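A model-based simulator of the kind reviewed here can be reduced to a small sketch: a scripted activity is replayed as timestamped virtual sensor events with randomized durations. The sensor names, the "make tea" script and its durations below are all hypothetical, chosen only to illustrate the approach.

```python
import random
from datetime import datetime, timedelta

# Hypothetical activity script: (virtual sensor id, typical duration in seconds)
MAKE_TEA = [("kitchen_motion", 5), ("kettle_power", 120),
            ("cupboard_door", 3), ("kitchen_motion", 10)]

def simulate(activity, start, jitter=0.2, seed=None):
    """Yield (timestamp, sensor_id) events, varying each duration by +/-jitter."""
    rng = random.Random(seed)
    t = start
    for sensor, dur in activity:
        yield t.isoformat(timespec="seconds"), sensor
        t += timedelta(seconds=dur * rng.uniform(1 - jitter, 1 + jitter))

events = list(simulate(MAKE_TEA, datetime(2015, 6, 16, 8, 0), seed=42))
for ts, sensor in events:
    print(ts, sensor)
```

Replaying many such scripts with different seeds and interleavings is what lets a simulator produce the large labelled datasets that real deployments make expensive to collect.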

  18. The missing link in El Niño’s phenomenon generation

    Mato Méndez, Fernando José

    2017-01-01

    The study of the El Niño phenomenon has been addressed for decades by means of the well-known ocean-atmosphere coupling model described by the El Niño Southern Oscillation (ENSO) phenomenon. However, its generation mechanism has remained unknown until now, hindering the forecast of such occurrence and the degree of its intensity. Our research provides for the first time the discovery of a clear correlation pattern between an immense temporal increase in seismicity at localized regions inside the P...

  19. Analysis of the Effect of Electron Density Perturbations Generated by Gravity Waves on HF Communication Links

    Fagre, M.; Elias, A. G.; Chum, J.; Cabrera, M. A.

    2017-12-01

    In the present work, ray tracing of high frequency (HF) signals in ionospheric disturbed conditions is analyzed, particularly in the presence of electron density perturbations generated by gravity waves (GWs). The three-dimensional numerical ray tracing code by Jones and Stephenson, based on Hamilton's equations, which is commonly used to study radio propagation through the ionosphere, is used. An electron density perturbation model is implemented to this code based upon the consideration of atmospheric GWs generated at a height of 150 km in the thermosphere and propagating up into the ionosphere. The motion of the neutral gas at these altitudes induces disturbances in the background plasma which affects HF signals propagation. To obtain a realistic model of GWs in order to analyze the propagation and dispersion characteristics, a GW ray tracing method with kinematic viscosity and thermal diffusivity was applied. The IRI-2012, HWM14 and NRLMSISE-00 models were incorporated to assess electron density, wind velocities, neutral temperature and total mass density needed for the ray tracing codes. Preliminary results of gravity wave effects on ground range and reflection height are presented for low-mid latitude ionosphere.
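The perturbation scheme described above can be approximated in a few lines: a sinusoidal gravity-wave modulation is superposed on a Chapman-layer background electron density, which is the kind of field a Hamiltonian ray tracer samples at every integration step. All parameter values below (peak density, layer height, scale height, wave amplitude and wavelengths) are illustrative assumptions, not values from the paper.

```python
import math

def chapman(z_km, nmax=1e12, hmF2=300.0, H=50.0):
    """Chapman-layer background electron density (el/m^3)."""
    x = (z_km - hmF2) / H
    return nmax * math.exp(0.5 * (1 - x - math.exp(-x)))

def perturbed_density(z_km, x_km, t_s, amp=0.05,
                      lam_z=30.0, lam_x=200.0, period=1800.0):
    """Background density modulated by a plane gravity wave (relative amplitude amp)."""
    phase = 2 * math.pi * (z_km / lam_z + x_km / lam_x - t_s / period)
    return chapman(z_km) * (1.0 + amp * math.sin(phase))

# The perturbation stays within +/-5% of the local background density
n0 = chapman(300.0)
assert abs(perturbed_density(300.0, 0.0, 0.0) - n0) <= 0.05 * n0
```

Because HF refraction depends on the local plasma frequency, even a few percent of density modulation can shift the reflection height and ground range of a ray, which is the effect the study quantifies.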

  20. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

  1. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  2. A high space-time resolution dataset linking meteorological forcing and hydro-sedimentary response in a mesoscale Mediterranean catchment (Auzon) of the Ardèche region, France

    Nord, Guillaume; Boudevillain, Brice; Berne, Alexis; Branger, Flora; Braud, Isabelle; Dramais, Guillaume; Gérard, Simon; Le Coz, Jérôme; Legoût, Cédric; Molinié, Gilles; Teuling, Ryan

    2017-01-01

    A comprehensive hydrometeorological dataset is presented spanning the period 1 January 2011-31 December 2014 to improve the understanding of the hydrological processes leading to flash floods and the relation between rainfall, runoff, erosion and sediment transport in a mesoscale catchment (Auzon, 116km2) of the Mediterranean region. Badlands are present in the Auzon catchment and well connected to high-gradient channels of bedrock rivers which promotes the transfer of suspended solids downst...

  3. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections.

    Samuele Bovo

    Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18) previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus infected pigs providing information that could be important to define occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets not originally derived from metagenomics experiments for viral metagenomics analyses in a livestock species.
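The first step of such a pipeline, collecting the host-unmapped reads for downstream viral assembly, can be sketched using the standard SAM convention that FLAG bit 0x4 marks an unmapped segment. Real pipelines would use samtools or pysam on BAM files; this pure-Python filter and the toy records are only illustrative.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL
        if int(fields[1]) & UNMAPPED:
            yield fields[0], fields[9]

sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGC\tIIII",  # unmapped read
]
print(list(unmapped_reads(sam)))  # [('read2', 'TTGC')]
```

The retained sequences would then be assembled or mapped against a viral reference collection such as the NCBI Viral Genome Resource mentioned above.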

  4. Next Generation Waste Tracking: Linking Legacy Systems with Modern Networking Technologies

    Walker, Randy M.; Resseguie, David R.; Shankar, Mallikarjun; Gorman, Bryan L.; Smith, Cyrus M.; Hill, David E.

    2010-01-01

    of existing legacy hazardous, radioactive and related informational databases and systems using emerging Web 2.0 technologies. These capabilities were used to interoperate ORNL's waste generating, packaging, transportation and disposal with other DOE ORO waste management contractors. Importantly, the DOE EM objectives were accomplished in a cost effective manner without altering existing information systems. A path forward is to demonstrate and share these technologies with DOE EM, contractors and stakeholders. This approach will not alter existing DOE assets, i.e. Automated Traffic Management Systems (ATMS), Transportation Tracking and Communications System (TRANSCOM), the Argonne National Laboratory (ANL) demonstrated package tracking system, etc.

  5. Generation of knockout rats with X-linked severe combined immunodeficiency (X-SCID) using zinc-finger nucleases.

    Tomoji Mashimo

    BACKGROUND: Although the rat is extensively used as a laboratory model, the inability to utilize germ line-competent rat embryonic stem (ES) cells has been a major drawback for studies that aim to elucidate gene functions. Recently, zinc-finger nucleases (ZFNs) were successfully used to create genome-specific double-stranded breaks and thereby induce targeted gene mutations in a wide variety of organisms including plants, Drosophila, zebrafish, etc. METHODOLOGY/PRINCIPAL FINDINGS: We report here on ZFN-induced gene targeting of the rat interleukin 2 receptor gamma (Il2rg) locus, where orthologous human and mouse mutations cause X-linked severe combined immune deficiency (X-SCID). Co-injection of mRNAs encoding custom-designed ZFNs into the pronucleus of fertilized oocytes yielded genetically modified offspring at rates greater than 20%, which possessed a wide variety of deletion/insertion mutations. ZFN-modified founders faithfully transmitted their genetic changes to the next generation along with the severe combined immune deficiency phenotype. CONCLUSIONS AND SIGNIFICANCE: The efficient and rapid generation of gene knockout rats shows that using ZFN technology is a new strategy for creating gene-targeted rat models of human diseases. In addition, the X-SCID rats that were established in this study will be valuable in vivo tools for evaluating drug treatment or gene therapy as well as model systems for examining the treatment of xenotransplanted malignancies.

  6. A high space-time resolution dataset linking meteorological forcing and hydro-sedimentary response in a mesoscale Mediterranean catchment (Auzon) of the Ardèche region, France

    Nord, Guillaume; Boudevillain, Brice; Berne, Alexis; Branger, Flora; Braud, Isabelle; Dramais, Guillaume; Gérard, Simon; Le Coz, Jérôme; Legoût, Cédric; Molinié, Gilles; Van Baelen, Joel; Vandervaere, Jean-Pierre; Andrieu, Julien; Aubert, Coralie; Calianno, Martin; Delrieu, Guy; Grazioli, Jacopo; Hachani, Sahar; Horner, Ivan; Huza, Jessica; Le Boursicaud, Raphaël; Raupach, Timothy H.; Teuling, Adriaan J.; Uber, Magdalena; Vincendon, Béatrice; Wijbrans, Annette

    2017-03-01

    A comprehensive hydrometeorological dataset is presented spanning the period 1 January 2011-31 December 2014 to improve the understanding of the hydrological processes leading to flash floods and the relation between rainfall, runoff, erosion and sediment transport in a mesoscale catchment (Auzon, 116 km²) of the Mediterranean region. Badlands are present in the Auzon catchment and are well connected to high-gradient channels of bedrock rivers, which promotes the transfer of suspended solids downstream. The number of observed variables, the various sensors involved (both in situ and remote) and the space-time resolution (~km², ~min) of this comprehensive dataset make it a unique contribution to research communities focused on hydrometeorology, surface hydrology and erosion. Given that rainfall is highly variable in space and time in this region, the observation system enables assessment of the hydrological response to rainfall fields. Indeed, (i) rainfall data are provided by rain gauges (both a research network of 21 rain gauges with a 5 min time step and an operational network of 10 rain gauges with a 5 min or 1 h time step), S-band Doppler dual-polarization radars (1 km², 5 min resolution), disdrometers (16 sensors working at 30 s or 1 min time step) and Micro Rain Radars (5 sensors, 100 m height resolution). Additionally, during the special observation period (SOP-1) of the HyMeX (Hydrological Cycle in the Mediterranean Experiment) project, two X-band radars provided precipitation measurements at very fine spatial and temporal scales (1 ha, 5 min). (ii) Other meteorological data are taken from the operational surface weather observation stations of Météo-France (including 2 m air temperature, atmospheric pressure, 2 m relative humidity, 10 m wind speed and direction, global radiation) at the hourly time resolution (six stations in the region of interest). (iii) The monitoring of surface hydrology and suspended sediment is multi-scale and based on nested

  7. New-Generation NASA Aura Ozone Monitoring Instrument (OMI) Volcanic SO2 Dataset: Algorithm Description, Initial Results, and Continuation with the Suomi-NPP Ozone Mapping and Profiler Suite (OMPS)

    Li, Can; Krotkov, Nickolay A.; Carn, Simon; Zhang, Yan; Spurr, Robert J. D.; Joiner, Joanna

    2017-01-01

    Since the fall of 2004, the Ozone Monitoring Instrument (OMI) has been providing global monitoring of volcanic SO2 emissions, helping to understand their climate impacts and to mitigate aviation hazards. Here we introduce a new-generation OMI volcanic SO2 dataset based on a principal component analysis (PCA) retrieval technique. To reduce retrieval noise and artifacts as seen in the current operational linear fit (LF) algorithm, the new algorithm, OMSO2VOLCANO, uses characteristic features extracted directly from OMI radiances in the spectral fitting, thereby helping to minimize interferences from various geophysical processes (e.g., O3 absorption) and measurement details (e.g., wavelength shift). To solve the problem of low bias for large SO2 total columns in the LF product, the OMSO2VOLCANO algorithm employs a table lookup approach to estimate SO2 Jacobians (i.e., the instrument sensitivity to a perturbation in the SO2 column amount) and iteratively adjusts the spectral fitting window to exclude shorter wavelengths where the SO2 absorption signals are saturated. To first order, the effects of clouds and aerosols are accounted for using a simple Lambertian equivalent reflectivity approach. As with the LF algorithm, OMSO2VOLCANO provides total column retrievals based on a set of predefined SO2 profiles from the lower troposphere to the lower stratosphere, including a new profile peaked at 13 km for plumes in the upper troposphere. Examples given in this study indicate that the new dataset shows significant improvement over the LF product, with at least 50% reduction in retrieval noise over the remote Pacific. For large eruptions such as Kasatochi in 2008 (approximately 1700 kt total SO2) and Sierra Negra in 2005 (greater than 1100 DU maximum SO2), OMSO2VOLCANO generally agrees well with other algorithms that also utilize the full spectral content of satellite measurements, while the LF algorithm tends to underestimate SO2. We also demonstrate that, despite the
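
The PCA-based spectral fitting described above can be viewed as a linear least-squares problem: the measured spectrum is modeled as a combination of principal components extracted from SO2-free radiances plus an SO2 Jacobian. The sketch below uses entirely synthetic radiances and a made-up Jacobian shape; it illustrates the idea only and is not the operational OMSO2VOLCANO code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wl, n_obs = 30, 200  # wavelengths, SO2-free background observations

# Synthetic background radiance residuals (no volcanic SO2) and a
# hypothetical SO2 absorption Jacobian (sensitivity per DU).
background = rng.normal(0.0, 0.01, (n_obs, n_wl))
so2_jacobian = -np.exp(-np.linspace(0.0, 3.0, n_wl)) * 1e-3

# Leading principal components of the background capture interfering
# structure (O3 absorption, wavelength shift, ...) without SO2.
_, _, vt = np.linalg.svd(background - background.mean(axis=0),
                         full_matrices=False)
pcs = vt[:5]  # first 5 principal components

# A "measurement" containing 10 DU of SO2 plus some background structure.
true_column = 10.0
measured = (true_column * so2_jacobian + 0.5 * pcs[0]
            + rng.normal(0.0, 1e-5, n_wl))

# Fit: measured = PCs @ coeffs + jacobian * SO2_column (least squares).
design = np.column_stack([pcs.T, so2_jacobian])
coeffs, *_ = np.linalg.lstsq(design, measured, rcond=None)
retrieved_column = coeffs[-1]
print(f"retrieved SO2 column: {retrieved_column:.2f} DU")
```

In the real algorithm the Jacobians come from a radiative-transfer lookup table and the fitting window is adjusted iteratively; here both are fixed for simplicity.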

  8. Consolidating drug data on a global scale using Linked Data.

    Jovanovik, Milos; Trajanov, Dimitar

    2017-01-21

    Drug product data is available on the Web in a distributed fashion. The reasons lie within the regulatory domains, which exist on a national level. As a consequence, the drug data available on the Web are independently curated by national institutions from each country, leaving the data in varying languages, with a varying structure, granularity level and format, on different locations on the Web. Therefore, one of the main challenges in the realm of drug data is the consolidation and integration of large amounts of heterogeneous data into a comprehensive dataspace, for the purpose of developing data-driven applications. In recent years, the adoption of the Linked Data principles has enabled data publishers to provide structured data on the Web and contextually interlink them with other public datasets, effectively de-siloing them. Defining methodological guidelines and specialized tools for generating Linked Data in the drug domain, applicable on a global scale, is a crucial step to achieving the necessary levels of data consolidation and alignment needed for the development of a global dataset of drug product data. This dataset would then enable a myriad of new usage scenarios, which can, for instance, provide insight into the global availability of different drug categories in different parts of the world. We developed a methodology and a set of tools which support the process of generating Linked Data in the drug domain. Using them, we generated the LinkedDrugs dataset by seamlessly transforming, consolidating and publishing high-quality, 5-star Linked Drug Data from twenty-three countries, containing over 248,000 drug products, over 99,000,000 RDF triples and over 278,000 links to generic drugs from the LOD Cloud. Using the linked nature of the dataset, we demonstrate its ability to support advanced usage scenarios in the drug domain. The process of generating the LinkedDrugs dataset demonstrates the applicability of the methodological guidelines and the
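
As a rough illustration of the kind of output such a transformation pipeline produces, the sketch below serializes one hypothetical national drug product as RDF in N-Triples form and links it to a generic drug elsewhere in the LOD Cloud. The base URI, predicates and identifiers are illustrative assumptions, not the actual LinkedDrugs vocabulary.

```python
# Hypothetical base URI for the national drug registry being transformed.
BASE = "http://example.org/drug/"

def drug_to_ntriples(drug_id, name, country, generic_uri=None):
    """Serialize one drug product as N-Triples; optionally link it to a
    generic drug in the LOD Cloud via rdfs:seeAlso (illustrative choice)."""
    s = f"<{BASE}{drug_id}>"
    triples = [
        f"{s} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        f"<http://schema.org/Drug> .",
        f'{s} <http://schema.org/name> "{name}" .',
        f'{s} <http://schema.org/countryOfOrigin> "{country}" .',
    ]
    if generic_uri:
        triples.append(
            f"{s} <http://www.w3.org/2000/01/rdf-schema#seeAlso> "
            f"<{generic_uri}> .")
    return "\n".join(triples)

print(drug_to_ntriples("mk-001", "Aspirin 100mg", "MK",
                       "http://dbpedia.org/resource/Aspirin"))
```

Multiplied across twenty-three national registries, links of this kind are what make the consolidated dataset queryable as one graph.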

  9. The Role of Datasets on Scientific Influence within Conflict Research

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry; this means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971, where ideas didn’t persist in that multiple paths existed and died or emerged, reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped
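
Main/critical path analysis of a citation network is commonly based on Search Path Count (SPC) weighting: count the source-to-sink paths passing through each citation edge, then follow the heaviest edges. The toy DAG below (hypothetical paper labels, five edges) sketches that idea; the study's network is of course vastly larger (~1.5 million works).

```python
from collections import defaultdict

# Toy citation DAG: edge (a, b) means knowledge flows from paper a to b.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

succ, pred, nodes = defaultdict(list), defaultdict(list), set()
for a, b in edges:
    succ[a].append(b); pred[b].append(a); nodes |= {a, b}

# Topological order via Kahn's algorithm.
indeg = {n: len(pred[n]) for n in nodes}
order, stack = [], [n for n in nodes if indeg[n] == 0]
while stack:
    n = stack.pop(); order.append(n)
    for m in succ[n]:
        indeg[m] -= 1
        if indeg[m] == 0:
            stack.append(m)

# Count source-to-node and node-to-sink paths.
from_src = {n: (1 if not pred[n] else 0) for n in nodes}
for n in order:
    for m in succ[n]:
        from_src[m] += from_src[n]
to_sink = {n: (1 if not succ[n] else 0) for n in nodes}
for n in reversed(order):
    for m in pred[n]:
        to_sink[m] += to_sink[n]

# SPC weight of an edge = paths through it; greedy traversal gives a main path.
spc = {(a, b): from_src[a] * to_sink[b] for a, b in edges}
node, main_path = "A", ["A"]
while succ[node]:
    node = max(succ[node], key=lambda m, n=node: spc[(n, m)])
    main_path.append(node)
print(main_path, spc)
```

Edge (D, E) carries both branches, so it gets the highest SPC weight; works lying on such high-weight chains are the "highlighted" contributions.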

  12. “The influence of control group reproduction on the statistical power of the Environmental Protection Agency’s Medaka Extended One Generation Reproduction Test (MEOGRT) data for simulations” Dataset

    U.S. Environmental Protection Agency — Excel spreadsheet that contains the raw fecundity data used to conduct power simulations specific to the MEOGRT reproductive assessment. This dataset is associated...

  13. Link to paper

    U.S. Environmental Protection Agency — Link to the paper. This dataset is associated with the following publication: Naile, J., A.W. Garrison, J. Avants, and J. Washington. Isomers/enantiomers of...

  14. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project's main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable user activities such as discovering, aggregating, describing and sharing datasets so as to reduce the replication of similar data queries towards the network, sparing the data centers from having to guess at and create pre-packaged products. We have started to transfer this task more and more towards the user community, where user-composed data products can be extensively re-used. The main link to the data is a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g. the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic event parameters with a set of seismic stations. The requested data are collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  15. Genomics dataset of unidentified disclosed isolates

    Bhagwan N. Rekadwad

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and an AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
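
The AT/GC content analysis mentioned above amounts to simple base counting; GC-rich sequences are more thermally stable because G-C pairs form three hydrogen bonds. A minimal sketch, using a made-up sequence rather than any of the 17 patent sequences:

```python
def at_gc_content(seq: str) -> tuple[float, float]:
    """Return (AT%, GC%) of a DNA sequence, rounded to one decimal."""
    seq = seq.upper()
    n = len(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return round(100 * at / n, 1), round(100 * gc / n, 1)

print(at_gc_content("ATGCGCGCTA"))  # -> (40.0, 60.0)
```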

  16. National Elevation Dataset


    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities are minimized. The artifact removal filtering process does not eliminate all of the artifacts; in areas where the only available DEM was produced by older methods, "striping" may still occur.

  17. Development of research activity support system. 3. Automatic link generation/maintenance on self-evolving database; Kenkyu katsudo shien system no kaihatsu. 3. Jiko zoshokugata database deno bunsho link no jido sakusei/shufuku

    Shimada, T.; Futakata, A. [Central Research Institute of Electric Power Industry, Tokyo (Japan)

    1997-04-01

    For a coordinated task to be accomplished in an organization, documents, charts, and data produced by multiple workers need to be shared among them. This information-sharing setup functions more effectively when the meanings and purposes of documents are arranged in good order relative to the other documents, and when they are managed as a group of organically linked documents that are properly updated as the task approaches completion. In the self-evolving databases proposed so far, five types of document links representing the relations between documents are automatically generated, and the documents are managed in a unified way so that documents yielded by coordinated work are arranged in a proper order. A procedure for automatically generating document links is established on the basis of information received from the document retrieval system and a Lotus Notes application. In a self-evolving database, the document on either side of a link is apt to be lost when users move or delete documents. An automatic procedure is developed in this report which enables such document links to correctly restore themselves without loss of semantic relations. 12 refs., 11 figs., 3 tabs.

  18. Geostatistics for Large Datasets

    Sun, Ying

    2011-10-31


  19. Geostatistics for Large Datasets

    Sun, Ying; Li, Bo; Genton, Marc G.

    2011-01-01


  20. NP-PAH Interaction Dataset

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  1. 16 Gb/s QPSK Wireless-over-Fibre Link in 75-110GHz Band Employing Optical Heterodyne Generation and Coherent Detection

    Zibar, Darko; Sambaraju, Rakesh; Caballero Jambrina, Antonio

    2010-01-01

    We report on the first demonstration of a QPSK-based Wireless-over-Fibre link in the 75-110 GHz band with a record capacity of up to 16 Gb/s. Photonic wireless signal generation by heterodyne beating of free-running lasers and baud-rate digital coherent detection are employed.

  2. Analysis of a novel autonomous marine hybrid power generation/energy storage system with a high-voltage direct current link

    Wang, L.; Lee, D. J.; Lee, W. J.

    2008-01-01

    This paper presents both time-domain and frequency-domain simulated results of a novel marine hybrid renewable-energy power generation/energy storage system (PG/ESS) feeding isolated loads through a high-voltage direct current (HVDC) link. The studied marine PG subsystems comprise both offshore wind turbines and Wells turbines to respectively capture wind energy and wave energy from marine wind and ocean wave. In addition to the wind-turbine generators (WTGs) and wave-energy turbine generators (WETGs) employed in the studied system, diesel-engine generators (DEGs) and an aqua electrolyzer (AE) absorbing a part of the generated energy from WTGs and WETGs to generate available hydrogen for fuel cells (FCs) are also included in the PG subsystems. The ES subsystems consist of a flywheel energy storage system (FESS) and a compressed air energy storage (CAES) system to balance the required energy.

  3. Open University Learning Analytics dataset.

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
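
Working with the daily click summaries is straightforward; the sketch below aggregates total clicks per student from a small inline extract shaped like the dataset's VLE interaction table (the column subset and values here are assumed for illustration):

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of the VLE interaction table: one row per student,
# module and day, with clicks already summarised per day.
raw = """code_module,id_student,date,sum_click
AAA,11391,0,4
AAA,11391,1,2
AAA,28400,0,7
"""

# Total clicks per student across all days.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["id_student"]] += int(row["sum_click"])

print(dict(totals))  # -> {'11391': 6, '28400': 7}
```

The same grouping, joined against the demographic and assessment tables, is the usual starting point for modelling student behaviour from this dataset.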

  4. Initial results of centralized autonomous orbit determination of the new-generation BDS satellites with inter-satellite link measurements

    Tang, Chengpan; Hu, Xiaogong; Zhou, Shanshi; Liu, Li; Pan, Junyang; Chen, Liucheng; Guo, Rui; Zhu, Lingfeng; Hu, Guangming; Li, Xiaojie; He, Feng; Chang, Zhiqiao

    2018-01-01

    Autonomous orbit determination is the ability of navigation satellites to estimate the orbit parameters on-board using inter-satellite link (ISL) measurements. This study mainly focuses on data processing of the ISL measurements as a new measurement type and its application, for the first time, to the centralized autonomous orbit determination of the new-generation Beidou navigation satellite system satellites. The ISL measurements are dual one-way measurements that follow a time division multiple access (TDMA) structure. The ranging error of the ISL measurements is less than 0.25 ns. This paper proposes a derivation approach to the satellite clock offsets and the geometric distances from TDMA dual one-way measurements without a loss of accuracy. The derived clock offsets are used for time synchronization, and the derived geometric distances are used for autonomous orbit determination. The clock offsets from the ISL measurements are consistent with the L-band two-way satellite time and frequency transfer clock measurements, and the detrended residuals vary within 0.5 ns. The centralized autonomous orbit determination is conducted in a batch mode on a ground-capable server for the feasibility study. Constant hardware delays are present in the geometric distances and become the largest source of error in the autonomous orbit determination. Therefore, the hardware delays are estimated simultaneously with the satellite orbits. To avoid uncertainties in the constellation orientation, a ground anchor station that "observes" the satellites with on-board ISL payloads is introduced into the orbit determination. The root-mean-square values of the orbit determination residuals are within 10.0 cm, and the standard deviation of the estimated ISL hardware delays is within 0.2 ns. The accuracy of the autonomous orbits is evaluated by analysis of overlap comparison and satellite laser ranging (SLR) residuals, and is compared with the accuracy of the L-band orbits. The results indicate
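
The dual one-way split rests on a standard identity: summing the two pseudoranges cancels the clock terms, while differencing them cancels the geometry. A minimal, idealized sketch (ignoring hardware delays and the TDMA reduction of the two measurements to a common epoch, both of which the paper treats explicitly):

```python
C = 299_792_458.0  # speed of light, m/s

def dual_one_way(rho_ab: float, rho_ba: float) -> tuple[float, float]:
    """Split dual one-way ISL pseudoranges (metres, assumed already
    reduced to a common epoch) into a geometric distance (m) and a
    relative clock offset dt_b - dt_a (s)."""
    distance = 0.5 * (rho_ab + rho_ba)          # clock terms cancel
    clock_offset = 0.5 * (rho_ab - rho_ba) / C  # geometry cancels
    return distance, clock_offset

# Example: 40,000 km separation, receiver clock 100 ns ahead.
d_true, dt_true = 4.0e7, 100e-9
rho_ab = d_true + C * dt_true  # A -> B, measured at B
rho_ba = d_true - C * dt_true  # B -> A, measured at A
dist, offset = dual_one_way(rho_ab, rho_ba)
print(dist, offset)
```

The recovered distance feeds the orbit filter, the recovered offset the time synchronization, exactly the two uses described above.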

  5. Self-generated covalent cross-links in the cell-surface adhesins of Gram-positive bacteria.

    Baker, Edward N; Squire, Christopher J; Young, Paul G

    2015-10-01

    The ability of bacteria to adhere to other cells or to surfaces depends on long, thin adhesive structures that are anchored to their cell walls. These structures include extended protein oligomers known as pili and single, multi-domain polypeptides, mostly based on multiple tandem Ig-like domains. Recent structural studies have revealed the widespread presence of covalent cross-links, not previously seen within proteins, which stabilize these domains. The cross-links discovered so far are either isopeptide bonds that link lysine side chains to the side chains of asparagine or aspartic acid residues or ester bonds between threonine and glutamine side chains. These bonds appear to be formed by spontaneous intramolecular reactions as the proteins fold and are strategically placed so as to impart considerable mechanical strength. © 2015 Authors; published by Portland Press Limited.

  6. Turkey Run Landfill Emissions Dataset

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  7. Dataset of NRDA emission data

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  8. Chemical product and function dataset

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  9. DC Linked Hybrid Generation System with an Energy Storage Device including a Photo-Voltaic Generation and a Gas Engine Cogeneration for Residential Houses

    Lung, Chienru; Miyake, Shota; Kakigano, Hiroaki; Miura, Yushi; Ise, Toshifumi; Momose, Toshinari; Hayakawa, Hideki

    For the past few years, hybrid generation systems combining solar panels and gas cogeneration have been used in residential houses. Solar panels generate electric power in the daytime but not at night, while the power consumption of residential houses usually peaks in the evening. A gas engine cogeneration system can generate electric power without such a restriction, and it can also supply heat to warm the house or produce hot water. In this paper, we propose a solar panel and gas engine cogeneration hybrid system with an energy storage device, combined on a dc bus. If a blackout occurs, the system can still supply electric power to designated house loads. We propose a control scheme for the system based on the charging level of the energy storage device and the voltage of the utility grid, applicable to both grid-connected and stand-alone operation. Finally, we carried out experiments to demonstrate the system operation and calculations for loss estimation.

  10. Effects of DC-link Filter on Harmonic and Interharmonic Generation in Three-phase Adjustable Speed Drive Systems

    Soltani, Hamid; Davari, Pooya; Kumar, Dinesh

    2017-01-01

    Harmonic and interharmonic distortions are considered as the main power quality issues especially in the distribution networks. The double-stage Adjustable Speed Drives (ASDs) in which the front-end diode rectifier is connected to a rear-end inverter through an intermediate DC-link filter may inj...

  11. Markers of human immunodeficiency virus infection in high-risk individuals seronegative by first generation enzyme-linked immunosorbent assay

    Pedersen, C; Lindhardt, B O; Lauritzen, E

    1989-01-01

    -linked immunoassay (ELISA). Seventy-four of the serum samples had been obtained from 40 sexual partners of HIV antibody positive individuals. Two of the samples were reactive for p24 in immunoblot, but no other markers of HIV infection were found. From 80 sexually active male homosexuals, 117 serum samples were...

  12. On possibility of application of the parallel-mixed type coolant flow scheme to NPP steam generators linked with superheaters

    Malkis, V.A.; Lokshin, V.A.

    1983-01-01

    The optimum distribution of the coolant straight-through flow between the superheater, evaporator and economizer is determined, and the parallel-mixed type flow scheme is compared with other schemes. The calculations are performed for the 250 MW(e) steam generator of the WWER-1000 reactor unit, whose inlet and outlet primary coolant temperatures are 324 and 290 deg C, respectively, while the feed water and saturation temperatures are 220 and 278.5 deg C, respectively. The rated superheating temperature is 300 deg C. The schemes are compared by the average temperature head of the steam generator, both for equal heat transfer coefficients and for essentially different coefficients in the individual steam generator sections. The calculations show that the parallel-mixed type flow permits an essential increase in the temperature head of the steam generator. With a constant heat transfer coefficient in all steam generator sections, the highest temperature head is reached at relative flow rates through the superheater, economizer and evaporator of 6, 8 and 86%, respectively; the temperature head of the superheated-steam generator in this case exceeds that of the WWER-1000 wet-steam generator by 12%. If the heat transfer coefficient in the superheater is reduced by a factor of three, optimum distribution of the primary coolant still permits the steam generator temperature head to be maintained at the level of the WWER-1000 wet-steam generator. The use of the parallel-mixed type flow scheme thus permits the design of a slightly superheated steam generator for the parameters of the WWER-1000 unit

  13. Automatic processing of multimodal tomography datasets.

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required, so that the core scientific problems can be addressed during experimental data collection. Savu is an accessible and flexible big-data processing framework able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset output, such as that from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  14. Toward computational cumulative biology by combining models of biological datasets.

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven, to enable new findings beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. Using this data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were the major driving forces in determining relevant datasets, the relationships found were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
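The combination model can be illustrated with a toy sketch in which earlier datasets' models are reduced to plain feature vectors, an assumption made purely for illustration; the actual engine combines full statistical models of the datasets:

```python
def decompose(target, components, steps=2000, lr=0.01):
    """Express `target` as a nonnegative combination of `components`.

    Toy stand-in for the paper's combination model: weights w_i >= 0
    minimising ||target - sum_i w_i * components_i||^2 are found by
    projected gradient descent. Large weights point to the earlier
    datasets most relevant to the new one.
    """
    k, n = len(components), len(target)
    w = [1.0 / k] * k
    for _ in range(steps):
        # Residual of the current reconstruction.
        r = [target[j] - sum(w[i] * components[i][j] for i in range(k))
             for j in range(n)]
        for i in range(k):
            grad = -2.0 * sum(r[j] * components[i][j] for j in range(n))
            w[i] = max(0.0, w[i] - lr * grad)   # project onto w_i >= 0
    return w
```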

  15. The Domesday Dataset: Linked Open Data in Disability Studies

    Reddington, Joseph

    2013-01-01

    Augmentative and alternative communication (AAC) devices provide the ability for many people with disabilities to make themselves understood. For the large proportion of users with an intellectual disability, these devices may be their only means of communication. Estimates of the number of AAC devices in use are vague and lack transparency. This…

  16. The NOAA Dataset Identifier Project

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  17. Named Entity Linking Algorithm

    M. F. Panteleev

    2017-01-01

    In natural language processing, Named Entity Linking (NEL) is the task of identifying an entity mentioned in a text and linking it to an entity in a knowledge base (for example, DBpedia). There is currently a diversity of approaches to this problem, but two main classes can be identified: graph-based approaches and machine-learning-based ones. An algorithm combining both is proposed, based on the stated assumptions about the interrelations of named entities in a sentence and in general. In the case of graph-based approaches, it is necessary to identify an optimal set of related entities according to some metric that characterizes the distance between entities in a graph built on the knowledge base. Because of limitations in processing power, solving this task directly is impossible, so a modification is proposed. An independent solution cannot be built on machine-learning algorithms alone, owing to the small volume of training datasets relevant to the NEL task; their use can, however, improve the quality of the algorithm. An adaptation of the Latent Dirichlet Allocation model is proposed in order to obtain a measure of the compatibility of attributes of various entities encountered in one context. The efficiency of the proposed algorithm was tested experimentally on an independently generated test dataset, comparing the model using the proposed algorithm with the open-source product DBpedia Spotlight, which solves the NEL problem. The mock-up based on the proposed algorithm was slower than DBpedia Spotlight but showed higher accuracy, which motivates further work in this direction. The main directions of development are proposed in order to increase the accuracy of the system and its performance.
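The graph-based objective can be illustrated with a brute-force toy. The entity names and distances below are invented for illustration; real NEL systems must approximate this search, which is precisely the modification the abstract discusses:

```python
from itertools import product

# Toy knowledge-graph distances (assumed numbers, illustration only).
_D = {frozenset(p): d for p, d in [
    (("Paris_city", "France_country"), 1),
    (("Paris_Hilton", "France_country"), 6),
]}

def toy_dist(a, b):
    return _D.get(frozenset((a, b)), 10)

def link_entities(candidates, dist):
    """Choose one entity per mention so the chosen set is mutually close
    in the knowledge graph. This brute-forces the graph-based NEL
    objective over all candidate combinations; `candidates` maps each
    mention to a list of candidate entity ids."""
    mentions = list(candidates)
    best, best_cost = None, float("inf")
    for combo in product(*(candidates[m] for m in mentions)):
        # Total pairwise graph distance of the chosen entity set.
        cost = sum(dist(a, b) for i, a in enumerate(combo)
                   for b in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = dict(zip(mentions, combo)), cost
    return best
```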

  18. Design Criteria for DC Link Filters in a Synchronous Generator-Phase Controlled Rectifier-Filter-Load System

    Greseth, Gregory

    1999-01-01

    .... The proposed Navy DC Zonal Electrical Distribution System (DC ZEDS) being designed for the new DD-21 utilizes a rectified ac generator output which is filtered and stepped to usable voltages by local dc-dc converters...

  19. GHz wireless On-off-Keying link employing all photonic RF carrier generation and digital coherent detection

    Sambaraju, Rakesh; Zibar, Darko; Caballero Jambrina, Antonio

    2010-01-01

    Gb/s wireless signals at 82, 88 and 100 GHz carrier frequencies are successfully generated by heterodyne mixing of two optical carriers. A photonic detection technique with optical coherent receiver and digital signal processing is implemented for signal demodulation....

  20. Developing a Data-Set for Stereopsis

    D.W Hunter

    2014-08-01

    Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Existing data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439), or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations, exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect: depth information is normally inaccurate, or simply missing, near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter & Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer-generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even in partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what features would be desirable in such software.

  1. Hardware-efficient signal generation of layered/enhanced ACO-OFDM for short-haul fiber-optic links.

    Wang, Qibing; Song, Binhuang; Corcoran, Bill; Boland, David; Zhu, Chen; Zhuang, Leimeng; Lowery, Arthur J

    2017-06-12

    Layered/enhanced ACO-OFDM is a promising candidate for intensity-modulation, direct-detection short-haul fiber-optic links due to both its power and spectral efficiency. In this paper, we first demonstrate a hardware-efficient real-time 9.375 Gb/s QPSK-encoded layered/enhanced asymmetrically clipped optical OFDM (L/E-ACO-OFDM) transmitter using a Virtex-6 FPGA. This L/E-ACO-OFDM signal is successfully transmitted over 20 km of uncompensated standard single-mode fiber (S-SMF) using a directly modulated laser. Several methods are explored to reduce the FPGA's logic resource utilization by taking advantage of the L/E-ACO-OFDM signal characteristics. We show that the logic resource occupation of the L/E-ACO-OFDM transmitter is almost the same as that of a DC-biased OFDM transmitter at the same spectral efficiency, demonstrating its potential for use in real-time short-haul optical transmission links.
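The basic ACO-OFDM construction the paper builds on can be sketched independently of the FPGA design: data are placed on the odd subcarriers with Hermitian symmetry so the inverse DFT is real, and the bipolar waveform is clipped at zero, which pushes all clipping noise onto the unused even subcarriers. The FFT size and mapping below are illustrative, not the paper's implementation:

```python
import cmath

def aco_ofdm_symbol(data, n=16):
    """Build one real, nonnegative ACO-OFDM time-domain symbol.

    `data` are complex (e.g. QPSK) symbols for the odd subcarriers;
    conjugate copies enforce Hermitian symmetry so the IDFT is real,
    and negative samples are clipped to zero.
    """
    X = [0j] * n
    for i, s in enumerate(data):          # fill odd bins 1, 3, 5, ...
        k = 2 * i + 1
        X[k] = s
        X[n - k] = s.conjugate()          # Hermitian symmetry -> real x[t]
    x = [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)).real / n
         for t in range(n)]
    return [max(0.0, v) for v in x]       # asymmetric clipping at zero
```

A known property of odd-subcarrier-only OFDM is the antisymmetry x[t + N/2] = -x[t], so after clipping at most one sample of each such pair is nonzero.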

  2. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections from Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim of discovering and connecting related research datasets on the basis of publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  3. Discovery and Reuse of Open Datasets: An Exploratory Study

    Sara

    2016-07-01

    Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are well documented, formatted for use with a variety of software, and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities, in order to begin to build open data success stories that will fuel future advocacy efforts.

  4. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site-years. Since the workshop, several additional site-years have been added and the dataset now contains over 920 site-years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals involving around 120 researchers have been approved to use the data. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new website is based on a scientific data server which enables on-line browsing of the data, data download, and version tracking. We leverage database and data-analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot-table access to the data.

  5. The marker: A precious link between generations and a part of the long-term safety story

    Massart, Cecile

    2014-01-01

    High-level radioactive waste brings us face to face with a social, emotional, ethical, political and environmental situation at the heart of which lies the security of the living world. From now on, mankind has to make commitments to protect itself. Building an artistic device, a continuous creation that links and reveals the situation, will inform and shed some light on the objectives. In co-operation with the industries that manage geological repositories, the installation of markers above ground introduces non-technical aspects that can increase safety. In the art world, many works of art produced throughout the 20th century associate art with waste as vestiges for keeping. The use of waste has shaken the nature of art since Marcel Duchamp and the eye of Man Ray, as well as the definition of the artist. The waste recycling industry has modified the way we see it. For what is waste? It embodies above all the imperfect, and also a 'time capsule'. In the case that concerns us, waste stays buried in our doubt, secured by our currently substantial financial means. The notion of time for its decay is calculated and, at the end, it vanishes. Its visibility is next to nil, whereas its presence remains very powerful and thrills the imagination. The hosting community is permeated with it. At the same time the laboratory at the disposal facility provides economic dynamism, an analysis of upheavals in this territory, an exhibition of artistic devices and an awareness of danger. The proposals described here are the result of a decade of research. It seems that from the beginning there has been negligence on the part of the nuclear industry regarding the storage of high-level radioactive waste. The general public is not very knowledgeable about the subject. Projects for geological disposal often give the local population the feeling of an ablation of a piece of territory. To sum up, in the long run, we consider the study of two barriers for the safety of high

  6. EPIC Forest LAI Dataset: LAI estimates generated from the USDA Environmental Policy Impact Climate (EPIC) model (a widely used, field-scale, biogeochemical model) on four forest complexes spanning three physiographic provinces in VA and NC.

    U.S. Environmental Protection Agency — This data depicts calculated and validated LAI estimates generated from the USDA Environmental Policy Impact Climate (EPIC) model (a widely used, field-scale,...

  7. Generation of integration-free induced pluripotent stem cell lines derived from two patients with X-linked Alport syndrome (XLAS).

    Kuebler, Bernd; Aran, Begoña; Miquel-Serra, Laia; Muñoz, Yolanda; Ars, Elisabet; Bullich, Gemma; Furlano, Monica; Torra, Roser; Marti, Merce; Veiga, Anna; Raya, Angel

    2017-12-01

    Skin biopsies were obtained from two male patients with X-linked Alport syndrome (XLAS) carrying hemizygous COL4A5 mutations in exon 41 or exon 46. Dermal fibroblasts were extracted and reprogrammed by nucleofection with episomal plasmids carrying OCT3/4, SOX2, KLF4, LIN28, L-MYC and p53 shRNA. The generated induced pluripotent stem cell (iPSC) lines AS-FiPS2-Ep6F-28 and AS-FiPS3-Ep6F-9 were free of genomically integrated reprogramming genes, carried the specific mutations, had a stable karyotype, expressed pluripotency markers and generated embryoid bodies which were differentiated towards the three germ layers in vitro. These iPSC lines offer a useful resource for studying Alport syndrome pathomechanisms and for drug testing. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  8. CERC Dataset (Full Hadza Data)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  9. PHYSICS PERFORMANCE AND DATASET (PPD)

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analyses targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed, benefiting from improved alignment and calibration conditions for all the sub-detectors. Careful planning of the resources was essential in order to deliver the datasets to the analysts in good time, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, making it possible to recover some of the data for analysis use and further improving the certification efficiency, which now stands at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  10. Development of a SPARK Training Dataset

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program not only carries out activities across multiple disciplines, but does so across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to preserve safeguards knowledge beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and then evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  12. A Link between Nano- and Classical Thermodynamics: Dissipation Analysis (The Entropy Generation Approach in Nano-Thermodynamics

    Umberto Lucia

    2015-03-01

    The interest in designing nanosystems is continuously growing. Engineers apply a great number of optimization methods to the design of macroscopic systems; if these methods could be introduced into the design of small systems, a great improvement in nanotechnologies could be achieved. To do so, however, it is necessary to extend classical thermodynamic analysis to small systems, where irreversibility is also present, as the Loschmidt paradox highlighted. Here, the recent improvement of the Gouy-Stodola theorem for complex systems (the GSGL approach), based on the use of entropy generation, is suggested as a way to extend classical thermodynamics to nano-thermodynamics. The result is a new approach to nanosystems which avoids the difficulties highlighted in the usual analysis of small systems, such as the definition of temperature for nanosystems.
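For context, the Gouy-Stodola theorem on which the GSGL approach builds states that the rate of available work lost to irreversibility is the environment temperature times the entropy generation rate:

```latex
% Gouy-Stodola theorem (classical statement):
% lost available-work rate = environment temperature (T_0)
%                            x entropy generation rate
\dot{W}_{\mathrm{lost}} = T_0\,\dot{S}_{\mathrm{gen}},
\qquad \dot{S}_{\mathrm{gen}} \ge 0 .
```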

  13. Ill-defined problem solving in amnestic mild cognitive impairment: linking episodic memory to effective solution generation.

    Sheldon, S; Vandermorris, S; Al-Haj, M; Cohen, S; Winocur, G; Moscovitch, M

    2015-02-01

    It is well accepted that the medial temporal lobes (MTL), and the hippocampus specifically, support episodic memory processes. Emerging evidence suggests that these processes also support the ability to effectively solve ill-defined problems, i.e. those that do not have a set routine or solution. To test the relation between episodic memory and problem solving, we examined the ability of individuals with single-domain amnestic mild cognitive impairment (aMCI), a condition characterized by episodic memory impairment, to solve ill-defined social problems. Participants with aMCI and age- and education-matched controls were given a battery of tests that included standardized neuropsychological measures, the Autobiographical Interview (Levine et al., 2002), which was scored for episodic content in descriptions of past personal events, and a measure of ill-defined social problem solving. Corroborating previous findings, the aMCI group generated less episodically rich narratives when describing past events. Individuals with aMCI also generated less effective solutions when solving ill-defined problems compared to the control participants. Correlation analyses demonstrated that the ability to recall episodic elements from autobiographical memories was positively related to the ability to effectively solve ill-defined problems, and the ability to solve these ill-defined problems was related to measures of activities of daily living. In conjunction with previous reports, the results of the present study point to a new functional role of episodic memory in ill-defined goal-directed behavior and other non-memory tasks that require flexible thinking. Our findings also have implications for the cognitive and behavioural profile of aMCI by suggesting that the ability to effectively solve ill-defined problems is related to sustained functional independence. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. RARD: The Related-Article Recommendation Dataset

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  15. LINK2009 Phase 1: Development of 2. generation fuel cell vehicles and hydrogen refueling station. Final report; LINK2009 fase 1: Udvikling af 2. gen. braendselscelle koeretoejer og brinttankstation. Slutrapport

    2010-03-15

    The LINK2009 project was to develop 2nd-generation fuel cell systems for vehicles and 350-bar hydrogen refuelling stations. The project was also to ensure the continued positioning of Denmark and the Scandinavian region within hydrogen for transport, and to continue to attract international car manufacturers to conduct demonstrations, and later market introduction, in the region. The LINK2009 project is divided into two phases: this first phase deals only with the development of the 2nd-generation technologies, whereas the following phase 2 will include their demonstration as well as additional research activities. This report describes the results of phase 1, which commenced in summer 2008 and ended in late 2009. Phase 1 resulted in the development of new 2nd-generation fuel cell technology for use in a city car and a service vehicle. The stated targets for price and efficiency have been reached, and the subsequent demonstration in phase 2 is to confirm that the lifetime targets are reached as well. The efficiency of the fuel cell system for the city car has been measured at 42-48% at power deliveries of 10 kW and 2 kW respectively, significantly above the target of >40%. System simplifications and the selection of new components have enabled a 50% reduction in the price per kW for the fuel cell system, including 700-bar hydrogen storage, now totalling EUR 4,500/kW. This creates a sufficient basis for demonstrating the system in vehicles: 9 vehicles are planned to be demonstrated in the following phase 2, and a further 8 vehicles were put into operation in Copenhagen in November 2009. Phase 1 also developed 2nd-generation hydrogen refuelling technology, resulting in concepts for both 350-bar and 700-bar refuelling as well as a concept for on-site hydrogen production at refuelling stations. In separate projects the developed 350-bar technology has been brought into use in a newly established hydrogen station in Copenhagen, and the hydrogen

  16. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers, and several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
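The MCP penalty underlying the selection approach has a simple closed form. As a hedged illustration (the parameter names `lam` and `gamma` follow the general MCP literature, not necessarily the paper's notation), a minimal sketch:

```python
def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty (MCP) for a single coefficient t.

    rho(t; lam, gamma) = lam*|t| - t^2/(2*gamma)  if |t| <= gamma*lam
                       = gamma*lam^2 / 2          otherwise (flat tail)

    The flat tail is what makes MCP nearly unbiased for large effects,
    unlike the LASSO penalty lam*|t|, which keeps growing with |t|.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0
```

In the sparse group variant the penalty is applied both at the group level (a marker across all datasets) and within groups (a marker in an individual dataset), which is what permits dataset-specific marker sets under the heterogeneity model.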

  17. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time ranges and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
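The time-range and novelty heuristics described above are easy to make concrete. The sketch below is illustrative only; the function names and weights are assumptions for the example, not the Common Metadata Repository's actual scoring:

```python
def temporal_overlap(q_start, q_end, d_start, d_end):
    """Fraction of the query's time range covered by the dataset's range."""
    overlap = max(0.0, min(q_end, d_end) - max(q_start, d_start))
    span = q_end - q_start
    return overlap / span if span > 0 else 0.0

def relevance(keyword_score, time_score, version, max_version):
    """Blend keyword match, temporal overlap, and a novelty boost that
    favours later versions of a dataset.  The weights here are made up
    for illustration; a production ranker would tune them against
    observed user behaviour."""
    novelty = version / max_version if max_version else 0.0
    return 0.6 * keyword_score + 0.3 * time_score + 0.1 * novelty
```

A spatial-overlap heuristic would follow the same pattern, with an area intersection replacing the interval intersection.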

  18. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Streams and Catchments (Version 2.1) for the Conterminous United States: 2006 National Land Cover Database Agricultural Land Cover on Slopes

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  19. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments Riparian Buffer for the Conterminous United States - 2011 National Land Cover Database

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  20. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Streams and Catchments (Version 2.1) for the Conterminous United States: 2006 National Land Cover Database Riparian Zones

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  1. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments Riparian Buffer for the Conterminous United States: 2001 National Land Cover Database Impervious Surfaces

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  2. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments Riparian Buffer for the Conterminous United States: 2006 National Land Cover Database Impervious Surfaces

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  3. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: 2011 National Land Cover Database Impervious Surfaces

    U.S. Environmental Protection Agency — This dataset represents data derived from the NLCD dataset and the National Hydrography Dataset version 2.1(NHDPlusV2) (see Data Sources for links to NHDPlusV2 data...

  4. A link between interferon and augmented plasmin generation in exocrine gland damage in Sjögren's syndrome.

    Gliozzi, Maria; Greenwell-Wild, Teresa; Jin, Wenwen; Moutsopoulos, Niki M; Kapsogeorgou, Efstathia; Moutsopoulos, Haralampos M; Wahl, Sharon M

    2013-02-01

    Sjögren's syndrome is an autoimmune disease that targets exocrine glands, but often exhibits systemic manifestations. Infiltration of the salivary and lacrimal glands by lymphoid and myeloid cells orchestrates a perpetuating immune response leading to exocrine gland damage and dysfunction. Th1 and Th17 lymphocyte populations and their products recruit additional lymphocytes, including B cells, but also large numbers of macrophages, which accumulate with disease progression. In addition to cytokines, chemokines, chitinases, and lipid mediators, macrophages contribute to a proteolytic milieu, underlying tissue destruction, inappropriate repair, and compromised glandular functions. Among the proteases enhanced in this local environment are matrix metalloproteases (MMP) and plasmin, generated by plasminogen activation, dependent upon plasminogen activators, such as tissue plasminogen activator (tPA). Not previously associated with salivary gland pathology, our evidence implicates enhanced tPA in the context of inflamed salivary glands revolving around lymphocyte-mediated activation of macrophages. Tracking down the mechanism of macrophage plasmin activation, the cytokines IFNγ and to a lesser extent, IFNα, via Janus kinase (JAK) and signal transducer and activator of transcription (STAT) activation, were found to be pivotal for driving the plasmin cascade of proteolytic events culminating in perpetuation of the inflammation and tissue damage, and suggesting intervention strategies to blunt irreversible tissue destruction. Published by Elsevier Ltd.

  5. Quantifying uncertainty in observational rainfall datasets

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    rainfall datasets available over Africa on monthly, daily and sub-daily time scales as appropriate to quantify spatial and temporal differences between the datasets. We find regional wet and dry biases between datasets (using the ensemble mean as a reference), with generally larger biases in reanalysis products. Rainfall intensity is poorly represented in some datasets, which demonstrates that those datasets should not be used for rainfall intensity analyses. Using 10 CORDEX models we show, in east Africa, that the spread between observed datasets is often similar to the spread between models. We recommend that specific observational rainfall datasets be used for specific investigations, and also that where many datasets are applicable to an investigation, a probabilistic view be adopted for rainfall studies over Africa. Endris, H. S., P. Omondi, S. Jain, C. Lennard, B. Hewitson, L. Chang'a, J. L. Awange, A. Dosio, P. Ketiem, G. Nikulin, H-J. Panitz, M. Büchner, F. Stordal, and L. Tazalika (2013) Assessment of the Performance of CORDEX Regional Climate Models in Simulating East African Rainfall. J. Climate, 26, 8453-8475. DOI: 10.1175/JCLI-D-12-00708.1 Gbobaniyi, E., A. Sarr, M. B. Sylla, I. Diallo, C. Lennard, A. Dosio, A. Diedhiou, A. Kamga, N. A. B. Klutse, B. Hewitson, and B. Lamptey (2013) Climatology, annual cycle and interannual variability of precipitation and temperature in CORDEX simulations over West Africa. Int. J. Climatol., DOI: 10.1002/joc.3834 Hernández-Díaz, L., R. Laprise, L. Sushama, A. Martynov, K. Winger, and B. Dugas (2013) Climate simulation over CORDEX Africa domain using the fifth-generation Canadian Regional Climate Model (CRCM5). Clim. Dyn. 40, 1415-1433. DOI: 10.1007/s00382-012-1387-z Kalognomou, E., C. Lennard, M. Shongwe, I. Pinto, A. Favre, M. Kent, B. Hewitson, A. Dosio, G. Nikulin, H. Panitz, and M. Büchner (2013) A diagnostic evaluation of precipitation in CORDEX models over southern Africa. Journal of Climate, 26, 9477-9506. 
DOI:10
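The wet/dry-bias bookkeeping against an ensemble mean reduces to a few lines. The function below is a hypothetical illustration of that step, not the study's actual processing chain:

```python
def biases_vs_ensemble_mean(regional_means):
    """Per-dataset rainfall bias relative to the ensemble mean.

    `regional_means` maps a dataset name to its regional-mean rainfall
    (e.g. mm/month).  Positive values are wet biases relative to the
    ensemble, negative values are dry biases.
    """
    ensemble_mean = sum(regional_means.values()) / len(regional_means)
    return {name: value - ensemble_mean
            for name, value in regional_means.items()}
```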

  6. US AMLR Program zooplankton dataset

    National Oceanic and Atmospheric Administration, Department of Commerce — Zooplankton research in the US AMLR Program focuses on the link between prey production, availability, and climate variability in relation to predator and fishery...

  7. Passive Containment DataSet

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  8. The CMS dataset bookkeeping service

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  9. The CMS dataset bookkeeping service

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  10. The CMS dataset bookkeeping service

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  11. The CMS dataset bookkeeping service

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  12. MIPS bacterial genomes functional annotation benchmark dataset.

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as a benchmark) as well as tedious preparatory work to generate the sequence parameters required as input for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  13. 2008 TIGER/Line Nationwide Dataset

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  14. Satellite-Based Precipitation Datasets

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/~ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  15. A curated transcriptome dataset collection to investigate the functional programming of human hematopoietic cells in early life.

    Rahman, Mahbuba; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Cugno, Chiara; Chaussabel, Damien; Marr, Nico

    2016-01-01

    Compendia of large-scale datasets made available in public repositories provide an opportunity to identify and fill gaps in biomedical knowledge. But first, these data need to be made readily accessible to research investigators for interpretation. Here we make available a collection of transcriptome datasets to investigate the functional programming of human hematopoietic cells in early life. Thirty-two datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and loaded into a custom web application called the Gene Expression Browser (GXB), which was designed for interactive query and visualization of integrated large-scale data. Quality control checks were performed. Multiple sample groupings and gene rank lists were created, allowing users to reveal age-related differences in transcriptome profiles, changes in gene expression of neonatal hematopoietic cells in response to a variety of immune stimulators and modulators, as well as changes during cell differentiation. Available demographic, clinical, and cell phenotypic information can be overlaid with the gene expression data and used to sort samples. Web links to customized graphical views can be generated and subsequently inserted in manuscripts to report novel findings. GXB also enables browsing of a single gene across projects, thereby providing new perspectives on age- and developmental stage-specific expression of a given gene across the human hematopoietic system. This dataset collection is available at: http://developmentalimmunology.gxbsidra.org/dm3/geneBrowser/list.

  16. Invitation to a forum: architecting operational `next generation' earth monitoring satellites based on best modeling, existing sensor capabilities, with constellation efficiencies to secure trusted datasets for the next 20 years

    Helmuth, Douglas B.; Bell, Raymond M.; Grant, David A.; Lentz, Christopher A.

    2012-09-01

    Architecting the operational Next Generation of earth monitoring satellites based on matured climate modeling, reuse of existing sensor and satellite capabilities, attention to affordability, and evolutionary improvements integrated with constellation efficiencies - this becomes our collective goal for an open architectural design forum. Understanding the earth's climate and collecting the requisite signatures over the next 30 years is a mandate shared by many of the world's governments. But there remains a daunting challenge to bridge scientific missions to 'operational' systems that truly support the demands of decision makers, scientific investigators and global users' requirements for trusted data. In this paper we will suggest an architectural structure that takes advantage of current earth modeling examples, including cross-model verification and a first-order set of critical climate parameters and metrics, that, in turn, are matched up with existing space-borne collection capabilities and sensors. The tools used and the frameworks offered are designed to allow collaborative overlays by other stakeholders nominating different critical parameters and their own threaded connections to existing international collection experience. These aggregate design suggestions will be held up to group review and prioritized as potential constellation solutions, including incremental and spiral developments, with cost benefits and organizational opportunities. This Part IV effort is focused on being an inclusive 'Next Gen Constellation' design discussion and is the natural extension to earlier papers.

  17. Comparison of analyses of the QTLMAS XII common dataset

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  18. Statistical analysis of the turbulent Reynolds stress and its link to the shear flow generation in a cylindrical laboratory plasma device

    Yan, Z.; Yu, J. H.; Holland, C.; Xu, M.; Mueller, S. H.; Tynan, G. R.

    2008-01-01

    The statistical properties of the turbulent Reynolds stress arising from collisional drift turbulence in a magnetized plasma column are studied and a physical picture of turbulence-driven shear flow generation is discussed. The Reynolds stress peaks near the maximal density gradient region, and is governed by the turbulence amplitude and the cross-phase between the turbulent radial and azimuthal velocity fields. The amplitude probability distribution function (PDF) of the turbulent Reynolds stress is non-Gaussian and positively skewed at the density gradient maximum. The turbulent ion-saturation (Isat) current PDF shows that the region where the bursty Isat events are born coincides with the positively skewed non-Gaussian Reynolds stress PDF, which suggests that the bursts of particle transport appear to be associated with bursts of momentum transport as well. At the shear layer the density fluctuation radial correlation length has a strong minimum (~4-6 mm, approximately 0.5 C_s/Ω_ci, where C_s is the ion acoustic speed and Ω_ci is the ion gyrofrequency), while the azimuthal turbulence correlation length is nearly constant across the shear layer. The results link the behavior of the Reynolds stress, its statistical properties, the generation of bursty radially going azimuthal momentum transport events, and the formation of the large-scale shear layer.
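The positively skewed, non-Gaussian PDF reported above is conventionally diagnosed via the sample skewness of the fluctuating stress signal. A minimal sketch in plain Python (our illustration, with no claim to the authors' analysis code):

```python
def skewness(samples):
    """Sample skewness of a fluctuating signal, e.g. an instantaneous
    Reynolds-stress time series.  A value near 0 means a symmetric PDF;
    skewness > 0 indicates the bursty, intermittent positive tail
    described in the abstract (biased population estimator for brevity).
    """
    n = len(samples)
    mean = sum(samples) / n
    sd = (sum((x - mean) ** 2 for x in samples) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in samples) / n
```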

  19. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Jensen, Tue Vissing; Pinson, Pierre

    2017-01-01

    …we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven… to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation…

  20. Link Label Prediction in Signed Citation Network

    Akujuobi, Uchenna

    2016-04-12

    Link label prediction is the problem of predicting the missing labels or signs of all the unlabeled edges in a network. For signed networks, these labels can either be positive or negative. In recent years, different algorithms have been proposed, such as using regression, trust propagation and matrix factorization. These approaches have tried to solve the problem of link label prediction by using ideas from social theories, where most of them predict a single missing label given that labels of other edges are known. However, in most real-world social graphs, the number of labeled edges is usually less than that of unlabeled edges. Therefore, predicting a single edge label at a time would require multiple runs and is more computationally demanding. In this thesis, we look at the link label prediction problem on a signed citation network with missing edge labels. Our citation network consists of papers from three major machine learning and data mining conferences together with their references, and edges showing the relationship between them. An edge in our network is labeled either positive (dataset relevant), if the reference is based on the dataset used in the paper, or negative otherwise. We present three approaches to predict the missing labels. The first approach converts the label prediction problem into a standard classification problem. We then generate a set of features for each edge and adopt Support Vector Machines to solve the classification problem. For the second approach, we formalize the graph such that the edges are represented as nodes with links showing similarities between them. We then adopt a label propagation method to propagate the labels on known nodes to those with unknown labels. In the third approach, we adopt a PageRank approach where we rank the nodes according to the number of incoming positive and negative edges, after which we set a threshold. Based on the ranks, we can infer an edge would be positive if it goes a node above the
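The second approach, propagating known edge labels through a similarity graph in which edges have become nodes, can be sketched with a simple clamped iterative scheme. The graph encoding and function below are illustrative assumptions, not the thesis's implementation:

```python
def propagate_labels(neighbors, seed_labels, iters=20):
    """Iterative label propagation on a similarity graph.

    `neighbors` maps each node to a list of (neighbor, weight) pairs;
    `seed_labels` maps the labeled nodes to +1 (positive) or -1
    (negative).  Seeds are clamped; every unlabeled node repeatedly
    takes the weighted mean of its neighbours' scores, and the sign of
    the final score is the predicted label (0 if undecided).
    """
    scores = {n: float(seed_labels.get(n, 0.0)) for n in neighbors}
    for _ in range(iters):
        for n in neighbors:
            if n in seed_labels:
                continue  # clamp known labels
            total = sum(w for _, w in neighbors[n])
            if total:
                scores[n] = sum(w * scores[m] for m, w in neighbors[n]) / total
    return {n: (1 if s > 0 else -1 if s < 0 else 0)
            for n, s in scores.items()}
```

With a three-node chain where the ends are labeled +1 and -1, the middle node adopts the sign of its more strongly connected neighbour.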

  1. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
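The field-wise Bloom filter encoding that the estimation method operates on can be sketched as follows. The parameters (bigram padding, filter size `m`, `k` salted SHA-1 hashes) are common choices in the privacy-preserving linkage literature, not necessarily the paper's exact configuration:

```python
import hashlib

def bigrams(field):
    """Character bigrams of a field, padded so edge characters count."""
    s = "_" + field.lower() + "_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(field, m=100, k=4):
    """Map a field's bigrams into an m-bit Bloom filter (returned as the
    set of set-bit positions), using k salted SHA-1 hashes per bigram."""
    bits = set()
    for gram in bigrams(field):
        for salt in range(k):
            digest = hashlib.sha1(f"{salt}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % m)
    return bits

def dice(a, b):
    """Dice coefficient between two encoded fields; identical inputs
    score 1.0, disjoint filters 0.0."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))
```

Pairwise Dice similarities on such filters are the comparison values the EM algorithm then uses to estimate match probabilities and threshold cut-offs without access to the plaintext.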

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  3. Pattern Analysis On Banking Dataset

    Amritpal Singh

    2015-06-01

    Abstract: Everyday refinement and development of technology has led to increased competition among tech companies and to attempts to crack their systems and break them down. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns within a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limit and collateral details, transaction details, risk profiles, Anti-Money-Laundering-related information, and trade finance data. Thousands of decisions based on the related data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, based on its applications to customer acquisition, customer retention, management and marketing, and the management of risk and fraud detection.

  4. PHYSICS PERFORMANCE AND DATASET (PPD)

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  5. EEG datasets for motor imagery brain-computer interface.

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
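The ERD/ERS validation described above reduces to a relative band-power change between a baseline window and a motor-imagery window. A minimal sketch of that computation follows; the band-power samples are synthetic stand-ins, not the published EEG data, and the variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic single-channel mu-band amplitudes (arbitrary units) standing in
# for a baseline window and a motor-imagery window.
baseline_power = rng.normal(10.0, 0.5, 100) ** 2
task_power = rng.normal(7.0, 0.5, 100) ** 2

# Classic ERD/ERS percentage: negative values indicate event-related
# desynchronization (ERD), positive values synchronization (ERS).
erd_percent = (task_power.mean() - baseline_power.mean()) / baseline_power.mean() * 100.0
```

With suppressed task power, as over the contralateral somatosensory area during motor imagery, the value comes out strongly negative.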

  6. Connective tissue growth factor linked to the E7 tumor antigen generates potent antitumor immune responses mediated by an antiapoptotic mechanism.

    Cheng, W-F; Chang, M-C; Sun, W-Z; Lee, C-N; Lin, H-W; Su, Y-N; Hsieh, C-Y; Chen, C-A

    2008-07-01

    A novel method for generating an antigen-specific cancer vaccine and immunotherapy has emerged using a DNA vaccine. However, antigen-presenting cells (APCs) have a limited life span, which hinders their long-term ability to prime antigen-specific T cells. Connective tissue growth factor (CTGF) has a role in cell survival. This study explored the intradermal administration of DNA encoding CTGF with a model tumor antigen, human papilloma virus type 16 E7. Mice vaccinated with CTGF/E7 DNA exhibited a dramatic increase in E7-specific CD4(+) and CD8(+) T-cell precursors. They also showed an impressive antitumor effect against E7-expressing tumors compared with mice vaccinated with the wild-type E7 DNA. The delivery of DNA encoding CTGF and E7 or CTGF alone could prolong the survival of transduced dendritic cells (DCs) in vivo. In addition, CTGF/E7-transduced DCs could enhance a higher number of E7-specific CD8(+) T cells than E7-transduced DCs. By prolonging the survival of APCs, DNA vaccine encoding CTGF linked to a tumor antigen represents an innovative approach to enhance DNA vaccine potency and holds promise for cancer prophylaxis and immunotherapy.

  7. The Geometry of Finite Equilibrium Datasets

    Balasko, Yves; Tvede, Mich

We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.

  8. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

CORA and EN4 are both global delayed-mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, as are TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources, can also be found (sea-mammal profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until a year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016 a new study started that aims to compare both validation procedures to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  9. IPCC Socio-Economic Baseline Dataset

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  10. Veterans Affairs Suicide Prevention Synthetic Dataset

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  11. Nanoparticle-organic pollutant interaction dataset

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  12. An Annotated Dataset of 14 Meat Images

    Stegmann, Mikkel Bille

    2002-01-01

This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  13. An Automatic Matcher and Linker for Transportation Datasets

    Ali Masri

    2017-01-01

    Full Text Available Multimodality requires the integration of heterogeneous transportation data to construct a broad view of the transportation network. Many new transportation services are emerging while being isolated from previously-existing networks. This leads them to publish their data sources to the web, according to linked data principles, in order to gain visibility. Our interest is to use these data to construct an extended transportation network that links these new services to existing ones. The main problems we tackle in this article fall in the categories of automatic schema matching and data interlinking. We propose an approach that uses web services as mediators to help in automatically detecting geospatial properties and mapping them between two different schemas. On the other hand, we propose a new interlinking approach that enables the user to define rich semantic links between datasets in a flexible and customizable way.
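As a toy illustration of the interlinking idea (not the authors' actual system, which mediates through web services and customizable semantic rules), candidate owl:sameAs links between two hypothetical stop lists can be generated with a simple coordinate-tolerance rule; all URIs and coordinates below are invented:

```python
from math import hypot

# Hypothetical stops from two transportation datasets: (URI, lon, lat).
dataset_a = [("a:stop/1", 2.3522, 48.8566), ("a:stop/2", 2.2945, 48.8584)]
dataset_b = [("b:halt/9", 2.3523, 48.8567), ("b:halt/7", 2.3700, 48.8600)]

def interlink(a, b, tol=1e-3):
    """Emit candidate sameAs links when coordinates agree within `tol` degrees."""
    links = []
    for uri_a, xa, ya in a:
        for uri_b, xb, yb in b:
            if hypot(xa - xb, ya - yb) <= tol:
                links.append((uri_a, "owl:sameAs", uri_b))
    return links

links = interlink(dataset_a, dataset_b)
```

A real interlinking pipeline would of course match on richer geospatial and schema properties than raw coordinates.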

  14. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
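The power comparison described above can be sketched as a small Monte Carlo experiment. This is an illustrative reconstruction, not the paper's simulation design: the sample sizes, shift, and trial count are arbitrary, and the Mann-Whitney p-value uses the normal approximation, adequate for continuous data without ties.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = np.concatenate([a, b]).argsort().argsort() + 1  # ranks 1..n1+n2
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs(z) / sqrt(2))

def power(shift, n=10, trials=400, alpha=0.05):
    """Fraction of simulated small-sample experiments rejecting at `alpha`."""
    hits = sum(mann_whitney_p(rng.normal(0.0, 1.0, n),
                              rng.normal(shift, 1.0, n)) < alpha
               for _ in range(trials))
    return hits / trials

power_null = power(0.0)  # should sit near the nominal alpha
power_alt = power(1.5)   # much higher once a real location shift exists
```

Repeating the loop with a competing test on the same simulated samples is how the relative power of the candidate tests would be compared.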

  15. Design of an audio advertisement dataset

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

Since more and more advertisements swarm into radio broadcasts, it is necessary to establish an audio advertising dataset which can be used to analyze and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The rationality of the advertisement classes in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement experimental studies.
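A minimal numpy-only sketch of a PCA-based clustering check of this kind, using synthetic feature vectors in place of actual audio features (the class structure and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature matrix: one feature vector per ad sample, with two
# synthetic classes separated along every dimension.
class_a = rng.normal(0.0, 1.0, (30, 40))
class_b = rng.normal(3.0, 1.0, (30, 40))
features = np.vstack([class_a, class_b])

# PCA via SVD of the mean-centred data.
centred = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # project onto the first two components

# A 1D split on the first principal component stands in for clustering here;
# a full check would run k-means on the projected scores.
labels = (scores[:, 0] > 0).astype(int)
```

If the two synthetic classes separate cleanly in the projected space, the class labels recovered this way agree with the true grouping.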

  16. A novel marker for assessment of liver matrix remodeling: An enzyme-linked immunosorbent assay (ELISA) detecting a MMP generated type I collagen neo-epitope (C1M)

    Leeming, Diana Julie; He, Y.; Veidal, S. S.

    2011-01-01

    A competitive enzyme-linked immunosorbent assay (ELISA) for detection of a type I collagen fragment generated by matrix metalloproteinases (MMP) -2, -9 and -13, was developed (CO1-764 or C1M). The biomarker was evaluated in two preclinical rat models of liver fibrosis: bile duct ligation (BDL) an...

  17. Datasets related to in-land water for limnology and remote sensing applications: distance-to-land, distance-to-water, water-body identifier and lake-centre co-ordinates.

    Carrea, Laura; Embury, Owen; Merchant, Christopher J

    2015-11-01

Datasets containing information to locate and identify water bodies have been generated from data locating static water bodies with a resolution of about 300 m (1/360°), recently released by the Land Cover Climate Change Initiative (LC CCI) of the European Space Agency. The LC CCI water-bodies dataset has been obtained from multi-temporal metrics based on time series of the backscattered intensity recorded by ASAR on Envisat between 2005 and 2010. The new derived datasets coherently provide: distance to land, distance to water, water-body identifiers and lake-centre locations. The water-body identifier dataset locates the water bodies assigning the identifiers of the Global Lakes and Wetlands Database (GLWD), and lake centres are defined for in-land waters for which GLWD IDs were determined. The new datasets therefore link recent lake/reservoir/wetland extents to the GLWD, together with a set of coordinates which locates each water body unambiguously in the database. Information on the distance to land for each water cell and the distance to water for each land cell has many potential applications in remote sensing, where the applicability of geophysical retrieval algorithms may be affected by the presence of water or land within a satellite field of view (image pixel). During the generation and validation of the datasets some limitations of the GLWD database and of the LC CCI water-bodies mask have been found. Some examples of the inaccuracies/limitations are presented and discussed. Temporal change in water-body extent is common. Future versions of the LC CCI dataset are planned to represent temporal variation, and this will permit these derived datasets to be updated.
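The distance-to-water idea can be illustrated with a multi-source breadth-first search on a toy grid. This is a simplification: the real datasets work on a global 300 m grid and use proper geographic distances, whereas the sketch below counts 4-connected grid steps on an invented 5x5 mask.

```python
from collections import deque

# Toy 5x5 mask: 1 = water, 0 = land (hypothetical values).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

def distance_to_water(mask):
    """Multi-source BFS: grid steps from each cell to the nearest water cell."""
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):          # seed the search with every water cell
        for c in range(cols):
            if mask[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

dist = distance_to_water(mask)
```

Swapping the roles of 0 and 1 in the mask yields the companion distance-to-land field for water cells.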

  18. The Kinetics Human Action Video Dataset

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  19. A Dataset for Visual Navigation with Neuromorphic Methods

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  20. Linked data and user interaction

    Cervone, H Frank

    2015-01-01

This collection of research papers provides extensive information on deploying services, concepts, and approaches for using open linked data from libraries and other cultural heritage institutions, with a special emphasis on how libraries and other cultural heritage institutions can create effective end-user interfaces using open, linked data or other datasets. These papers are essential reading for anyone interested in user interface design or the semantic web.

  1. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  2. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. SIAM 2007 Text Mining Competition dataset

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  4. Harvard Aging Brain Study : Dataset and accessibility

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  5. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  7. Environmental Dataset Gateway (EDG) REST Interface

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  8. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  9. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

10. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  11. Climate Prediction Center IR 4km Dataset

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  12. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  13. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. What Is Linked Historical Data?

    Meroño-Peñuela, Albert; Hoekstra, Rinke; Janowicz, Krzysztof; Schlobach, Stefan; Lambrix, Patrick; Hyvönen, Eero

    2014-01-01

Datasets that represent historical sources are relative newcomers in the Linked Open Data (LOD) cloud. Following the standard LOD practices for publishing historical sources raises several questions: how can we distinguish between RDF graphs of primary and secondary sources? Should we treat

  15. Comparison of recent SnIa datasets

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w0 + w1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)), and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa in these datasets ((C) highest FoM; (U), (G), (D), (E); (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3 compared to the highest-FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical to the corresponding ranking based on consistency with standard rulers ((S) most consistent; (D), (C), (E), (U); (G) least consistent). The ranking sequence of the datasets however changes when we consider consistency with an expansion history corresponding to evolving dark energy with (w0, w1) = (−1.4, 2), crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.

  16. CEDAR : The Dutch Historical Censuses as Linked Open Data

    Meroño-Peñuela, Albert; Ashkpour, Ashkan; Guéret, Christophe; Schlobach, Stefan

    2017-01-01

    Here, we describe the CEDAR dataset, a five-star Linked Open Data representation of the Dutch historical censuses. These were conducted in the Netherlands once every 10 years from 1795 to 1971. We produce a linked dataset from a digitized sample of 2,288 tables. It contains more than 6.8 million

  17. Sextant: Visualizing time-evolving linked geospatial data

    C. Nikolaou (Charalampos); K. Dogani (Kallirroi); K. Bereta (Konstantina); G. Garbis (George); M. Karpathiotakis (Manos); K. Kyzirakos (Konstantinos); M. Koubarakis (Manolis)

    2015-01-01

The linked open data cloud is constantly evolving as datasets get continuously updated with newer versions. As a result, representing, querying, and visualizing the temporal dimension of linked data is crucial. This is especially important for geospatial datasets that form the backbone

  18. Geoseq: a tool for dissecting deep-sequencing datasets

    Homann Robert

    2010-10-01

Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
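The tile-coverage step can be sketched as follows. Geoseq itself queries suffix arrays built over pre-indexed public libraries; this toy version instead scans a handful of invented reads directly, which shows the same reference-against-reads direction of the analysis:

```python
# Hypothetical read library and reference sequence (not real data).
reads = ["ACGTAC", "GTACGG", "TTTACG", "ACGTAC"]
reference = "ACGTACGG"
TILE = 4

def tile_coverage(reference, reads, tile):
    """Count how many reads contain each overlapping tile of the reference."""
    coverage = {}
    for i in range(len(reference) - tile + 1):
        t = reference[i:i + tile]
        coverage[t] = sum(t in read for read in reads)
    return coverage

coverage = tile_coverage(reference, reads, TILE)
```

Plotting the per-tile counts along the reference gives the coverage profile that Geoseq returns as plots and spreadsheets.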

  19. A new dataset validation system for the Planetary Science Archive

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    has been designed to provide the user with the flexibility of defining and implementing various types of validation criteria, to iteratively and incrementally validate datasets, and to generate validation reports.

  20. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Folorunso Olufemi A.

    2011-04-01

Full Text Available This paper addresses the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil-leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. A model to manage such data and to provide the computational support needed for effective exploration is therefore developed in this paper. A challenge of this approach is to reduce data inefficiency. The paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all data relevant to a particularly selected region of interest (ROI) on the network. The necessary architectural system support and the interface requirements for such visualizations are also presented.
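Selecting a lead attribute by information gain can be sketched with the standard entropy-reduction formula. The sensor records, attribute names and labels below are hypothetical, invented purely to show the computation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a categorical label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """Entropy reduction in `labels` after splitting on a discrete attribute."""
    total = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for v in set(attr_values):
        subset = [l for a, l in zip(attr_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Hypothetical sensor records: `pressure` perfectly predicts the leak label,
# `vendor` carries no information, so `pressure` becomes the lead attribute.
leak = ["leak", "leak", "ok", "ok"]
pressure = ["high", "high", "low", "low"]
vendor = ["x", "y", "x", "y"]

gains = {"pressure": information_gain(pressure, leak),
         "vendor": information_gain(vendor, leak)}
lead_attribute = max(gains, key=gains.get)
```

The attribute with the highest gain is then the one the visualization interface keys on for a selected region of interest.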

  1. Linking open vocabularies

    Greifender, Elke; Seadle, Michael

    2013-01-01

Linked Data (LD), Linked Open Data (LOD) and generating a web of data present the new knowledge-sharing frontier. In a philosophical context, LD is an evolving environment that reflects humankind's desire to understand the world by drawing on the latest technologies and capabilities of the time. LD, while seemingly a new phenomenon, did not emerge overnight; rather, it represents the natural progression by which knowledge structures are developed, used, and shared. Linked Open Vocabularies is a significant trajectory of LD. Linked Open Vocabularies targets vocabularies that have traditionally b

  2. GLEAM version 3: Global Land Evaporation Datasets and Model

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

Terrestrial evaporation links the energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical processes to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmosphere feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data available to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  3. Topic modeling for cluster analysis of large biological and medical datasets.

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

The big data moniker is nowhere better deserved than in describing the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyperdimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. In this study, three topic-model-derived clustering methods - highest probable topic assignment, feature selection and feature extraction - are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy and effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic-model-derived clustering methods - highest probable topic assignment, feature selection and feature extraction - yield clustering improvements for the three different data types. Clusters represent truthful groupings and subgroupings in the data more efficaciously than traditional methods, suggesting
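Of the three methods named above, "highest probable topic assignment" reduces to an argmax over a fitted document-topic probability matrix. The matrix below is invented for illustration; in practice it would come from fitting a topic model such as LDA to the dataset:

```python
import numpy as np

# Hypothetical document-topic probabilities from an already-fitted topic model:
# one row per sample (e.g. a PFGE profile), one column per latent topic.
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
])

# Highest probable topic assignment: each sample is clustered under the
# latent topic it most probably belongs to.
clusters = doc_topic.argmax(axis=1)
```

The feature-selection and feature-extraction variants instead feed topic-derived features into a conventional clustering algorithm rather than clustering on the argmax directly.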

  4. Towards precision medicine: discovering novel gynecological cancer biomarkers and pathways using linked data.

    Jha, Alokkumar; Khan, Yasar; Mehdi, Muntazir; Karim, Md Rezaul; Mehmood, Qaiser; Zappa, Achille; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh

    2017-09-19

    Next Generation Sequencing (NGS) is playing a key role in therapeutic decision making for cancer prognosis and treatment. NGS technologies are producing massive amounts of sequencing data, often published by isolated and disparate sequencing facilities. Consequently, the process of sharing and aggregating multisite sequencing datasets is thwarted by issues such as the need to discover relevant data from different sources, to build scalable repositories, and to automate data linkage, as well as by the volume of the data, efficient querying mechanisms, and information-rich, intuitive visualisation. We present an approach to link and query different sequencing datasets (TCGA, COSMIC, REACTOME, KEGG and GO) to indicate risks for four cancer types - Ovarian Serous Cystadenocarcinoma (OV), Uterine Corpus Endometrial Carcinoma (UCEC), Uterine Carcinosarcoma (UCS), Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) - covering the 16 healthy tissue-specific genes from Illumina Human Body Map 2.0. The differentially expressed genes from Illumina Human Body Map 2.0 are analysed together with the gene expressions reported in the COSMIC and TCGA repositories, leading to the discovery of potential biomarkers for tissue-specific cancers. We analyse tissue expression of genes, copy number variation (CNV), somatic mutation, and promoter methylation to identify associated pathways and find novel biomarkers. We discovered twenty (20) mutated genes and three (3) potential pathways causing promoter changes in different gynaecological cancer types. We propose a data-interlinked platform called BIOOPENER that glues together heterogeneous cancer and biomedical repositories. The key approach is to find correspondences (or data links) among genetic, cellular and molecular features across isolated cancer datasets, giving insight into cancer progression from normal to diseased tissues.
The proposed BIOOPENER platform enriches mutations by filling in

  5. Comparison of Shallow Survey 2012 Multibeam Datasets

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow-water marine survey environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the common dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington Harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves, and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs; a marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. The multibeam datasets collected for the Shallow Survey were processed for detailed analysis using Triton's Perspective processing software. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross-comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
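The gridding step can be illustrated with a toy sketch. This is not the CUBE algorithm, which also propagates sounding uncertainty; it only shows the basic idea of binning soundings into cells at a chosen resolution (such as the 0.5 m and 1.0 m grids above) and averaging depths per cell. All names and values are illustrative.

```python
from collections import defaultdict

def grid_soundings(soundings, resolution):
    """soundings: iterable of (x, y, depth); returns {(col, row): mean depth}."""
    cells = defaultdict(list)
    for x, y, depth in soundings:
        cells[(int(x // resolution), int(y // resolution))].append(depth)
    return {cell: sum(depths) / len(depths) for cell, depths in cells.items()}

pings = [(0.2, 0.3, 10.0), (0.4, 0.1, 11.0), (1.2, 0.2, 14.0)]
print(grid_soundings(pings, 1.0))  # two occupied cells at 1.0 m resolution
```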

  6. GenEST, a powerful bidirectional link between cDNA sequence data and gene expression profiles generated by cDNA-AFLP

    Qin, Ling; Prins, P.; Jones, J.T.; Popeijus, H.; Smant, G.; Bakker, J.; Helder, J.

    2001-01-01

    The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power

  7. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model with information on generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degree of renewable penetration in a future European energy system. The spatial coverage, completeness, and resolution of this dataset open the door to the evaluation, scaling analysis, and replicability checking of a wealth of proposals in, e.g., market design, network actor coordination, and forecasting of renewable power generation.
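The scaling of renewable traces to an envisioned penetration level can be sketched as below: multiply a renewable time series so that its total energy equals a target fraction of total demand. The helper name and numbers are illustrative, not part of the RE-Europe tooling.

```python
def scale_to_penetration(renewable, demand, target_share):
    """Scale `renewable` so its total energy equals target_share of total demand."""
    factor = target_share * sum(demand) / sum(renewable)
    return [factor * r for r in renewable]

demand = [100.0, 120.0, 80.0]      # hourly demand, MW
wind = [10.0, 30.0, 20.0]          # hourly wind realization, MW
scaled = scale_to_penetration(wind, demand, 0.5)
print(round(sum(scaled) / sum(demand), 2))  # -> 0.5
```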

  9. High-Capacity 60 GHz and 75–110 GHz Band Links Employing All-Optical OFDM Generation and Digital Coherent Detection

    Caballero Jambrina, Antonio; Zibar, Darko; Sambaraju, Rakesh

    2012-01-01

    The performance of wireless signal generation and detection at millimeter-wave frequencies using baseband optical means is analyzed and experimentally demonstrated. Multigigabit wireless signal generation is achieved based on all-optical orthogonal frequency division multiplexing (OFDM) and photo… To demonstrate the scalability and bit-rate transparency of our proposed approach, we experimentally demonstrated generation and detection in the 60 GHz and 75–110 GHz bands of an all-optical OFDM quadrature phase shift keying signal, with two and three subcarriers, for a total bit rate over 20 Gb/s.

  10. 3DSEM: A 3D microscopy dataset

    Ahmad P. Tafti

    2016-03-01

    The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs remain 2D images. To effectively measure and visualize surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, allowing quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

  11. Data Mining for Imbalanced Datasets: An Overview

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years have brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating the performance of a classifier, might not be appropriate when the data are imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing datasets, as well as performance measures more appropriate for mining imbalanced datasets.
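A minimal sketch of one balancing technique the chapter surveys: random oversampling, which duplicates minority-class samples until all classes are equally represented (SMOTE and undersampling expose the same interface). Function and variable names here are illustrative, not from the chapter.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((s, y) for s in group + extra)
    return balanced

data = random_oversample([[0], [1], [2], [3]], ["maj", "maj", "maj", "min"])
print([y for _, y in data].count("min"))  # minority class upsampled to 3
```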

  12. Ontology-Based Querying with Bio2RDF's Linked Open Data.

    Callahan, Alison; Cruz-Toledo, José; Dumontier, Michel

    2013-04-15

    A key activity for life scientists in this post "-omics" age involves searching for and integrating biological data from a multitude of independent databases. However, our ability to find relevant data is hampered by non-standard web and database interfaces backed by an enormous variety of data formats. This heterogeneity presents an overwhelming barrier to the discovery and reuse of resources which have been developed at great public expense. To address this issue, the open-source Bio2RDF project promotes a simple convention to integrate diverse biological data using Semantic Web technologies. However, querying Bio2RDF remains difficult due to the lack of uniformity in the representation of Bio2RDF datasets. We describe an update to Bio2RDF that includes tighter integration across 19 new and updated RDF datasets. All available open-source scripts were first consolidated into a single GitHub repository and then redeveloped using a common API that generates normalized IRIs using a centralized dataset registry. We then mapped dataset-specific types and relations to the Semanticscience Integrated Ontology (SIO) and demonstrate simplified federated queries across multiple Bio2RDF endpoints. This coordinated release marks an important milestone for the Bio2RDF open source linked data framework. Principally, it improves the quality of linked data in the Bio2RDF network and makes it easier to access or recreate the linked data locally. We hope to continue improving the Bio2RDF network of linked data by identifying priority databases and increasing the vocabulary coverage to additional dataset vocabularies beyond SIO.
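A sketch of the normalized-IRI idea described above: Bio2RDF identifies a record by a dataset prefix and a local identifier, following the published http://bio2rdf.org/&lt;dataset&gt;:&lt;id&gt; convention. The helper function itself is hypothetical, not part of the Bio2RDF scripts.

```python
def bio2rdf_iri(dataset: str, local_id: str) -> str:
    """Build a normalized Bio2RDF IRI from a dataset prefix and local identifier."""
    return f"http://bio2rdf.org/{dataset.lower()}:{local_id}"

print(bio2rdf_iri("DrugBank", "DB00001"))  # -> http://bio2rdf.org/drugbank:DB00001
```

Because every dataset in the registry follows the same pattern, a federated SPARQL query can join records across endpoints simply by matching these IRIs.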

  13. Harvard Aging Brain Study: Dataset and accessibility.

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Expanding phenotype of p.Ala140Val mutation in MECP2 in a 4 generation family with X-linked intellectual disability and spasticity.

    Lambert, Sophie; Maystadt, Isabelle; Boulanger, Sébastien; Vrielynck, Pascal; Destrée, Anne; Lederer, Damien; Moortgat, Stéphanie

    2016-10-01

    Mutations in MECP2 (MIM #312750), located on Xq28 and encoding a methyl CpG binding protein, are classically associated with Rett syndrome in female patients, with a lethal effect in hemizygous males. However, MECP2 mutations have already been reported in surviving males with severe neonatal-onset encephalopathy, or with X-linked intellectual disability associated with psychosis, pyramidal signs, parkinsonian features and macro-orchidism (PPM-X syndrome; MIM #300055). Here we report on the identification of the p.Ala140Val mutation in the MECP2 gene in four males and three females of a large Caucasian family affected with X-linked intellectual disability. Females present with mild cognitive impairment and speech difficulties. Males have moderate intellectual disability, impaired language development, friendly behavior, slowly progressive spastic paraparesis and dystonic movements of the hands. Two of them show microcephaly. The p.Ala140Val mutation is recurrent, as it was already described in four families with X-linked mental retardation and in three sporadic male patients with intellectual disability. We further delineate the phenotype associated with the p.Ala140Val mutation, illustrating a variable expressivity even within a given family, and we compare our patients with previously reported cases in the literature. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  15. Skewing of X-chromosome inactivation in three generations of carriers with X-linked chronic granulomatous disease within one family

    Köker, M. Y.; Sanal, O.; de Boer, M.; Tezcan, I.; Metin, A.; Tan, C.; Ersoy, F.; Roos, D.

    2006-01-01

    BACKGROUND: Chronic granulomatous disease (CGD) is an inherited disorder of the innate immune system characterized by impairment of intracellular microbicidal activity of phagocytes. Mutations in one of the four known NADPH-oxidase components preclude generation of superoxide and related

  16. Kernel-based discriminant feature extraction using a representative dataset

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
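An illustrative sketch of Frequency Sensitive Competitive Learning (FSCL), which the paper uses to determine the centroid points of the representative dataset: each unit's distance is penalized by its win count, so seldom-winning centroids still get updated. All names, learning-rate and epoch values below are toy choices, not the paper's actual parameters.

```python
def fscl(points, centroids, lr=0.1, epochs=5):
    """Frequency-sensitive competitive learning over a list of centroids."""
    wins = [1] * len(centroids)
    for _ in range(epochs):
        for p in points:
            # Frequency-sensitive winner: squared distance scaled by win count.
            j = min(range(len(centroids)),
                    key=lambda k: wins[k] * sum((a - b) ** 2
                                                for a, b in zip(p, centroids[k])))
            wins[j] += 1
            # Move the winning centroid toward the point.
            centroids[j] = [c + lr * (x - c) for c, x in zip(centroids[j], p)]
    return centroids

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
cents = fscl(pts, [[0.5, 0.5], [4.0, 4.0]])
print(cents)  # one centroid settles near each cluster
```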

  17. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data, including data transformations and distance calculations, and proceeds to the application of both gold-standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis, along with detailed interpretations of the results, for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools, with reproducible R code and chemical-ecological datasets, to practice and teach multivariate methods.
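One of the pre-treatment steps the guide covers, distance calculation, can be sketched as below with the Bray-Curtis dissimilarity, a measure commonly applied to compound-abundance tables. The guide itself ships R code; this Python version and its toy samples are purely illustrative.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

sample_a = [10.0, 0.0, 5.0]   # compound abundances in sample A
sample_b = [6.0, 4.0, 5.0]    # compound abundances in sample B
print(round(bray_curtis(sample_a, sample_b), 2))  # -> 0.27
```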

  18. Graph based techniques for tag cloud generation

    Leginus, Martin; Dolog, Peter; Lage, Ricardo Gomes

    2013-01-01

    Tag clouds are one of the navigation aids for exploring documents; they also link documents through user-defined terms. We explore various graph-based techniques to improve tag cloud generation. Moreover, we introduce relevance measures based on underlying data, such as ratings or citation counts, for improved measurement of the relevance of tag clouds. We show that, on the given datasets, our approach outperforms the state-of-the-art baseline methods with respect to such relevance by 41% on the Movielens dataset and by 11% on the Bibsonomy dataset.
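A toy sketch of rating-based tag relevance in the spirit described above: score each tag by the mean rating of the documents it annotates, then keep the top-k tags for the cloud. This simple scoring rule is a stand-in for the paper's graph-based techniques; all names and values are illustrative.

```python
def top_tags(tag_docs, ratings, k=2):
    """tag_docs: {tag: [doc ids]}; ratings: {doc id: rating}; returns top-k tags."""
    score = {tag: sum(ratings[d] for d in docs) / len(docs)
             for tag, docs in tag_docs.items()}
    return sorted(score, key=score.get, reverse=True)[:k]

tag_docs = {"sci-fi": ["m1", "m2"], "boring": ["m3"], "classic": ["m1"]}
ratings = {"m1": 4.5, "m2": 4.0, "m3": 2.0}
print(top_tags(tag_docs, ratings))  # -> ['classic', 'sci-fi']
```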

  19. Random Coefficient Logit Model for Large Datasets

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets, and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al. [Jiang,

  20. Thesaurus Dataset of Educational Technology in Chinese

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  1. Operative Links

    Wistoft, Karen; Højlund, Holger

    2012-01-01

    …educational goals, learning content, or value clarification. Health pedagogy is often a matter of retrospective rationalization rather than the starting point of planning. Health and risk behaviour approaches override health educational approaches. Conclusions: Operational links between health education, health professionalism, and management strategies pose the foremost challenge. Operational links indicate cooperative levels that facilitate a creative and innovative effort across traditional professional boundaries. It is proposed that such links are supported by network structures, shared semantics…

  2. The role of metadata in managing large environmental science datasets. Proceedings

    Melton, R.B.; DeVaney, D.M. (eds.) (Pacific Northwest Lab., Richland, WA, United States); French, J.C. (Univ. of Virginia, United States)

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  3. SmartMIMO: An Energy-Aware Adaptive MIMO-OFDM Radio Link Control for Next-Generation Wireless Local Area Networks

    Wim Dehaene

    2007-12-01

    Multiantenna systems, and more particularly those operating on multiple-input multiple-output (MIMO) channels, are currently a must to improve the spectrum efficiency and/or robustness of wireless links. There exists a fundamental tradeoff between the potential spectrum efficiency and robustness increase. However, multiantenna techniques also come with an overhead in silicon implementation area and power consumption due, at least, to the duplication of part of the transmitter and receiver radio front-ends. Although the area overhead may be acceptable in view of the performance improvement, low power consumption must be preserved for integration in nomadic devices. In this case, it is the tradeoff between performance (e.g., the net throughput on top of the medium access control layer) and average power consumption that really matters. It has been shown that adaptive schemes are mandatory to prevent multiantenna techniques from hampering this system tradeoff. In this paper, we derive smartMIMO: an adaptive multiantenna approach which, next to simply adapting the modulation and code rate as traditionally considered, decides packet-per-packet, depending on the MIMO channel state, to use either space-division multiplexing (increasing spectrum efficiency), space-time coding (increasing robustness), or single-antenna transmission. Contrary to many such adaptive schemes, the focus is set on using multiantenna transmission to improve link energy efficiency in real operating conditions. Based on a model calibrated on an existing reconfigurable multiantenna transceiver setup, the link energy efficiency with the proposed scheme is shown to be improved by up to 30% when compared to nonadaptive schemes. The average throughput is, on the other hand, improved by up to 50% when compared to single-antenna transmission.

  5. Generation of monoclonal antibodies against peptidylarginine deiminase 2 (PAD2) and development of a PAD2-specific enzyme-linked immunosorbent assay

    Damgaard, Dres; Palarasah, Yaseelan; Skjødt, Karsten

    2014-01-01

    The enzyme peptidylarginine deiminase 2 (PAD2) has been associated with inflammatory diseases, such as rheumatoid arthritis, and neurodegenerative diseases, including multiple sclerosis. To investigate the association of various diseases with extracellular PAD2, we raised monoclonal antibodies (mAbs) against rabbit PAD2 and evaluated their cross-reactivity with human PAD2 by indirect enzyme-linked immunosorbent assay (ELISA), western blotting, and immunohistological staining of inflamed synovial tissue. Moreover, we established a sandwich ELISA detecting human PAD2, based on two different monoclonal… diseases.

  6. EarthCube GeoLink: Semantics and Linked Data for the Geosciences

    Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Hitzler, P.; Janowicz, K.; Ji, P.; Jones, M. B.; Krisnadhi, A.; Lehnert, K. A.; Mickle, A.; Narock, T.; O'Brien, M.; Raymond, L. M.; Schildhauer, M.; Shepherd, A.; Wiebe, P. H.

    2015-12-01

    The NSF EarthCube initiative is building next-generation cyberinfrastructure to aid geoscientists in collecting, accessing, analyzing, sharing, and visualizing their data and knowledge. The EarthCube GeoLink Building Block project focuses on a specific set of software protocols and vocabularies, often characterized as the Semantic Web and "Linked Data", to publish data online in a way that is easily discoverable, accessible, and interoperable. GeoLink brings together specialists from the computer science, geoscience, and library science domains, and includes data from a network of NSF-funded repositories that support scientific studies in marine geology, marine ecosystems, biogeochemistry, and paleoclimatology. We are working collaboratively with closely-related Building Block projects including EarthCollab and CINERGI, and solicit feedback from RCN projects including Cyberinfrastructure for Paleogeosciences (C4P) and iSamples. GeoLink has developed a modular ontology that describes essential geoscience research concepts; published data from seven collections (to date) on the Web as geospatially-enabled Linked Data using this ontology; matched and mapped data between collections using shared identifiers for investigators, repositories, datasets, funding awards, platforms, research cruises, physical specimens, and gazetteer features; and aggregated the results in a shared knowledgebase that can be queried via a standard SPARQL endpoint. Client applications have been built around the knowledgebase, including a Web/map-based data browser using the Leaflet JavaScript library and a simple query service using the OpenSearch format. Future development will include extending and refining the GeoLink ontology, adding content from additional repositories, developing semi-automated algorithms to enhance metadata, and further work on client applications.

  7. Hoechst 33258 dye generates DNA-protein cross-links during ultraviolet light-induced photolysis of bromodeoxyuridine in replicated and repaired DNA

    Guo Xicang; Morgan, W.F.; Cleaver, J.E.

    1986-08-01

    Substitution of bromodeoxyuridine for thymidine in the DNA of mammalian cells sensitizes them to a range of wavelengths of ultraviolet light. Cells are also sensitized to photochemical reactions involving dyes such as Hoechst 33258, which is used to produce differential staining of chromatids according to their bromodeoxyuridine content. Irradiation with 313 nm light of human and hamster cells containing bromodeoxyuridine in their DNA produced single-strand breaks but no DNA-protein cross-links. Irradiation with 360 nm light in the presence of Hoechst 33258 produced extensive DNA-protein cross-linkage as well as single-strand breaks. These cross-links were observed in DNA containing bromodeoxyuridine incorporated by either semiconservative or repair replication. When the protein was removed with proteinase K, bromodeoxyuridine in repair patches after irradiation by doses of ultraviolet (254 nm) light as low as 0.26 J/m² could readily be detected. Hoechst 33258-mediated photolysis, therefore, provides a sensitive new technique for measuring repair replication after ultraviolet light irradiation.

  8. F-band millimeter-wave signal generation for wireless link data transmission using on-chip photonic integrated dual-wavelength sources

    Guzman, Robinson; Carpintero, G.; Gordon Gallegos, Carlos; Lawniczuk, Katarzyna; Leijtens, Xaveer

    2015-01-01

    Millimeter waves (30-300 GHz) are of interest due to the wide bandwidths available for carrying information, enabling broadband wireless communications. Photonics is a key technology for millimeter-wave signal generation, and the use of photonic integration to reduce size and cost has recently been demonstrated.

  9. ISED: Constructing a high-resolution elevation road dataset from massive, low-quality in-situ observations derived from geosocial fitness tracking data.

    Grant McKenzie

    Full Text Available Gaining access to inexpensive, high-resolution, up-to-date, three-dimensional road network data is a top priority beyond research, as such data would fuel applications in industry, governments, and the broader public alike. Road network data are openly available via user-generated content such as OpenStreetMap (OSM but lack the resolution required for many tasks, e.g., emergency management. More importantly, however, few publicly available data offer information on elevation and slope. For most parts of the world, up-to-date digital elevation products with a resolution of less than 10 meters are a distant dream and, if available, those datasets have to be matched to the road network through an error-prone process. In this paper we present a radically different approach by deriving road network elevation data from massive amounts of in-situ observations extracted from user-contributed data from an online social fitness tracking application. While each individual observation may be of low-quality in terms of resolution and accuracy, taken together they form an accurate, high-resolution, up-to-date, three-dimensional road network that excels where other technologies such as LiDAR fail, e.g., in case of overpasses, overhangs, and so forth. In fact, the 1m spatial resolution dataset created in this research based on 350 million individual 3D location fixes has an RMSE of approximately 3.11m compared to a LiDAR-based ground-truth and can be used to enhance existing road network datasets where individual elevation fixes differ by up to 60m. In contrast, using interpolated data from the National Elevation Dataset (NED results in 4.75m RMSE compared to the base line. We utilize Linked Data technologies to integrate the proposed high-resolution dataset with OpenStreetMap road geometries without requiring any changes to the OSM data model.

  10. ISED: Constructing a high-resolution elevation road dataset from massive, low-quality in-situ observations derived from geosocial fitness tracking data.

    McKenzie, Grant; Janowicz, Krzysztof

    2017-01-01

    Gaining access to inexpensive, high-resolution, up-to-date, three-dimensional road network data is a top priority beyond research, as such data would fuel applications in industry, governments, and the broader public alike. Road network data are openly available via user-generated content such as OpenStreetMap (OSM) but lack the resolution required for many tasks, e.g., emergency management. More importantly, however, few publicly available data offer information on elevation and slope. For most parts of the world, up-to-date digital elevation products with a resolution of less than 10 meters are a distant dream and, if available, those datasets have to be matched to the road network through an error-prone process. In this paper we present a radically different approach by deriving road network elevation data from massive amounts of in-situ observations extracted from user-contributed data from an online social fitness tracking application. While each individual observation may be of low-quality in terms of resolution and accuracy, taken together they form an accurate, high-resolution, up-to-date, three-dimensional road network that excels where other technologies such as LiDAR fail, e.g., in case of overpasses, overhangs, and so forth. In fact, the 1m spatial resolution dataset created in this research based on 350 million individual 3D location fixes has an RMSE of approximately 3.11m compared to a LiDAR-based ground-truth and can be used to enhance existing road network datasets where individual elevation fixes differ by up to 60m. In contrast, using interpolated data from the National Elevation Dataset (NED) results in 4.75m RMSE compared to the base line. We utilize Linked Data technologies to integrate the proposed high-resolution dataset with OpenStreetMap road geometries without requiring any changes to the OSM data model.
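    The integration step described above can be sketched as plain RDF-style triples that point at existing OSM way URIs, so the OSM data model itself stays untouched. The predicate names (ex:hasElevationFix, ex:elevationMetres), the way ID, and the coordinates below are hypothetical, not the vocabulary actually used by the authors.

```python
# Sketch: attach per-vertex elevation observations to an existing
# OpenStreetMap way URI as (subject, predicate, object) triples.

def elevation_triples(osm_way_id, elevation_fixes):
    """Emit RDF-style triples for a list of (lat, lon, elevation_m) fixes."""
    subject = f"https://www.openstreetmap.org/way/{osm_way_id}"
    triples = []
    for i, (lat, lon, elev_m) in enumerate(elevation_fixes):
        node = f"{subject}#fix{i}"  # one fix node per vertex, hung off the way URI
        triples.append((subject, "ex:hasElevationFix", node))
        triples.append((node, "geo:lat", repr(lat)))
        triples.append((node, "geo:long", repr(lon)))
        triples.append((node, "ex:elevationMetres", repr(elev_m)))
    return triples

triples = elevation_triples(26659127, [(34.05, -118.24, 87.2), (34.06, -118.25, 91.5)])
```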

  11. The Path from Large Earth Science Datasets to Information

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) is one of the major Science Mission Directorate (SMD) centers for the archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed at the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information, and addresses the technical and scientific issues, governance and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally, it demonstrates some of the most used data and product visualization and analysis tools developed and maintained by the GES DISC.

  12. Linked Data: what does it offer Earth Sciences?

    Cox, Simon; Schade, Sven

    2010-05-01

    interfaces, the browse metaphor, which has been such an important part of the success of the web, must be augmented with other interaction mechanisms, including query. What are the impacts on search and metadata? Hypertext provides links selected by the page provider. However, science should endeavor to be exhaustive in its use of data. Resource discovery through links must be supplemented by more systematic data discovery through search. Conversely, the crawlers that generate search indexes must be fed by resource providers (a) serving navigation pages with links to every dataset (b) adding enough 'metadata' (semantics) on each link to effectively populate the indexes. Linked Data makes this easier due to its integration with semantic web technologies, including structured vocabularies. What is the relation between structured data and Linked Data? Linked Data has focused on web-pages (primarily HTML) for human browsing, and RDF for semantics, assuming that other representations are opaque. However, this overlooks the wealth of XML data on the web, some of which is structured according to XML Schemas that provide semantics. Technical applications can use content-negotiation to get a structured representation, and exploit its semantics. Particularly relevant for earth sciences are data representations based on OGC Geography Markup Language (GML), such as GeoSciML, O&M and MOLES. GML was strongly influenced by RDF, and typed links are intrinsic: xlink:href plays the role that rdf:resource does in RDF representations. Services which expose GML-formatted resources (such as OGC Web Feature Service) are a prototype of Linked Data. Giving credit where it is due. Organizations investing in data collection may be reluctant to publish the raw data prior to completing an initial analysis. To encourage early data publication the system must provide suitable incentives, and citation analysis must recognize the increasing diversity of publication routes and forms. 
Linked Data makes it
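    The xlink:href / rdf:resource correspondence noted above can be made concrete with a small sketch: typed links in a GML-style document read off directly as subject-predicate-object triples. The Borehole/loggedBy markup below is invented for illustration and is not taken from GeoSciML.

```python
# Sketch: read xlink:href attributes out of a GML-style fragment as
# RDF-style triples, using the element's local name as the predicate.
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}href"

gml = """<Borehole gml:id="bh1"
            xmlns:gml="http://www.opengis.net/gml"
            xmlns:xlink="http://www.w3.org/1999/xlink">
           <loggedBy xlink:href="http://example.org/org/gsv"/>
         </Borehole>"""

def typed_links(xml_text):
    """Return (subject_id, link_role, target) triples from xlink:href attributes."""
    root = ET.fromstring(xml_text)
    subject = root.get("{http://www.opengis.net/gml}id")
    out = []
    for child in root:
        href = child.get(XLINK)
        if href is not None:
            # the element's local name plays the role an RDF predicate would
            role = child.tag.split("}")[-1]
            out.append((subject, role, href))
    return out

links = typed_links(gml)
```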

  13. Sharing Video Datasets in Design Research

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  14. Interpolation of diffusion weighted imaging datasets

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...

  15. Data assimilation and model evaluation experiment datasets

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  16. A hybrid organic-inorganic perovskite dataset

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.
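    As a back-of-envelope check on the dataset's combinatorial scope, the 16 x 3 x 4 cation/anion choices above yield 192 base compositions; the 1,346 entries then presumably arise because several optimized structures per composition are retained. The labels below are placeholders; only the counts come from the abstract.

```python
# Sketch: enumerate the HOIP composition space described in the abstract.
from itertools import product

organic = [f"A{i}" for i in range(16)]   # 16 organic cations (placeholder labels)
group_iv = ["Ge", "Sn", "Pb"]            # 3 group-IV cations (illustrative choice)
halide = ["F", "Cl", "Br", "I"]          # 4 halide anions (illustrative choice)

compositions = [(a, b, x) for a, b, x in product(organic, group_iv, halide)]
```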

  17. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that a single lncRNA is generally bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.
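    One analysis step described above, locating disease-related SNPs that fall inside RBP binding sites on lncRNAs, reduces to a simple interval-overlap test. The coordinates, RBP names, and SNP ids below are made up for illustration.

```python
# Sketch: map each SNP position to the binding-site intervals containing it.

def snps_in_sites(snps, sites):
    """snps: (snp_id, position); sites: (rbp, start, end). Return hits per SNP."""
    hits = {}
    for snp_id, pos in snps:
        for rbp, start, end in sites:
            if start <= pos <= end:
                hits.setdefault(snp_id, []).append((rbp, start, end))
    return hits

sites = [("HuR", 100, 150), ("PTB", 140, 200)]   # binding sites on one lncRNA
snps = [("rs0001", 145), ("rs0002", 300)]        # rs0001 sits in both sites
hits = snps_in_sites(snps, sites)
```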

  18. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu, E-mail: lssqlh@mail.sysu.edu.cn; Yang, Jian-Hua, E-mail: lssqlh@mail.sysu.edu.cn [RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou (China)

    2015-01-14

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that a single lncRNA is generally bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.

  19. The peroxyl radical-induced oxidation of Escherichia coli FtsZ and its single tryptophan mutant (Y222W) modifies specific side-chains, generates protein cross-links and affects biological function

    Escobar-Álvarez, Elizabeth; Leinisch, Fabian; Araya, Gissela

    2017-01-01

    radicals (ROO•) generated from AAPH (2,2′-azobis(2-methylpropionamidine) dihydrochloride) was studied. The non-oxidized proteins showed differences in their polymerization behavior, with this favored by the presence of Trp at position 222. AAPH-treatment of the proteins inhibited polymerization. Protein...... consumed by ROO•. Quantification of the number of moles of amino acid consumed per mole of ROO• shows that most of the initial oxidant can be accounted for at low radical fluxes, with Met being a major target. Western blotting provided evidence for di-tyrosine cross-links in the dimeric and trimeric...

  20. Annotating spatio-temporal datasets for meaningful analysis in the Web

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available in the Web. This offers the advantage of using the data for purposes other than originally foreseen, but also carries the danger that users may apply inappropriate analysis procedures because they lack important assumptions made during the data collection process. In order to guide towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows meaningful spatial prediction and aggregation to be proven in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. Therefore, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
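    The idea of annotations guiding a meaningful analysis can be sketched as a lookup from a dataset's asserted variable type to the set of admissible procedures. The URIs, type names, and rule table below are simplifications for illustration, not the actual OWL encoding of the formalism.

```python
# Sketch: consult a Linked-Data-style annotation (dataset URI -> variable type)
# before applying a spatial procedure, rejecting meaningless combinations.

MEANINGFUL = {
    "field":         {"interpolate", "aggregate"},  # continuous fields: both OK
    "lattice":       {"aggregate"},                 # areal data: no interpolation
    "point_pattern": set(),                         # use intensity estimation instead
    "trajectory":    set(),
}

annotations = {  # illustrative dataset URIs and asserted types
    "http://example.org/data/pm10": "field",
    "http://example.org/data/counties": "lattice",
}

def check(dataset_uri, operation):
    """True if the annotated variable type admits the requested operation."""
    vtype = annotations[dataset_uri]
    return operation in MEANINGFUL[vtype]

ok = check("http://example.org/data/pm10", "interpolate")
bad = check("http://example.org/data/counties", "interpolate")
```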

  1. High capacity wireless data links in the W-band using hybrid photonics-electronic techniques for signal generation and detection

    Vegas Olmos, Juan José; Tafur Monroy, Idelfonso

    2014-01-01

    Seamless convergence of fiber-optic and the wireless networks is of great interest for enabling transparent delivery of broadband services to users in different locations, including both metropolitan and rural areas. Current demand of bandwidth by end-users, especially using mobile devices...... latest findings and experimental results on the W-band, specifically on its 81–86GHz sub-band. These include photonic generation of millimeter-wave carriers and transmission performance of broadband signals on different types of fibers and span lengths....

  2. Sensitivity of a numerical wave model on wind re-analysis datasets

    Lavidas, George; Venugopal, Vengatesan; Friedrich, Daniel

    2017-03-01

    Wind is the dominant process for wave generation. Detailed evaluation of metocean conditions strengthens our understanding of issues concerning potential offshore applications. However, the scarcity of buoys and high cost of monitoring systems pose a barrier to properly defining offshore conditions. Through use of numerical wave models, metocean conditions can be hindcasted and forecasted, providing reliable characterisations. This study reports the sensitivity of a numerical wave model to wind inputs for the Scottish region. Two re-analysis wind datasets with different spatio-temporal characteristics are used, the ERA-Interim Re-Analysis and the CFSR-NCEP Re-Analysis dataset. Different wind products alter results, affecting the accuracy obtained. The scope of this study is to assess different available wind databases and provide information concerning the most appropriate wind dataset for the specific region, based on temporal, spatial and geographic terms for wave modelling and offshore applications. Both wind input datasets delivered results from the numerical wave model with good correlation. Wave results from the 1-h dataset have higher peaks and lower biases, at the expense of a higher scatter index. On the other hand, the 6-h dataset has lower scatter but higher biases. The study shows how the wind dataset affects the numerical wave modelling performance, and that depending on location and study needs, different wind inputs should be considered.
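    The comparison statistics mentioned above (bias and scatter index), together with RMSE, can be sketched as follows for toy modelled-versus-buoy wave heights; the numbers are illustrative only.

```python
# Sketch: bias, RMSE, and scatter index for modelled vs observed values.
from math import sqrt

def bias(model, obs):
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def rmse(model, obs):
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def scatter_index(model, obs):
    # RMSE normalised by the mean of the observations
    return rmse(model, obs) / (sum(obs) / len(obs))

obs = [1.0, 2.0, 3.0, 4.0]      # e.g. buoy significant wave heights (m)
model = [1.1, 2.1, 2.9, 4.3]    # e.g. hindcast values (m)

b = bias(model, obs)
r = rmse(model, obs)
si = scatter_index(model, obs)
```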

  3. Utilizing the Antarctic Master Directory to find orphan datasets

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the link published is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data does not quite fit within any existing disciplinary repository, it can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. 
After
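    The parsing step described above can be sketched as a scan over DIF-style records that flags data URLs not hosted at a known long-term repository. The XML layout and the repository domains below are simplified assumptions, not the actual DIF schema.

```python
# Sketch: flag DIF records whose data URL points outside known repositories.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

KNOWN_REPOSITORIES = {"www.usap-data.org", "gcmd.nasa.gov"}

dif_records = """<DIFs>
  <DIF><Entry_ID>d1</Entry_ID><URL>http://www.usap-data.org/dataset/1</URL></DIF>
  <DIF><Entry_ID>d2</Entry_ID><URL>http://mylab.university.edu/~pi/data.zip</URL></DIF>
</DIFs>"""

def orphan_candidates(xml_text):
    """Return Entry_IDs whose data URL is served from an unknown host."""
    orphans = []
    for dif in ET.fromstring(xml_text).iter("DIF"):
        url = dif.findtext("URL")
        if url and urlparse(url).netloc not in KNOWN_REPOSITORIES:
            orphans.append(dif.findtext("Entry_ID"))
    return orphans

orphans = orphan_candidates(dif_records)
```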

  4. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets.

    Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-08-01

    Lesions detected at mammography are described with a highly standardized terminology: the Breast Imaging-Reporting and Data System (BI-RADS) lexicon. Up to now, no validated semantic computer assisted classification algorithm exists to interactively link combinations of morphological descriptors from the lexicon to a probabilistic risk estimate of malignancy. The authors therefore aim at the external validation of the mammographic mass diagnosis (MMassDx) algorithm. A classification algorithm like MMassDx must perform well in a variety of clinical circumstances and in datasets that were not used to generate the algorithm in order to ultimately become accepted in clinical routine. The MMassDx algorithm uses a naïve Bayes network and calculates post-test probabilities of malignancy based on two distinct sets of variables, (a) BI-RADS descriptors and age ("descriptor model") and (b) BI-RADS descriptors, age, and BI-RADS assessment categories ("inclusive model"). The authors evaluate both the MMassDx (descriptor) and MMassDx (inclusive) models using two large publicly available datasets of mammographic mass lesions: the digital database for screening mammography (DDSM) dataset, which contains two subsets from the same examinations, a medio-lateral oblique (MLO) view dataset and a cranio-caudal (CC) view dataset, and the mammographic mass (MM) dataset. The DDSM contains 1220 mass lesions and the MM dataset contains 961 mass lesions. The authors evaluate discriminative performance using the area under the receiver-operating-characteristic curve (AUC) and compare this to the BI-RADS assessment categories alone (i.e., the clinical performance) using the DeLong method. The authors also evaluate whether the assigned probabilistic risk estimates reflect the lesions' true risk of malignancy using calibration curves. The authors demonstrate that the MMassDx algorithms show good discriminatory performance.
AUC for the MMassDx (descriptor) model in the DDSM data is 0.876/0.895 (MLO/CC view) and AUC
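    The naive Bayes network underlying MMassDx can be illustrated with a toy posterior computation: per-descriptor likelihoods are multiplied with a prior and normalized. All probabilities below are invented for illustration; they are not the parameters of the published algorithm.

```python
# Sketch: posterior probability of malignancy from a naive Bayes combination
# of independent descriptor likelihoods.

def naive_bayes_posterior(prior_malignant, likelihoods):
    """likelihoods: list of (P(feature|malignant), P(feature|benign)) pairs."""
    p_m = prior_malignant
    p_b = 1.0 - prior_malignant
    for lm, lb in likelihoods:
        p_m *= lm
        p_b *= lb
    return p_m / (p_m + p_b)

# e.g. a spiculated margin and an irregular shape, each more likely if malignant
post = naive_bayes_posterior(0.3, [(0.6, 0.1), (0.5, 0.2)])
```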

  5. Thiolated and S-protected hydrophobically modified cross-linked poly(acrylic acid)--a new generation of multifunctional polymers.

    Bonengel, Sonja; Haupstein, Sabine; Perera, Glen; Bernkop-Schnürch, Andreas

    2014-10-01

    The aim of this study was to create a novel multifunctional polymer by covalent attachment of l-cysteine to the polymeric backbone of hydrophobically modified cross-linked poly(acrylic acid) (AC1030). Secondly, the free thiol groups of the resulting thiomer were activated using 2-mercaptonicotinic acid (2-MNA) to provide full reactivity and stability. Within this study, 1167.36 μmol cysteine and 865.72 μmol 2-MNA could be coupled per gram of polymer. Studies evaluating mucoadhesive properties revealed a 4-fold extended adherence time to native small intestinal mucosa for the thiomer (AC1030-cysteine) as well as an 18-fold prolonged adhesion for the preactivated thiomer (AC1030-Cyst-2-MNA) compared to the unmodified polymer. Modification of the polymer led to a higher tablet stability concerning the thiomer and the S-protected thiomer, but a decelerated water uptake could be observed only for the preactivated thiomer. Neither the novel conjugates nor the unmodified polymer showed severe toxicity on Caco-2 cells. Evaluation of emulsification capacity proved the ability to incorporate lipophilic compounds like medium chain triglycerides and the preservation of the emulsifying properties after the modifications. According to these results thiolated AC1030 as well as the S-protected thiolated polymer might provide a promising tool for solid and semisolid formulations in pharmaceutical development. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. NGO Presence and Activity in Afghanistan, 2000–2014: A Provincial-Level Dataset

    David F. Mitchell

    2017-06-01

    Full Text Available This article introduces a new provincial-level dataset on non-governmental organizations (NGOs) in Afghanistan. The data—which are freely available for download—provide information on the locations and sectors of activity of 891 international and local (Afghan) NGOs that operated in the country between 2000 and 2014. A summary and visualization of the data is presented in the article following a brief historical overview of NGOs in Afghanistan. Links to download the full dataset are provided in the conclusion.

  7. VT National Land Cover Dataset - 2001

    Vermont Center for Geographic Information — (Link to Metadata) The NLCD2001 layer available from VCGI is a subset of the National Land Cover Database 2001 land cover layer for mapping zone 65 was produced...

  8. FTSPlot: fast time series visualization for large datasets.

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n x log(N)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delay. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes / 1 TiB or 1.3 x 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
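    The hierarchic level-of-detail idea can be sketched as a precomputed pyramid of (min, max) pairs over power-of-two blocks, so that any zoom level is drawn from a bounded number of block lookups rather than from every raw sample. This is a simplification for illustration, not FTSPlot's actual data format.

```python
# Sketch: min/max pyramid over a time series for level-of-detail rendering.

def build_pyramid(samples):
    """Level 0 is the raw series; level k stores one (min, max) per 2^k samples."""
    levels = [[(s, s) for s in samples]]
    while len(levels[-1]) > 1:
        prev, nxt = levels[-1], []
        for i in range(0, len(prev) - 1, 2):
            nxt.append((min(prev[i][0], prev[i + 1][0]),
                        max(prev[i][1], prev[i + 1][1])))
        if len(prev) % 2:               # carry an odd trailing block up unchanged
            nxt.append(prev[-1])
        levels.append(nxt)
    return levels

pyramid = build_pyramid([3, 1, 4, 1, 5, 9, 2, 6])
top = pyramid[-1][0]   # (min, max) envelope of the whole series
```

A renderer picks the level whose block count roughly matches the pixel width of the plot, so redraw cost depends on screen size, not on the number of samples.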

  9. A synthetic dataset for evaluating soft and hard fusion algorithms

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  10. Integrated dataset of impact of dissolved organic matter on particle behavior and phototoxicity of titanium dioxide nanoparticles

    U.S. Environmental Protection Agency — This dataset is generated to both qualitatively and quantitatively examine the interactions between nano-TiO2 and natural organic matter (NOM). This integrated...

  11. Assessment of NASA's Physiographic and Meteorological Datasets as Input to HSPF and SWAT Hydrological Models

    Alacron, Vladimir J.; Nigro, Joseph D.; McAnally, William H.; OHara, Charles G.; Engman, Edwin Ted; Toll, David

    2011-01-01

    This paper documents the use of simulated Moderate Resolution Imaging Spectroradiometer land use/land cover (MODIS-LULC), NASA-LIS generated precipitation and evapo-transpiration (ET), and Shuttle Radar Topography Mission (SRTM) datasets (in conjunction with standard land use, topographical and meteorological datasets) as input to hydrological models routinely used by the watershed hydrology modeling community. The study is focused on coastal watersheds in the Mississippi Gulf Coast, although one of the test cases focuses on an inland watershed located in the northeastern State of Mississippi, USA. The decision support tools (DSTs) into which the NASA datasets were assimilated were the Soil and Water Assessment Tool (SWAT) and the Hydrological Simulation Program FORTRAN (HSPF). These DSTs are endorsed by several US government agencies (EPA, FEMA, USGS) for water resources management strategies. These models use physiographic and meteorological data extensively. Precipitation gages and USGS gage stations in the region were used to calibrate several HSPF and SWAT model applications. Land use and topographical datasets were swapped to assess model output sensitivities. NASA-LIS meteorological data were introduced into the calibrated model applications to simulate watershed hydrology for a time period in which no weather data were available (1997-2006). The performance of the NASA datasets in the context of hydrological modeling was assessed through comparison of measured and model-simulated hydrographs. Overall, NASA datasets were as useful as standard land use, topographical, and meteorological datasets. Moreover, NASA datasets made possible analyses that the standard datasets could not, e.g., the introduction of land use dynamics into hydrological simulations.
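    Hydrograph comparisons of this kind are commonly summarized with the Nash-Sutcliffe efficiency (NSE); the abstract does not name its exact statistic, so the metric and the flow values below are assumptions for illustration.

```python
# Sketch: Nash-Sutcliffe efficiency between observed and simulated hydrographs.

def nse(observed, simulated):
    """1.0 is a perfect fit; values <= 0 mean the model is no better
    than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

q_obs = [10.0, 30.0, 55.0, 40.0, 20.0]   # e.g. measured streamflow (m^3/s)
q_sim = [12.0, 28.0, 50.0, 42.0, 18.0]   # e.g. HSPF/SWAT simulated streamflow
score = nse(q_obs, q_sim)
```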

  12. D'' Layer Activation via Tidal Dissipation: A Link Between Non-Hydrostatic Ellipticity, Non-Chondritic Heat Flux, and Non-Plume Head Generation of Flood Basalts

    Hager, B. H.; Mazarico, E.; Touma, J.; Wisdom, J.

    2003-12-01

    Quantitative understanding of Earth's heat budget has eluded a list of distinguished physicists and geochemists ranging from Lord Kelvin to Don L. Anderson. The global heat flux is substantially greater than that generated by the estimated inventory of radioactive heat sources, so simple energy balance considerations demand an additional heat source. Secular cooling is commonly invoked to balance Earth's energy budget, but the required cooling rates are difficult to reconcile with both traditional convection calculations and petrologic estimates of ancient upper mantle temperatures. A non-geochemical heat source seems plausible. Indeed, Touma and Wisdom (Astron. J., 122, 2001) showed that tidal dissipation of rotational energy associated with resonant coupling could provide a substantial heat pulse to the CMB. D'' Layer Activation (DLA) by dumping of rotational energy could have important geodynamical consequences that we explore here. DLA could lead to a sudden (but modest) increase in the temperature of preexisting plumes, leading to a sudden increase in melt volume without the need for a troublesome plume head. The dissipation depends on non-hydrostatic CMB ellipticity, which itself is a result of mantle convection, leading to the possibility of an important feedback mechanism: DLA would lead to an increase in CMB ellipticity, further increasing the geodynamic importance of DLA.

  13. Robust computational analysis of rRNA hypervariable tag datasets.

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5–10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
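
    The complete-linkage clustering favored above can be sketched in a few lines: clusters are merged only while the worst-case (maximum) pairwise distance between their members stays below a cutoff, e.g. 0.03 for the conventional 97%-identity OTU threshold. The read labels and distance matrix below are hypothetical stand-ins for real pairwise rRNA distances, not data from the study.

```python
def complete_linkage_otus(labels, dist, cutoff):
    """Greedy agglomerative clustering with complete linkage.

    dist maps (label_a, label_b) pairs to pairwise distances; clusters are
    merged only while the maximum member-to-member distance < cutoff.
    """
    clusters = [{l} for l in labels]
    d = dict(dist)
    d.update({(b, a): v for (a, b), v in dist.items()})

    def linkage(c1, c2):
        # complete linkage = worst-case member-to-member distance
        return max(d[(a, b)] for a in c1 for b in c2)

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                v = linkage(clusters[i], clusters[j])
                if v < cutoff and (best is None or v < best[0]):
                    best = (v, i, j)
        if best is None:  # no mergeable pair left below the cutoff
            return [sorted(c) for c in clusters]
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]

# toy distances between four hypothetical reads
dist = {("r1", "r2"): 0.01, ("r1", "r3"): 0.10, ("r1", "r4"): 0.12,
        ("r2", "r3"): 0.11, ("r2", "r4"): 0.13, ("r3", "r4"): 0.02}
otus = complete_linkage_otus(["r1", "r2", "r3", "r4"], dist, cutoff=0.03)
print(otus)  # → [['r1', 'r2'], ['r3', 'r4']]
```

    Because complete linkage bounds the diameter of every cluster, no OTU can contain two reads farther apart than the cutoff, which is why it resists the diversity inflation the abstract describes.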

  14. Predicting weather regime transitions in Northern Hemisphere datasets

    Kondrashov, D. [University of California, Department of Atmospheric and Oceanic Sciences and Institute of Geophysics and Planetary Physics, Los Angeles, CA (United States); Shen, J. [UCLA, Department of Statistics, Los Angeles, CA (United States); Berk, R. [UCLA, Department of Statistics, Los Angeles, CA (United States); University of Pennsylvania, Department of Criminology, Philadelphia, PA (United States); D'Andrea, F.; Ghil, M. [Ecole Normale Superieure, Departement Terre-Atmosphere-Ocean and Laboratoire de Meteorologie Dynamique (CNRS and IPSL), Paris Cedex 05 (France)

    2007-10-15

    A statistical learning method called random forests is applied to the prediction of transitions between weather regimes of wintertime Northern Hemisphere (NH) atmospheric low-frequency variability. A dataset composed of 55 winters of NH 700-mb geopotential height anomalies is used in the present study. A mixture model finds that the three Gaussian components that were statistically significant in earlier work are robust; they are the Pacific-North American (PNA) regime, its approximate reverse (the reverse PNA, or RNA), and the blocked phase of the North Atlantic Oscillation (BNAO). The most significant and robust transitions in the Markov chain generated by these regimes are PNA → BNAO, PNA → RNA and BNAO → PNA. The break of a regime and subsequent onset of another one is forecast for these three transitions. Taking the relative costs of false positives and false negatives into account, the random-forests method shows useful forecasting skill. The calculations are carried out in the phase space spanned by a few leading empirical orthogonal functions of dataset variability. Plots of estimated response functions to a given predictor confirm the crucial influence of the exit angle on a preferred transition path. This result points to the dynamic origin of the transitions. (orig.)
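
    The Markov chain over regimes mentioned above amounts to counting transitions in a sequence of regime labels and normalizing each row. The day-by-day regime sequence below is a hypothetical illustration, not data from the study:

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate a first-order Markov transition matrix from a label sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # normalize each row of counts into conditional probabilities
    return {state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for state, c in counts.items()}

# hypothetical daily regime labels
regimes = ["PNA", "PNA", "BNAO", "PNA", "RNA", "RNA", "PNA", "BNAO"]
probs = transition_probabilities(regimes)
print(probs["PNA"])  # → {'PNA': 0.25, 'BNAO': 0.5, 'RNA': 0.25}
```

    The random-forests step in the study then tries to predict which of these transitions will occur, rather than simply reading off the climatological probabilities as this sketch does.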

  15. Quality Controlling CMIP datasets at GFDL

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Coupled Model Intercomparison Project (CMIP), GFDL's efforts have shifted to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges, and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing and before their findings make their way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including Jupyter notebooks, PrePARE, and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively and to provide additional metadata and analysis services, along with some built-in controlled-vocabulary validations in the workflow. In addition, we discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  16. Integrated remotely sensed datasets for disaster management

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards, such as North America's NTSC and the European SECAM and PAL systems, and are then recorded using various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  17. Operative Links

    Wistoft, Karen; Højlund, Holger

    2012-01-01

    and have been the object of great expectations concerning the ability to incorporate health concerns into every welfare area through health promotion strategies. The paper draws on results and analyses of a collective research project funded by the Danish National Research Council and carried out...... links' that indicate cooperative levels which facilitate a creative and innovative effort in disease prevention and health promotion targeted at children and adolescents - across traditional professional boundaries. It is proposed that such links are supported by network structures, shared semantics...

  18. PhenoLink - a web-tool for linking phenotype to ~omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains

    Bayjanov Jumamurat R

    2012-05-01

    Full Text Available Abstract Background Linking phenotypes to high-throughput molecular biology information generated by ~omics technologies allows revealing cellular mechanisms underlying an organism's phenotype. ~Omics datasets are often very large and noisy, with many features (e.g., genes, metabolite abundances). Thus, associating phenotypes to ~omics data requires an approach that is robust to noise and can handle large and diverse data sets. Results We developed a web-tool PhenoLink (http://bamics2.cmbi.ru.nl/websoftware/phenolink/) that links phenotype to ~omics data sets using well-established as well as new techniques. PhenoLink imputes missing values and preprocesses input data (i) to decrease inherent noise in the data and (ii) to counterbalance pitfalls of the Random Forest algorithm, on which feature (e.g., gene) selection is based. Preprocessed data is used in feature selection to identify relations to phenotypes. We applied PhenoLink to identify gene-phenotype relations based on the presence/absence of 2847 genes in 42 Lactobacillus plantarum strains and phenotypic measurements of these strains in several experimental conditions, including growth on sugars and nitrogen-dioxide production. Genes were ranked based on their importance (predictive value) to correctly predict the phenotype of a given strain. In addition to known gene-to-phenotype relations we also found novel relations. Conclusions PhenoLink is an easily accessible web-tool to facilitate identifying relations from large and often noisy phenotype and ~omics datasets. Visualization of links to phenotypes offered in PhenoLink allows prioritizing links, finding relations between features, finding relations between phenotypes, and identifying outliers in phenotype data. PhenoLink can be used to uncover phenotype links to a multitude of ~omics data, e.g., gene presence/absence (determined by e.g.: CGH or next-generation sequencing, gene expression (determined by e.g.: microarrays or RNA

  19. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Borowski Krzysztof

    2008-06-01

    Full Text Available The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.

  20. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global and diffuse horizontal and direct normal irradiances, air temperature, wind speed and relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of "driver", defined by the user as an explicit function of the relevant pyranometric and meteorological variables to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time series of high-quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable, and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not
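
    A Sandia-style selection of a typical month can be sketched with the Finkelstein–Schafer (FS) statistic: for each candidate month, compare the month's empirical CDF of each variable against the long-term CDF, and pick the month whose weighted FS sum is smallest. The variables, weights, and values below are illustrative assumptions, not those of the study:

```python
def fs_statistic(month_values, longterm_values):
    """Finkelstein-Schafer statistic: mean absolute difference between a
    candidate month's empirical CDF and the long-term CDF, evaluated at
    the long-term sample points."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    pts = sorted(longterm_values)
    return sum(abs(ecdf(month_values, x) - ecdf(pts, x)) for x in pts) / len(pts)

def pick_typical_month(candidates, longterm, weights):
    """candidates: {year: {variable: [daily values]}}. Returns the year
    whose month best matches the long-term climate under the given weights."""
    def score(year):
        return sum(w * fs_statistic(candidates[year][var], longterm[var])
                   for var, w in weights.items())
    return min(candidates, key=score)

# illustrative daily DNI and temperature for two candidate Januaries
longterm = {"dni": [4, 5, 6, 7, 8, 9], "temp": [1, 2, 3, 4, 5, 6]}
candidates = {
    2001: {"dni": [4, 5, 6, 7, 8, 9], "temp": [1, 2, 3, 4, 5, 6]},  # matches climate
    2002: {"dni": [8, 8, 9, 9, 9, 9], "temp": [5, 5, 6, 6, 6, 6]},  # unusually sunny/warm
}
weights = {"dni": 0.8, "temp": 0.2}  # a DNI-dominated driver, as one might use for CPV
print(pick_typical_month(candidates, longterm, weights))  # → 2001
```

    The "driver" concept from ENDORSE generalizes exactly this scoring step: instead of fixed variable weights, the score becomes a user-defined function of the variables relevant to the conversion system.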

  1. Scandinavian links

    Matthiessen, Christian Wichmann; Knowles, Richard D.

    2014-01-01

    are impressive mega structures spanning international waterways. These waterways between the Baltic Sea and the North Sea have played major roles in history. The length of each of the crossings is around 20 km. The fixed links close gaps between the Scandinavian and European motorway and rail networks...

  2. A first dataset toward a standardized community-driven global mapping of the human immunopeptidome

    Pouya Faridi

    2016-06-01

    Full Text Available We present the first standardized HLA peptidomics dataset generated by the immunopeptidomics community. The dataset is composed of native HLA class I peptides as well as synthetic HLA class II peptides that were acquired in data-dependent acquisition mode using multiple types of mass spectrometers. All laboratories used the spiked-in landmark iRT peptides for retention time normalization and data analysis. The mass spectrometric data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier http://www.ebi.ac.uk/pride/archive/projects/PXD001872. The generated data were used to build HLA allele-specific peptide spectral and assay libraries, which were stored in the SWATHAtlas database. Data presented here are described in more detail in the original eLife article entitled ‘An open-source computational and data resource to analyze digital maps of immunopeptidomes’.

  3. Predicting dataset popularity for the CMS experiment

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure.

  4. Predicting dataset popularity for the CMS experiment

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure. (paper)

  5. Internationally coordinated glacier monitoring: strategy and datasets

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc. related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.

  6. 2006 Fynmeet sea clutter measurement trial: Datasets

    Herselman, PLR

    2007-09-06

    Full Text Available [Figure data residue: plots of RCS [dBm2] versus time and absolute range at f1 = 9.000 GHz for datasets CAD14-001 and CAD14-002.]

  7. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.

  8. A new bed elevation dataset for Greenland

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated–ice-free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  9. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety-nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  10. The French Muséum national d’histoire naturelle vascular plant herbarium collection dataset

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-01-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety-nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments. PMID:28195585

  11. Drugs + HIV, Learn the Link

    Full Text Available ... Link" campaign continues to raise awareness among this generation of the real risks of drug use for ... Resource Center (NWHRC) Mujeres Unidas Contra el SIDA New Mexico AIDS Services African Advocates Against AIDS The ...

  12. Drugs + HIV, Learn the Link

    Full Text Available ... NIDA’s "Learn the Link" campaign continues to raise awareness among this generation of the real risks of ... Collaborators Thanks to Those Who Have Helped Raise Awareness of Our Campaign! NIDA acknowledges the following television ...

  13. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  14. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE) assay, and 2-year rat carcinogenicity assays. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  15. EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

    S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

    2001-01-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.
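
    The "simple equation" approach named above can be sketched as an area-weighted sum of runoff and export coefficients per LULC class; the classes, coefficients, and areas below are hypothetical illustrations, not values from the study:

```python
def annual_runoff_and_load(catchment, precip_m):
    """Simple-equation estimate: runoff volume via runoff coefficients and
    TSS load via export coefficients, summed over LULC classes.

    catchment: {lulc_class: (area_m2, runoff_coeff, tss_export_kg_per_m2_yr)}
    precip_m:  annual precipitation depth in metres.
    """
    runoff_m3 = sum(area * c * precip_m for area, c, _ in catchment.values())
    tss_kg = sum(area * e for area, _, e in catchment.values())
    return runoff_m3, tss_kg

# hypothetical urban catchment
catchment = {
    "residential": (2.0e6, 0.45, 0.012),
    "commercial":  (0.5e6, 0.85, 0.020),
    "open_space":  (1.0e6, 0.15, 0.002),
}
runoff, load = annual_runoff_and_load(catchment, precip_m=0.35)
print(round(runoff), round(load))  # → 516250 36000
```

    Because both sums are linear in the per-class areas, misclassified LULC area propagates directly into the runoff and load estimates, which is exactly why the choice of LULC dataset drives the 8–40% differences reported above.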

  16. MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark

    Qin, Li-Xuan; Zhou, Qin

    2014-01-01

    MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456
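
    Quantile normalization, one widely used normalization method of the kind evaluated in such comparisons, can be sketched as follows: each array's values are replaced by the rank-wise mean of the sorted arrays, so all arrays end up with an identical intensity distribution. The toy arrays below are illustrative, and ties are broken arbitrarily in this minimal sketch:

```python
def quantile_normalize(arrays):
    """Map each array's values onto the rank-wise mean across all arrays."""
    sorted_cols = [sorted(a) for a in arrays]
    n = len(arrays[0])
    # mean of the k-th smallest value across arrays, for each rank k
    rank_means = [sum(col[i] for col in sorted_cols) / len(arrays) for i in range(n)]
    out = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by ascending value
        norm = [0.0] * n
        for rank, idx in enumerate(order):
            norm[idx] = rank_means[rank]
        out.append(norm)
    return out

arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    Note that forcing identical distributions is precisely the assumption challenged by microRNA data, where true global shifts between sample groups can exist; this is one reason normalization alone left the high false discovery rates reported above.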

  17. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-01-01

    Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without ex...
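
    A naive UPGMA keeps the full dissimilarity matrix in memory, which is exactly the limitation the memory-constrained variant above removes. A minimal sketch on a hypothetical three-label matrix:

```python
def upgma(labels, dist):
    """Naive UPGMA (average linkage): repeatedly merge the closest pair,
    averaging distances weighted by cluster size. Holds all pairwise
    distances in memory -- the O(n^2) cost that motivates MC-UPGMA."""
    size = {l: 1 for l in labels}
    d = {}
    for (a, b), v in dist.items():
        d[(a, b)] = d[(b, a)] = v
    merges = []
    while len(size) > 1:
        # closest remaining pair (each pair considered once)
        a, b = min((p for p in d if p[0] < p[1]), key=lambda p: d[p])
        h = d[(a, b)]
        new = "(" + a + "+" + b + ")"
        merges.append((a, b, h))
        for x in list(size):
            if x in (a, b):
                continue
            # size-weighted average distance to the merged cluster
            v = (size[a] * d[(a, x)] + size[b] * d[(b, x)]) / (size[a] + size[b])
            d[(new, x)] = d[(x, new)] = v
        size[new] = size.pop(a) + size.pop(b)
        d = {pq: v for pq, v in d.items() if pq[0] in size and pq[1] in size}
    return merges

merges = upgma(["A", "B", "C"], {("A", "B"): 2.0, ("A", "C"): 8.0, ("B", "C"): 6.0})
print(merges)  # → [('A', 'B', 2.0), ('(A+B)', 'C', 7.0)]
```

    For n sequences this stores O(n²) distances, which is infeasible for the entire protein space; the MC-UPGMA framework instead streams the matrix from disk under a fixed memory budget while guaranteeing the same merge tree.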

  18. On the link between oil and commodity prices: a panel VAR approach

    Bremond, Vincent; Hache, Emmanuel; Joets, Marc

    2013-12-01

    The aim of this paper is to study the relationships between the price of oil and a large dataset of commodity prices, relying on panel data settings. Using second generation panel co-integration tests, our findings show that the WTI and commodity prices are not linked in the long term. Nevertheless, considering our results in causality tests, we show that short-run relations exist, mainly from the price of crude oil to commodity prices. We thus implement a panel VAR estimation with an impulse response function analysis. Two main conclusions emerge: (i) fast co-movements are highlighted, while (ii) market efficiency is emphasized. (authors)

  19. Facing the Challenges of Accessing, Managing, and Integrating Large Observational Datasets in Ecology: Enabling and Enriching the Use of NEON's Observational Data

    Thibault, K. M.

    2013-12-01

    As the construction of NEON and its transition to operations progresses, more and more data will become available to the scientific community, both from NEON directly and from the concomitant growth of existing data repositories. Many of these datasets include ecological observations of a diversity of taxa in both aquatic and terrestrial environments. Although observational data have been collected and used throughout the history of organismal biology, the field has not yet fully developed a culture of data management, documentation, standardization, sharing and discoverability to facilitate the integration and synthesis of datasets. Moreover, the tools required to accomplish these goals, namely database design, implementation, and management, and automation and parallelization of analytical tasks through computational techniques, have not historically been included in biology curricula, at either the undergraduate or graduate levels. To ensure the success of data-generating projects like NEON in advancing organismal ecology and to increase transparency and reproducibility of scientific analyses, an acceleration of the cultural shift to open science practices, the development and adoption of data standards, such as the DarwinCore standard for taxonomic data, and increased training in computational approaches for biologists need to be realized. Here I highlight several initiatives that are intended to increase access to and discoverability of publicly available datasets and equip biologists and other scientists with the skills that are need to manage, integrate, and analyze data from multiple large-scale projects. The EcoData Retriever (ecodataretriever.org) is a tool that downloads publicly available datasets, re-formats the data into an efficient relational database structure, and then automatically imports the data tables onto a user's local drive into the database tool of the user's choice. The automation of these tasks results in nearly instantaneous execution.

  20. Multilayered complex network datasets for three supply chain network archetypes on an urban road grid.

    Viljoen, Nadia M; Joubert, Johan W

    2018-02-01

    This article presents the multilayered complex network formulation for three different supply chain network archetypes on an urban road grid and describes how 500 instances were randomly generated for each archetype. Both the supply chain network layer and the urban road network layer are directed unweighted networks. The shortest path set is calculated for each of the 1 500 experimental instances. The datasets are used to empirically explore the impact that the supply chain's dependence on the transport network has on its vulnerability in Viljoen and Joubert (2017) [1]. The datasets are publicly available on Mendeley (Joubert and Viljoen, 2017) [2].
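
    The shortest path computation underlying these datasets can be sketched with a plain breadth-first search on a directed, unweighted graph (a simplified single-path illustration; the published datasets store full shortest path sets per instance):

    ```python
    from collections import deque

    def shortest_path(adj, src, dst):
        """BFS shortest path on a directed, unweighted graph given as an
        adjacency dict, as in both network layers described above."""
        prev = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            if u == dst:
                path = []
                while u is not None:   # walk predecessors back to the source
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in adj.get(u, []):
                if v not in prev:      # first visit is the shortest route
                    prev[v] = u
                    q.append(v)
        return None                    # dst unreachable from src

    # Hypothetical toy road grid: direct hop depot->a->store beats the detour via b
    roads = {"depot": ["a", "b"], "a": ["store"], "b": ["a"], "store": []}
    print(shortest_path(roads, "depot", "store"))  # ['depot', 'a', 'store']
    ```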

  1. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets.

    Washburne, Alex D; Silverman, Justin D; Leff, Jonathan W; Bennett, Dominic J; Darcy, John L; Mukherjee, Sayan; Fierer, Noah; David, Lawrence A

    2017-01-01

    Marker gene sequencing of microbial communities has generated big datasets of microbial relative abundances varying across environmental conditions, sample sites and treatments. These data often come with putative phylogenies, providing unique opportunities to investigate how shared evolutionary history affects microbial abundance patterns. Here, we present a method to identify the phylogenetic factors driving patterns in microbial community composition. We use the method, "phylofactorization," to re-analyze datasets from the human body and soil microbial communities, demonstrating how phylofactorization is a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.

  2. Statistical segmentation of multidimensional brain datasets

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes part of the problems involved in multidimensional clustering techniques like partial volume effects (PVE), processing speed and difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold-standard. Our results were more robust and closer to the gold-standard.
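
    The Expectation Maximization step can be illustrated with a minimal one-dimensional, two-component Gaussian mixture fit (a toy sketch on synthetic intensities; the paper estimates full-covariance multivariate mixtures over voxel data):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Two synthetic "tissue" intensity populations, loosely akin to two tissue classes
    data = np.concatenate([rng.normal(0, 1, 500), rng.normal(6, 1, 500)])

    # Initial guesses for means, standard deviations and mixing weights
    mu, sigma, pi = np.array([1.0, 5.0]), np.array([2.0, 2.0]), np.array([0.5, 0.5])
    for _ in range(50):
        # E-step: posterior responsibility of each component for each sample
        pdf = pi * np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2) / sigma
        resp = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from the responsibilities
        n = resp.sum(axis=0)
        pi = n / len(data)
        mu = (resp * data[:, None]).sum(axis=0) / n
        sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / n)

    print(np.sort(mu))  # close to the true means 0 and 6
    ```

    Classifying each sample by its largest responsibility then yields the per-class labels (CSF, white matter, grey matter in the paper's three-class setting).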

  3. Privacy-preserving record linkage on large real world datasets.

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves the use of dedicated linkage units which are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real world administrative data. The method uses encrypted personal identifying information (Bloom filters) in a probability-based linkage framework. The privacy preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population level research studies. It is hoped that through adaptations of this method or similar privacy preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research taking place can be fully realised. Copyright © 2013 Elsevier Inc. All rights reserved.
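
    The Bloom filter encoding works roughly as follows: each identifier's character bigrams are hashed into a bit set, and encoded records are compared with a set similarity such as the Dice coefficient, so similar names score high without the raw identifiers ever being exchanged (a minimal sketch with illustrative parameters, not the study's configuration):

    ```python
    import hashlib

    def bloom(text, m=128, k=4):
        """Hash each character bigram of `text` k times into an m-bit
        Bloom filter, represented here as a set of bit positions."""
        bits = set()
        for gram in {text[i:i + 2] for i in range(len(text) - 1)}:
            for seed in range(k):
                h = hashlib.sha1(f"{seed}:{gram}".encode()).hexdigest()
                bits.add(int(h, 16) % m)
        return bits

    def dice(a, b):
        """Dice coefficient between two encodings: the linkage score."""
        return 2 * len(a & b) / (len(a) + len(b))

    # A spelling variant still matches its owner far better than a stranger
    print(dice(bloom("john smith"), bloom("jon smith")) >
          dice(bloom("john smith"), bloom("mary jones")))  # True
    ```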

  4. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

    Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  5. Analysis of Public Datasets for Wearable Fall Detection Systems

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  6. Linking GHG Emission Trading Systems and Markets

    NONE

    2006-07-01

    Several different types of links are possible between different GHG-mitigation systems. These include: Linking two or more emission trading schemes so that emissions trading can occur both within and between different schemes ('direct links'); and Linking emission trading systems to registries/mechanisms and systems that generate offsets from project based mechanisms or from direct purchases/transfers of AAUs ('indirect links').

  7. Materializing the web of linked data

    Konstantinou, Nikolaos

    2015-01-01

    This book explains the Linked Data domain by adopting a bottom-up approach: it introduces the fundamental Semantic Web technologies and building blocks, which are then combined into methodologies and end-to-end examples for publishing datasets as Linked Data, and use cases that harness scholarly information and sensor data. It presents how Linked Data is used for web-scale data integration, information management and search. Special emphasis is given to the publication of Linked Data from relational databases as well as from real-time sensor data streams. The authors also trace the transformation from the document-based World Wide Web into a Web of Data. Materializing the Web of Linked Data is addressed to researchers and professionals studying software technologies, tools and approaches that drive the Linked Data ecosystem, and the Web in general.

  8. Common integration sites of published datasets identified using a graph-based framework

    Alessandro Vasciaveo

    2016-01-01

    Full Text Available With next-generation sequencing, the genomic data available for the characterization of integration sites (IS) has dramatically increased. At present, in a single experiment, several thousand viral integration genome targets can be investigated to define genomic hot spots. In a previous article, we renovated a formal CIS analysis based on a rigid fixed window demarcation into a more stretchy definition grounded on graphs. Here, we present a selection of supporting data related to the graph-based framework (GBF) from our previous article, in which a collection of common integration sites (CIS) was identified on six published datasets. In this work, we will focus on two datasets, ISRTCGD and ISHIV, which have been previously discussed. Moreover, we show in more detail the workflow design that originates the datasets.

  9. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."

  10. Gulf of Aqaba Field Trip - Datasets

    Hanafy, Sherif M.; Jonsson, Sigurjon; Klinger, Yann

    2013-01-01

    Seismic and 2D resistivity imaging were used to locate the fault. One seismic profile and one 2D resistivity profile are collected at an alluvial fan on the Gulf of Aqaba coast in Saudi Arabia. The collected data are inverted to generate the traveltime tomogram

  11. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  12. Linked Ocean Data

    Leadbetter, Adam; Arko, Robert; Chandler, Cynthia; Shepherd, Adam

    2014-05-01

    Data repositories. The benefits of this approach include: increased interoperability between the metadata created by projects; improved data discovery, as users of SeaDataNet, R2R and BCO-DMO terms can find data using labels with which they are familiar; both standard tools and newly developed custom tools may be used to explore the data; and using standards means the custom tools are easier to develop. Linked Data is a concept which has been in existence for nearly a decade, and has a simple set of formal best practices associated with it. Linked Data is increasingly being seen as a driver of the next generation of "community science" activities. While many data providers in the oceanographic domain may be unaware of Linked Data, they may also be providing it at one of its lower levels. Here we have shown that it is possible to deliver the highest standard of Linked Oceanographic Data, and some of the benefits of the approach.

  13. The Dataset of Countries at Risk of Electoral Violence

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  14. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  15. LSD Dimensions: Use and Reuse of Linked Statistical Data

    Meroño-Peñuela, Albert

    2014-01-01

    RDF Data Cube (QB) has boosted the publication of Linked Statistical Data (LSD) on the Web, making them linkable to other related datasets and concepts following the Linked Data paradigm. In this demo we present LSD Dimensions, a web based application that monitors the usage of dimensions and codes

  16. Public Availability to ECS Collected Datasets

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding' an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS-funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nation's ECS effort through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  17. BIA Indian Lands Dataset (Indian Lands of the United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  18. Framework for Interactive Parallel Dataset Analysis on the Grid

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.

  19. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  20. Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes

    Faranda, Davide; Dubrulle, Bérengère; Daviaud, François; Pons, Flavio Maria Emanuele; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe

    2014-01-01

    We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system
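
    As a flavour of the approach, the autoregressive part of an ARMA fit can be sketched by estimating the lag-1 coefficient of a simulated AR(1) series (a toy stand-in for the paper's ARMA modelling of measured velocity records):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Simulate an AR(1) series x_t = phi * x_{t-1} + eps_t
    phi_true = 0.7
    x = np.zeros(5000)
    for t in range(1, len(x)):
        x[t] = phi_true * x[t - 1] + rng.normal()

    # Least-squares estimate of phi from lagged values (the AR part of ARMA;
    # a full ARMA(p, q) fit would also estimate moving-average terms)
    phi_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
    print(round(phi_hat, 2))  # close to 0.7
    ```

    The fitted coefficients define the discrete stochastic model whose deviation from the Obukhov reference model the paper's Υ index quantifies.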

  1. Evaluation of Chicken IgY Generated Against Canine Parvovirus Viral-Like Particles and Development of Enzyme-Linked Immunosorbent Assay and Immunochromatographic Assay for Canine Parvovirus Detection.

    He, Jinxin; Wang, Yuan; Sun, Shiqi; Zhang, Xiaoying

    2015-11-01

    Immunoglobulin Y (IgY) antibodies were generated against canine parvovirus virus-like particles (CPV-VLPs) antigen using chickens. Anti-CPV-VLPs-IgY was extracted from hen egg yolk and used for developing enzyme-linked immunosorbent assay (ELISA) and immunochromatographic assay (ICA) for the detection of CPV in dog feces. The cutoff negative values for anti-CPV-VLPs-IgY were determined using negative fecal samples (already confirmed by polymerase chain reaction [PCR]). In both ELISA and ICA, there was no cross-reaction with other diarrheal pathogens. Thirty-four fecal samples were collected from dogs with diarrhea, of which 26.47% were confirmed as CPV-positive samples by PCR, while 29.41% and 32.35% of the samples were found to be positive by ELISA and ICA, respectively. The developed ELISA and ICA exhibited 97.06% and 94.12% conformity with PCR. Higher sensitivity and specificity were observed for IgY-based ELISA and ICA. Thus, they could be suitable for routine use in the diagnosis of CPV in dogs.

  2. Algorithms for assessing person-based consistency among linked records for the investigation of maternal use of medications and safety

    Duong Tran

    2017-04-01

    Quality assessment indicated high consistency among linked records. The set of algorithms developed in this project can be applied to similar linked perinatal datasets to promote a consistent approach and comparability across studies.

  3. An Analysis of the GTZAN Music Genre Dataset

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  4. Really big data: Processing and analysis of large datasets

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  5. An Annotated Dataset of 14 Cardiac MR Images

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  6. A New Outlier Detection Method for Multidimensional Datasets

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most of the other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter to set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that made it very efficient in applications. Moreover, we proposed a new way of using outlier detection for redundancy reduction in datasets, in which users can specify a confidence level evaluating how accurately the less redundant dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and a bioinformatics problem of redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of a reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.
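
    A purely distance-based score of the kind HMOD builds on can be sketched as the mean distance to the k nearest neighbours (illustrative only; HMOD additionally folds in a density criterion and is parallelized):

    ```python
    import numpy as np

    def knn_outlier_scores(X, k=3):
        """Score each row of X by its mean distance to its k nearest
        neighbours: isolated points get large scores."""
        d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        np.fill_diagonal(d, np.inf)              # ignore self-distances
        return np.sort(d, axis=1)[:, :k].mean(axis=1)

    # Four clustered points plus one isolated point
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
    scores = knn_outlier_scores(X)
    print(scores.argmax())  # 4: the isolated point
    ```

    Note that k (the number of nearest neighbours) is the only parameter, mirroring the single required parameter of HMOD.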

  7. Oil palm mapping for Malaysia using PALSAR-2 dataset

    Gong, P.; Qi, C. Y.; Yu, L.; Cracknell, A.

    2016-12-01

    Oil palm is one of the most productive vegetable oil crops in the world. The main oil palm producing areas are distributed in humid tropical areas such as Malaysia, Indonesia, Thailand, western and central Africa, northern South America, and central America. Increasing market demands, high yields and low production costs of palm oil are the primary factors driving large-scale commercial cultivation of oil palm, especially in Malaysia and Indonesia. Global demand for palm oil has grown exponentially during the last 50 years, and the expansion of oil palm plantations is linked directly to the deforestation of natural forests. Satellite remote sensing plays an important role in monitoring expansion of oil palm. However, optical remote sensing images are difficult to acquire in the Tropics because of the frequent occurrence of thick cloud cover. This problem has led to the use of data obtained by synthetic aperture radar (SAR), which is a sensor capable of all-day/all-weather observation for studies in the Tropics. In this study, the ALOS-2 (Advanced Land Observing Satellite) PALSAR-2 (Phased Array type L-band SAR) datasets for year 2015 were used as an input to a support vector machine (SVM) based machine learning algorithm. Oil palm/non-oil palm samples were collected using a hexagonal equal-area sampling design. High-resolution images in Google Earth and PALSAR-2 imagery were used in human photo-interpretation to separate oil palm from others (i.e. cropland, forest, grassland, shrubland, water, hard surface and bareland). The characteristics of oil palm from various aspects, including PALSAR-2 backscattering coefficients (HH, HV), terrain and climate, were further explored using this sample set to post-process the SVM output. The average accuracy of the oil palm class is better than 80% in the final oil palm map for Malaysia.

  8. ATLAS File and Dataset Metadata Collection and Use

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  9. A dataset on tail risk of commodities markets.

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices (World, Asia, and South East Asia) for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities, as measured by Conditional Value at Risk (CVaR), and how it changes over time. The datasets can also be used to investigate the association between commodity markets and share markets.
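    As an illustration of the tail-risk measure this dataset supports, the sketch below computes a 5% Conditional Value at Risk (expected shortfall) from a series of daily returns. The return series is invented for demonstration, not taken from the dataset:

```python
def cvar(returns, alpha=0.05):
    """Mean of the worst alpha-fraction of returns (expected shortfall)."""
    worst = sorted(returns)                # most negative returns first
    k = max(1, int(len(worst) * alpha))    # size of the alpha tail
    return sum(worst[:k]) / k

# Twenty illustrative daily returns; the 5% tail of 20 points is 1 observation.
daily_returns = [0.01, -0.02, 0.005, -0.08, 0.015, -0.01, 0.02, -0.05,
                 0.00, 0.01, -0.03, 0.025, -0.005, 0.01, -0.06, 0.02,
                 0.01, -0.015, 0.005, 0.03]
print(cvar(daily_returns))  # → -0.08
```

    Computing this per commodity and per year mirrors the annual worst-5% decomposition the dataset provides.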

  10. Transition to Operations Plans for GPM Datasets

    Zavodsky, Bradley; Jedlovec, Gary; Case, Jonathan; Leroy, Anita; Molthan, Andrew; Bell, Jordan; Fuell, Kevin; Stano, Geoffrey

    2013-01-01

    The Short-term Prediction Research and Transition (SPoRT) Center was founded in 2002 at the National Space Science Technology Center at Marshall Space Flight Center in Huntsville, AL. It is focused on transitioning unique NASA and NOAA observations and research capabilities to the operational weather community to improve short-term weather forecasts on regional and local scales, with NASA-directed funding and NOAA funding from the Proving Grounds (PG). SPoRT demonstrates the capabilities of experimental products for weather applications and societal benefit, in order to prepare forecasters for the use of data from the next generation of operational satellites. The objective of this poster is to highlight SPoRT's research-to-operations (R2O) paradigm and provide examples of work done by the team with legacy instruments relevant to GPM, in order to promote collaborations with groups developing GPM products.

  11. Integration of public procurement data using linked data

    Jindrich Mynarz

    2014-01-01

    Linked data is frequently cast as a technology for integrating distributed datasets on the Web. In this paper, we propose a generic workflow for data integration based on linked data and semantic web technologies. The workflow comes out of an analysis of the application of linked data to the integration of public procurement data. It organizes common data integration tasks, including schema alignment, data translation, entity reconciliation, and data fusion, into a sequence of rep...

  12. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a PHP script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword or BLAST search. Additionally, T-ACE provides within- and between-transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and the open-source code provide a framework that can be customized according to the needs of the user and transcriptome project.

  13. Climate Forcing Datasets for Agricultural Modeling: Merged Products for Gap-Filling and Historical Climate Series Estimation

    Ruane, Alex C.; Goldberg, Richard; Chryssanthacopoulos, James

    2014-01-01

    The AgMERRA and AgCFSR climate forcing datasets provide daily, high-resolution, continuous meteorological series over the 1980-2010 period, designed for applications examining the agricultural impacts of climate variability and climate change. These datasets combine daily resolution data from retrospective analyses (the Modern-Era Retrospective Analysis for Research and Applications, MERRA, and the Climate Forecast System Reanalysis, CFSR) with in situ and remotely-sensed observational datasets for temperature, precipitation, and solar radiation, leading to substantial reductions in bias in comparison to a network of 2324 agricultural-region stations from the Hadley Integrated Surface Dataset (HadISD). Results compare favorably against the original reanalyses as well as the leading climate forcing datasets (Princeton, WFD, WFD-EI, and GRASP), and AgMERRA distinguishes itself with substantially improved representation of daily precipitation distributions and extreme events owing to its use of the MERRA-Land dataset. These datasets also peg relative humidity to the time of day of maximum temperature, allowing for more accurate representation of the diurnal cycle of near-surface moisture in agricultural models. AgMERRA and AgCFSR enable a number of ongoing investigations in the Agricultural Model Intercomparison and Improvement Project (AgMIP) and related research networks, and may be used to fill gaps in historical observations as well as serve as a basis for the generation of future climate scenarios.

  14. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Karacali Bilge

    2007-10-01

    Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction errors of profiles identified with supervised univariate feature selection algorithms were compared to those of profiles selected randomly from (a) all genes on the microarray platform and (b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results: Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform. Conclusion: Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine learning.
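    A minimal sketch of the ROC-based assessment described above, using the rank-statistic formulation of the area under the ROC curve on synthetic labels and classifier scores (both invented for illustration):

```python
def roc_auc(labels, scores):
    """AUC via the rank statistic: P(score of a positive > score of a negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Count pairwise wins; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic example: 1 = e.g. ER-positive sample, 0 = ER-negative.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2, 0.7, 0.55]  # hypothetical classifier outputs
print(roc_auc(labels, scores))  # → 0.9375
```

    An AUC near 1 corresponds to a highly discriminative profile; an AUC near 0.5, as the study reports for relapse-based profiles, corresponds to a poorly discriminative decision rule.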

  15. panMetaDocs, eSciDoc, and DOIDB - an infrastructure for the curation and publication of file-based datasets for 'GFZ Data Services'

    Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens

    2016-04-01

    With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years, it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in the reference lists of manuscripts. In addition, DataCite provides usage statistics [1] for assigned DOIs and offers a public search API to make research data count. By linking related information to the data, the data become more useful for future generations of scientists. For this purpose, several identifier systems, such as ISBN for books, ISSN for journals, DOI for articles or related data, ORCID for authors, and IGSN for physical samples, can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions for publishing data, free and open solutions that help with the curation of data, the publication of research data, and the assignment of DOIs in a single software stack seem to be rare. At GFZ Potsdam we built a modular software stack made of several free and open software solutions, and we established 'GFZ Data Services'. 'GFZ Data Services' provides storage, a metadata editor for publication, and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. The core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage and has been designed along the OAIS reference model for digital preservation. Thus, data are stored in self-contained packages made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data is released after an adequate period.
The data exchange platform panMetaDocs [4] makes use of eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files

  16. Mining Open Datasets for Transparency in Taxi Transport in Metropolitan Environments

    Noulas, Anastasios; Salnikov, Vsevolod; Lambiotte, Renaud; Mascolo, Cecilia

    2015-01-01

    Uber has recently been introducing novel practices in urban taxi transport. Journey prices can change dynamically in almost real time and also vary geographically from one area to another in a city, a strategy known as surge pricing. In this paper, we explore the power of the new generation of open datasets towards understanding the impact of the disruptive technologies emerging in the area of public transport. With our primary goal being a more transparent economic landscape for urban...

  17. Sex-linked dominant

    Inheritance - sex-linked dominant; Genetics - sex-linked dominant; X-linked dominant; Y-linked dominant ... can be either an autosomal chromosome or a sex chromosome. It also depends on whether the trait ...

  18. Rolling Deck to Repository (R2R): Linking and Integrating Data for Oceanographic Research

    Arko, R. A.; Chandler, C. L.; Clark, P. D.; Shepherd, A.; Moore, C.

    2012-12-01

    The Rolling Deck to Repository (R2R) program is developing infrastructure to ensure the underway sensor data from NSF-supported oceanographic research vessels are routinely and consistently documented, preserved in long-term archives, and disseminated to the science community. We have published the entire R2R Catalog as a Linked Data collection, making it easily accessible to encourage linking and integration with data at other repositories. We are developing the R2R Linked Data collection with specific goals in mind: 1.) We facilitate data access and reuse by providing the richest possible collection of resources to describe vessels, cruises, instruments, and datasets from the U.S. academic fleet, including data quality assessment results and clean trackline navigation. We are leveraging or adopting existing community-standard concepts and vocabularies, particularly concepts from the Biological and Chemical Oceanography Data Management Office (BCO-DMO) ontology and terms from the pan-European SeaDataNet vocabularies, and continually re-publish resources as new concepts and terms are mapped. 2.) We facilitate data citation through the entire data lifecycle from field acquisition to shoreside archiving to (ultimately) global syntheses and journal articles. We are implementing globally unique and persistent identifiers at the collection, dataset, and granule levels, and encoding these citable identifiers directly into the Linked Data resources. 3.) We facilitate linking and integration with other repositories that publish Linked Data collections for the U.S. academic fleet, such as BCO-DMO and the Index to Marine and Lacustrine Geological Samples (IMLGS). We are initially mapping datasets at the resource level, and plan to eventually implement rule-based mapping at the concept level. We work collaboratively with partner repositories to develop best practices for URI patterns and consensus on shared vocabularies. The R2R Linked Data collection is implemented as a

  19. Dataset of cocoa aspartic protease cleavage sites

    Katharina Janek

    2016-09-01

    The data provide information in support of the research article, “The cleavage specificity of the aspartic protease of cocoa beans involved in the generation of the cocoa-specific aroma precursors” (Janek et al., 2016) [1]. Three different protein substrates were partially digested with the aspartic protease isolated from cocoa beans and with commercial pepsin, respectively. The obtained peptide fragments were analyzed by matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/TOF-MS/MS) and identified using the MASCOT server. The N- and C-terminal ends of the peptide fragments were used to identify the corresponding in-vitro cleavage sites by comparison with the amino acid sequences of the substrate proteins. The same procedure was applied to identify the cleavage sites used by the cocoa aspartic protease during cocoa fermentation, starting from the published amino acid sequences of oligopeptides isolated from fermented cocoa beans. Keywords: Aspartic protease, Cleavage sites, Cocoa, In-vitro proteolysis, Mass spectrometry, Peptides

  20. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the way the prosthetic hand is controlled compared to electromyograph (EMG) control. EMG has disadvantages for a person who has not used the muscle for a long time, and also for a person with degenerative issues due to age. Thus, the EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand via an interface between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (lithium polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are given in this paper. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.

  1. Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

    Robinson, Nathaniel Paul

    Human driven alteration of the earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the earth's terrestrial surface continuously across space and time, the practical applications for conservation and management of these products are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology along with the growing record of high resolution SRS data enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production

  2. LinkED: A Novel Methodology for Publishing Linked Enterprise Data

    Shreyas Suresh Rao

    2017-01-01

    Semantic Web technologies have redefined and strengthened Enterprise-Web interoperability over the last decade. Linked Open Data (LOD) refers to a set of best practices that empower enterprises to publish and interlink their data using existing ontologies on the World Wide Web. Current research in LOD focuses on expert search, the creation of a unified information space and the augmentation of core data from an enterprise context. However, existing approaches for the publication of enterprise data as LOD are domain-specific, ad hoc, and suffer from a lack of uniform representation across domains. The paper proposes a novel methodology called LinkED that contributes to the LOD literature in two ways: (a) it streamlines the publishing process through five stages of cleaning, triplification, interlinking, storage and visualization; (b) it addresses the latest challenges in LOD publication, namely inadequate links, inconsistencies in the quality of the dataset, and replicability of the LOD publication process. Further, the methodology is demonstrated via the publication of digital repository data as LOD in a university setting, which is evaluated against two semantic standards: the Five-Star model and data quality metrics. Overall, the paper provides a generic LOD publication process that is applicable across various domains such as healthcare, e-governance, banking, and tourism, to name a few.
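    A toy sketch of the triplification stage of such a pipeline, emitting N-Triples for a repository record. The namespace, predicates and record below are invented for illustration; Dublin Core is simply one commonly reused vocabulary:

```python
BASE = "http://example.org/repository/"   # hypothetical enterprise namespace
DC = "http://purl.org/dc/terms/"          # Dublin Core terms vocabulary

# An already-cleaned record, as it might leave the cleaning stage.
record = {"id": "thesis-42",
          "title": "Linked Data in Practice",
          "creator": "A. Researcher"}

def triplify(rec):
    """Yield N-Triples statements for one cleaned repository record."""
    subj = f"<{BASE}{rec['id']}>"
    yield f'{subj} <{DC}title> "{rec["title"]}" .'
    yield f'{subj} <{DC}creator> "{rec["creator"]}" .'

for triple in triplify(record):
    print(triple)
```

    The later interlinking stage would then connect such subjects to external datasets (e.g. via owl:sameAs links), which is where the "inadequate links" challenge arises.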

  3. Error characterisation of global active and passive microwave soil moisture datasets

    W. A. Dorigo

    2010-12-01

    datasets are uncorrelated, the errors estimated for the remote sensing products are hardly influenced by the choice of the third independent dataset. The results obtained in this study can help us in developing adequate strategies for the combined use of various scatterometer- and radiometer-based soil moisture datasets, e.g. for improved flood forecast modelling or the generation of superior multi-mission long-term soil moisture datasets.
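    The reliance on a third independent dataset with mutually uncorrelated errors suggests a triple-collocation-style error characterisation. Assuming that interpretation (it is not stated explicitly in the abstract above), the sketch below applies the covariance-based triple collocation formula to three synthetic collocated soil moisture series; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
truth = rng.normal(0.25, 0.05, 20000)            # synthetic soil moisture "truth"
x = truth + rng.normal(0.0, 0.02, truth.size)    # e.g. scatterometer product
y = truth + rng.normal(0.0, 0.03, truth.size)    # e.g. radiometer product
z = truth + rng.normal(0.0, 0.04, truth.size)    # e.g. third independent series

# Covariance-based triple collocation: with mutually uncorrelated errors,
# err_x**2 = C_xx - C_xy * C_xz / C_yz (and cyclically for y and z).
C = np.cov(np.vstack([x, y, z]))
err_x = np.sqrt(C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2])
print(f"estimated error std of x: {err_x:.3f}")  # should recover roughly 0.02
```

    Note how the error estimate for x depends on y and z only through covariances, which is consistent with the finding that the choice of the third dataset hardly influences the result when errors are uncorrelated.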

  4. Homogenised Australian climate datasets used for climate change monitoring

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); annual temperature (commences 1910); daily temperature (commences 1910, with limited station coverage pre-1957); twice-daily dewpoint/relative humidity (commences 1957); monthly pan evaporation (commences 1970); and cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region; this is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  5. Gulf of Aqaba Field Trip - Datasets

    Hanafy, Sherif M.

    2013-11-01

    OBJECTIVE: In this work we use geophysical methods to locate and characterize active faults in alluvial sediments. INTRODUCTION: Since only subtle material and velocity contrasts are expected across the faults, we used seismic refraction tomography and 2D resistivity imaging to locate the fault. One seismic profile and one 2D resistivity profile were collected at an alluvial fan on the Gulf of Aqaba coast in Saudi Arabia. The collected data are inverted to generate the traveltime tomogram and the electric resistivity tomogram (ERT). A low-velocity anomaly on the traveltime tomogram indicates the colluvial wedge associated with the fault. The location of the fault is shown on the ERT as a vertical high-resistivity anomaly. Two datasets were collected at the study site to map the subsurface structure along a profile across the known normal fault described above: a seismic refraction dataset and a 2D resistivity imaging dataset. A total of 120 common shot gathers were collected (MatLab and DPik formats). Each shot gather has 120 traces at equal shot and receiver intervals of 2.5 m; the total length of the profile is 297.5 m. Data were recorded using a 1 ms sampling interval for a total recording time of 0.3 s. A 200 lb weight drop was used as the seismic source, with 10 to 15 stacks at each shot location. One 2D resistivity profile was acquired at the same location, parallel to the seismic profile, with the following acquisition parameters: number of nodes: 64; node interval: 5 m; configuration array: Schlumberger-Wenner; total profile length: 315 m. Both seismic and resistivity profiles share the same starting point at the western end of the profile.

  6. HLA diversity in the 1000 genomes dataset.

    Pierre-Antoine Gourraud

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range, whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.

  7. Hydrology Research with the North American Land Data Assimilation System (NLDAS) Datasets at the NASA GES DISC Using Giovanni

    Mocko, David M.; Rui, Hualan; Acker, James G.

    2013-01-01

    The North American Land Data Assimilation System (NLDAS) is a collaboration project between NASA/GSFC, NOAA, Princeton Univ., and the Univ. of Washington. NLDAS has created a surface meteorology dataset using the best-available observations and reanalyses; the backbone of this dataset is a gridded precipitation analysis from rain gauges. This dataset is used to drive four separate land-surface models (LSMs) to produce datasets of soil moisture, snow, runoff, and surface fluxes. NLDAS datasets are available hourly and extend from Jan 1979 to near real-time with a typical 4-day lag. The datasets are available at 1/8th-degree resolution over CONUS and portions of Canada and Mexico from 25-53 North. The datasets have been extensively evaluated against observations, and are also used as part of a drought monitor. NLDAS datasets are available from the NASA GES DISC and can be accessed via ftp, GDS, Mirador, and Giovanni. GES DISC news articles were published showing figures from the heat wave of 2011, Hurricane Irene, Tropical Storm Lee, and the low-snow winter of 2011-2012. For this presentation, Giovanni-generated figures using NLDAS data from the derecho across the U.S. Midwest and Mid-Atlantic will be presented, as will similar figures from the landfall of Hurricane Isaac and the before-and-after drought conditions along the path of tropical moisture into the central states of the U.S. Updates on future products and datasets from the NLDAS project will also be introduced.

  8. Tension in the recent Type Ia supernovae datasets

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  9. Verification of target motion effects on SAR imagery using the Gotcha GMTI challenge dataset

    Hack, Dan E.; Saville, Michael A.

    2010-04-01

    This paper investigates the relationship between a ground moving target's kinematic state and its SAR image. While effects such as cross-range offset, defocus, and smearing appear well understood, their derivations in the literature typically employ simplifications of the radar/target geometry and assume point scattering targets. This study adopts a geometrical model for understanding target motion effects in SAR imagery, termed the target migration path, and focuses on experimental verification of predicted motion effects using both simulated and empirical datasets based on the Gotcha GMTI challenge dataset. Specifically, moving target imagery is generated from three data sources: first, simulated phase history for a moving point target; second, simulated phase history for a moving vehicle derived from a simulated Mazda MPV X-band signature; and third, empirical phase history from the Gotcha GMTI challenge dataset. Both simulated target trajectories match the truth GPS target position history from the Gotcha GMTI challenge dataset, allowing direct comparison between all three imagery sets and the predicted target migration path. This paper concludes with a discussion of the parallels between the target migration path and the measurement model within a Kalman filtering framework, followed by conclusions.

  10. Discovery of Teleconnections Using Data Mining Technologies in Global Climate Datasets

    Fan Lin

    2007-10-01

    In this paper, we apply data mining technologies to a 100-year global land precipitation dataset and a 100-year Sea Surface Temperature (SST) dataset. Some interesting teleconnections are discovered, including well-known patterns and (to the best of our knowledge) previously unknown patterns, such as teleconnections between abnormally low temperature events of the North Atlantic and floods in Northern Bolivia, and between abnormally low temperatures of the Venezuelan coast and floods in Northern Algeria and Tunisia. In particular, we use a high-dimensional clustering method and a method that mines episode association rules in event sequences. The former is used to cluster the original time series datasets into higher spatial granularity, and the latter is used to discover teleconnection patterns among the event sequences generated by the clustering method. In order to verify our method, we also run experiments on the SOI index and a 100-year global land precipitation dataset and find many well-known teleconnections, such as those between SOI low events and drought events in Eastern Australia, South Africa, and North Brazil, and between SOI low events and flood events in the middle and lower reaches of the Yangtze River. We also perform exploratory experiments to help domain scientists discover new knowledge.

  11. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at the lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.

  12. Dimension Reduction Aided Hyperspectral Image Classification with a Small-sized Training Dataset: Experimental Comparisons

    Jinya Su

    2017-11-01

    Hyperspectral images (HSI) provide rich information which may not be captured by other sensing technologies and are therefore finding a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues, including significantly increased computation time, complexity and scale of the prediction models mapping the data to semantics (e.g., classification), and the need for a large amount of labelled data for training. In particular, it is generally difficult and expensive for experts to acquire sufficient training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms from the machine learning community for HSI classification. To reduce the size of the training dataset, feature selection (e.g., mutual information, minimal-redundancy-maximal-relevance) and feature extraction (e.g., Principal Component Analysis (PCA), Kernel PCA) are adopted to augment a baseline classification method, the Support Vector Machine (SVM). The proposed algorithms are evaluated using a real HSI dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that, while significantly reducing the computational complexity, the proposed method can achieve better classification results than the classic SVM on a small training dataset, which makes it suitable for real-time applications or when only limited training data are available. Furthermore, it can achieve performance similar to the classic SVM on large datasets but with much less computing time.
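    The PCA step described above can be sketched with numpy alone: centre the data, eigendecompose the band covariance, and project onto the leading components before feeding a classifier. The synthetic matrix below stands in for real hyperspectral pixels.

```python
# Minimal PCA dimension-reduction sketch: 50 "spectral bands" reduced to 5.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # 200 pixels x 50 spectral bands

def pca_reduce(X, n_components):
    Xc = X - X.mean(axis=0)              # centre each band
    cov = np.cov(Xc, rowvar=False)       # 50 x 50 band covariance
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return Xc @ top                      # scores in the reduced space

Z = pca_reduce(X, n_components=5)
print(Z.shape)  # (200, 5)
```

    The reduced matrix Z would then replace the raw spectra as SVM input, which is where the reported computation-time savings come from.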

  13. Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets.

    Paxton, Alexandra; Griffiths, Thomas L

    2017-10-01

    Today, people generate and store more data than ever before as they interact with both real and virtual environments. These digital traces of behavior and cognition offer cognitive scientists and psychologists an unprecedented opportunity to test theories outside the laboratory. Despite general excitement about big data and naturally occurring datasets among researchers, three "gaps" stand in the way of their wider adoption in theory-driven research: the imagination gap, the skills gap, and the culture gap. We outline an approach to bridging these three gaps while respecting our responsibilities to the public as participants in and consumers of the resulting research. To that end, we introduce Data on the Mind ( http://www.dataonthemind.org ), a community-focused initiative aimed at meeting the unprecedented challenges and opportunities of theory-driven research with big data and naturally occurring datasets. We argue that big data and naturally occurring datasets are most powerfully used to supplement, not supplant, traditional experimental paradigms in order to understand human behavior and cognition, and we highlight emerging ethical issues related to the collection, sharing, and use of these powerful datasets.

  14. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the modelling of the datasets. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate for the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  15. Dataset definition for CMS operations and physics analyses

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting were introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC Run I, and we discuss the plans for Run II.

  16. U.S. Climate Divisional Dataset (Version Superseded)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  17. Karna Particle Size Dataset for Tables and Figures

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  18. NOAA Global Surface Temperature Dataset, Version 4.0

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  19. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  20. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  1. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  2. USGS National Hydrography Dataset from The National Map

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  3. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal subjects and cardiac patients. The large, open-access database made available in the PhysioNet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope from an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.
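    A hedged sketch of the kind of time- and frequency-domain features the abstract mentions (not the authors' exact feature set): statistical moments plus the dominant spectral frequency, computed with numpy on a synthetic stand-in for a PCG segment.

```python
# Toy PCG-style feature extraction; sampling rate and signal are invented.
import numpy as np

fs = 1000                                    # assumed sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 40 * t)          # synthetic 40 Hz "heart sound"

def pcg_features(x, fs):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return {
        "mean": float(np.mean(x)),                                   # time domain
        "std": float(np.std(x)),
        "skewness": float(np.mean(((x - x.mean()) / x.std()) ** 3)),
        "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),       # frequency domain
    }

feats = pcg_features(signal, fs)
print(feats["dominant_freq_hz"])  # 40.0
```

    A real feature set would add morphological descriptors of the heart-sound envelope and feed the resulting vector to a classifier.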

  4. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  5. Dataset definition for CMS operations and physics analyses

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting were introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first LHC run, and we discuss the plans for the second LHC run.

  6. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  7. Environmental Dataset Gateway (EDG) CS-W Interface

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  8. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  9. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    M. Alghobiri

    2018-04-01

    Data mining involves the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure in order to apply it to a new dataset and predict its class. Various classification algorithms are used to classify data sets; they are based on different methods, such as probability, decision trees, neural networks, nearest neighbours, boolean and fuzzy logic, and kernel-based approaches. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, and ROC area. A comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify the features and limitations of the classification algorithms for datasets of diverse nature.
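    The evaluation measures listed above follow directly from a confusion matrix. A minimal sketch for the binary case, with invented counts, computing accuracy, precision, F-measure and Cohen's kappa:

```python
# Performance measures from a binary confusion matrix (counts are invented).
tp, fp, fn, tn = 40, 10, 5, 45

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

# Cohen's kappa: observed agreement corrected for chance agreement
p_observed = accuracy
p_yes = ((tp + fp) / total) * ((tp + fn) / total)
p_no = ((fn + tn) / total) * ((fp + tn) / total)
p_chance = p_yes + p_no
kappa = (p_observed - p_chance) / (1 - p_chance)

print(round(accuracy, 3), round(precision, 3), round(f_measure, 3), round(kappa, 3))
```

    Mean absolute error and ROC area additionally require per-instance predicted probabilities rather than just the confusion counts.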

  10. Newton SSANTA Dr Water using POU filters dataset

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  11. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health: the human microbiome. The limited number of existing studies has reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  12. General Purpose Multimedia Dataset - GarageBand 2008

    Meng, Anders

    This document describes a general purpose multimedia dataset to be used in cross-media machine learning problems. In more detail, we describe the genre taxonomy applied at http://www.garageband.com, from where the dataset was collected, and how the taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented.

  13. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  14. CrossWork: Software-assisted identification of cross-linked peptides

    Rasmussen, Morten; Refsgaard, Jan; Peng, Li

    2011-01-01

    CrossWork searches batches of tandem mass-spectrometric data, and identifies cross-linked and non-cross-linked peptides using a standard PC. We tested CrossWork by searching mass-spectrometric datasets of cross-linked complement factor C3 against small (1 protein) and large (1000 proteins) search spaces, and show...

  15. Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets

    Kim, S. K.; Prabhat, M.; Williams, D. N.

    2017-12-01

    Deep neural networks have been successfully applied to the problem of detecting extreme weather events in large-scale climate datasets, attaining performance superior to all previous hand-crafted methods. Recent work has shown that a multichannel spatiotemporal encoder-decoder CNN architecture is able to localize events with semi-supervised bounding boxes. Motivated by this work, we propose a new learning method based on Variational Auto-Encoders (VAE) and Long Short-Term Memory (LSTM) networks to track extreme weather events in spatio-temporal datasets. We treat spatio-temporal object tracking as learning the probabilistic distribution of the continuous latent features of an auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d. and that the latent features can be modeled by a Gaussian distribution. In the proposed method, we first train a VAE to generate an approximate posterior given multichannel climate input containing an extreme climate event at a fixed time. Then, we predict the bounding box, location and class of extreme climate events using convolutional layers whose input concatenates three features: the embedding, the sampled mean and the standard deviation. Lastly, we train an LSTM on the concatenated input to learn the temporal structure of the dataset by recurrently feeding the output back into the next time-step's input of the VAE. Our contribution is two-fold. First, we present the first semi-supervised end-to-end architecture based on VAEs to track extreme weather events, which can be applied to massive unlabeled climate datasets. Second, the temporal movement of events is taken into account in bounding box prediction using the LSTM, which can improve localization accuracy. To our knowledge, this technique has not been explored in either the climate or the machine learning community.
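    The VAE sampling step this architecture relies on can be illustrated in isolation: the latent code is modelled as a Gaussian and drawn with the reparameterization trick z = mu + sigma * eps, which keeps sampling differentiable with respect to the encoder outputs. A toy numpy sketch (dimensions and values invented, no training involved):

```python
# Reparameterized sampling of a Gaussian latent code, plus the
# (embedding, mean, std) concatenation the abstract describes.
import numpy as np

rng = np.random.default_rng(42)

mu = np.zeros(8)                  # encoder-predicted mean of the latent code
log_var = np.zeros(8)             # encoder-predicted log-variance

def sample_latent(mu, log_var, rng):
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)   # noise is the only stochastic input
    return mu + sigma * eps

z = sample_latent(mu, log_var, rng)
features = np.concatenate([z, mu, np.exp(0.5 * log_var)])
print(features.shape)  # (24,)
```

    In the full method this concatenated vector would feed the convolutional bounding-box head, and the LSTM would carry it across time steps.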

  16. SAR image dataset of military ground targets with multiple poses for ATR

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset [1]. The problem with MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden [2], suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles. The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7 m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained with airborne SAR systems. Single-polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm [3]. A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized into suggested training and testing sets to facilitate a standard evaluation of SAR ATR algorithms.
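    The backprojection idea can be illustrated with a toy turntable geometry: a point target's phase history over aspect angle is phase-compensated at every image pixel, so energy focuses at the target position. This is a single-frequency simplification of the stepped-frequency processing in the paper; the wavelength, geometry and grid below are invented.

```python
# Toy narrowband ISAR backprojection over a 20-degree integration angle.
import numpy as np

wavelength = 0.02                                 # ~15 GHz, metres (assumed)
angles = np.deg2rad(np.linspace(-10, 10, 181))    # turntable aspect angles
target = np.array([0.5, -0.3])                    # true target position (m)

# radar line-of-sight unit vectors for each aspect angle
los = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# ideal range-compressed phase history of the point target
phase_history = np.exp(-1j * 4 * np.pi / wavelength * los @ target)

# backproject onto a coarse image grid
xs = np.linspace(-1, 1, 21)
ys = np.linspace(-1, 1, 21)
image = np.zeros((len(ys), len(xs)))
for iy, y in enumerate(ys):
    for ix, x in enumerate(xs):
        # compensate the expected phase at this pixel and sum coherently
        steering = np.exp(1j * 4 * np.pi / wavelength * los @ np.array([x, y]))
        image[iy, ix] = abs(np.sum(phase_history * steering))

iy, ix = np.unravel_index(np.argmax(image), image.shape)
print(round(xs[ix], 2), round(ys[iy], 2))  # 0.5 -0.3
```

    With only one frequency the focusing is sharp in cross-range but coarse in range; the 5 GHz stepped-frequency bandwidth used for the actual dataset is what provides the range resolution.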

  17. LINKING STATE, UNIVERSITY AND BUSINESS IN NICARAGUA

    Máximo Andrés Rodríguez Pérez

    2015-07-01

    In Nicaragua, the level of linkage between the state, universities and business is low, although Nicaraguan universities have initiated communication strategies with the state and the private sector, and the idiosyncrasies of its citizens favour this link. Linkage policies formalize communications and information networks. Universities have a key role in building models and organizations that provide alternatives for economic development. Linking the university with its environment generates virtuous circles in which companies achieve greater competitiveness, the state gains higher tax revenue and public stability, and universities generate new knowledge. This article analyzes the U-E-E linkage strategies that can be applied in Nicaragua to strengthen these links and achieve positive developments in the country.

  18. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  19. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, much work is being done to bring the data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema, sharing only a subset of information with the metadata already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe it with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available worldwide through the internet. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructures
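    The core idea, one repository item carrying several metadata records in different schemas, can be sketched with stdlib XML tooling. The element and attribute names below are invented stand-ins for the eSciDoc content model, not its actual API.

```python
# Hypothetical item with two metadata records in different schemas.
import xml.etree.ElementTree as ET

item = ET.Element("item", id="escidoc:1234")
records = ET.SubElement(item, "md-records")

# one record per schema, attached to the same versioned container
datacite = ET.SubElement(records, "md-record", name="DataCite")
ET.SubElement(datacite, "title").text = "Example dataset"

iso = ET.SubElement(records, "md-record", name="ISO19115")
ET.SubElement(iso, "abstract").text = "Same dataset, described for geodata portals"

xml_text = ET.tostring(item, encoding="unicode")
n_records = len(item.find("md-records").findall("md-record"))
print(n_records)  # 2
```

    Because each record is independent, a new use context can add its own schema without touching the records that are already attached.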

  20. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study, the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database, we used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.
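    The relational linking described above can be sketched with a minimal schema in sqlite3 (table and column names are invented for illustration, not taken from the study): imagery rows join to flights, and flights to instruments, through shared keys.

```python
# Minimal relational model: instrument -> flight -> image, joined via keys.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE flight (
    id INTEGER PRIMARY KEY,
    flight_date TEXT,
    instrument_id INTEGER REFERENCES instrument(id)
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    flight_id INTEGER REFERENCES flight(id),
    processing_level TEXT   -- e.g. digital numbers ... geocorrected reflectance
);
""")
cur.execute("INSERT INTO instrument VALUES (1, 'hyperspectral sensor')")
cur.execute("INSERT INTO flight VALUES (1, '2016-06-21', 1)")
cur.execute("INSERT INTO image VALUES (1, 1, 'geocorrected reflectance')")

row = cur.execute("""
    SELECT instrument.name, flight.flight_date, image.processing_level
    FROM image
    JOIN flight ON image.flight_id = flight.id
    JOIN instrument ON flight.instrument_id = instrument.id
""").fetchone()
print(row)
```

    Keeping location, date and instrument in normalized tables is what lets the imagery be queried spatially and temporally without duplicating metadata per file.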

  1. Use of country of birth as an indicator of refugee background in health datasets

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  2. Multilayered complex network datasets for three supply chain network archetypes on an urban road grid

    Nadia M. Viljoen

    2018-02-01

    This article presents the multilayered complex network formulation for three different supply chain network archetypes on an urban road grid and describes how 500 instances were randomly generated for each archetype. Both the supply chain network layer and the urban road network layer are directed unweighted networks. The shortest path set is calculated for each of the 1500 experimental instances. The datasets are used to empirically explore the impact that the supply chain's dependence on the transport network has on its vulnerability in Viljoen and Joubert (2017) [1]. The datasets are publicly available on Mendeley (Joubert and Viljoen, 2017) [2]. Keywords: Multilayered complex networks, Supply chain vulnerability, Urban road networks
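    On a directed unweighted network, the shortest-path computation the dataset records reduces to breadth-first search. A small sketch on a grid standing in for the urban road layer (grid size and endpoints are invented):

```python
# BFS shortest path on an n x n grid graph with 4-neighbour moves.
from collections import deque

def grid_neighbours(node, n):
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)

def shortest_path(start, goal, n):
    prev = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in grid_neighbours(node, n):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                   # goal unreachable

path = shortest_path((0, 0), (3, 2), n=5)
print(len(path) - 1)  # 5 edges: the Manhattan distance on the grid
```

    The published datasets store the full shortest path *set* per instance, i.e. all tied shortest paths, which a BFS variant can enumerate by keeping every predecessor at the same depth.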

  3. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets

    Alex D. Washburne

    2017-02-01

    Marker gene sequencing of microbial communities has generated big datasets of microbial relative abundances varying across environmental conditions, sample sites and treatments. These data often come with putative phylogenies, providing unique opportunities to investigate how shared evolutionary history affects microbial abundance patterns. Here, we present a method to identify the phylogenetic factors driving patterns in microbial community composition. We use the method, “phylofactorization,” to re-analyze datasets from the human body and soil microbial communities, demonstrating how phylofactorization is a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.

  4. Linked Patient-Reported Outcomes Data From Patients With Multiple Sclerosis Recruited on an Open Internet Platform to Health Care Claims Databases Identifies a Representative Population for Real-Life Data Analysis in Multiple Sclerosis.

    Risson, Valery; Ghodge, Bhaskar; Bonzani, Ian C; Korn, Jonathan R; Medin, Jennie; Saraykar, Tanmay; Sengupta, Souvik; Saini, Deepanshu; Olson, Melvin

    2016-09-22

    An enormous amount of information relevant to public health is being generated directly by online communities. To explore the feasibility of creating a dataset that links patient-reported outcomes data, from a Web-based survey of US patients with multiple sclerosis (MS) recruited on open Internet platforms, to health care utilization information from health care claims databases. The dataset was generated by linkage analysis to a broader MS population in the United States using both pharmacy and medical claims data sources. US Facebook users with an interest in MS were alerted to a patient-reported survey by targeted advertisements. Eligibility criteria were diagnosis of MS by a specialist (primary progressive, relapsing-remitting, or secondary progressive), ≥12-month history of disease, age 18-65 years, and commercial health insurance. Participants completed a questionnaire including data on demographic and disease characteristics, current and earlier therapies, relapses, disability, health-related quality of life, and employment status and productivity. A unique anonymous profile was generated for each survey respondent. Each anonymous profile was linked to a number of medical and pharmacy claims datasets in the United States. Linkage rates were assessed and survey respondents' representativeness was evaluated based on differences in the distribution of characteristics between the linked survey population and the general MS population in the claims databases. The advertisement was placed on 1,063,973 Facebook users' pages generating 68,674 clicks, 3719 survey attempts, and 651 successfully completed surveys, of which 440 could be linked to any of the claims databases for 2014 or 2015 (67.6% linkage rate). Overall, no significant differences were found between patients who were linked and not linked for educational status, ethnicity, current or prior disease-modifying therapy (DMT) treatment, or presence of a relapse in the last 12 months. 
The frequencies of the

  5. Integration of Neuroimaging and Microarray Datasets  through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes

    Spiro P. Pantazatos

    2009-06-01

An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.
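The two precision figures reported above come down to simple proportions over manually adjudicated samples; a minimal sketch with hypothetical judgements (placeholder data, not the study's actual records):

```python
# Precision of NLP-derived codes: the fraction of sampled term-to-ontology
# mappings judged correct by a human reviewer. Judgements are invented here.
judgements = [True] * 44 + [False] * 6  # n = 50 sampled mappings

precision = sum(judgements) / len(judgements)
print(f"precision = {precision:.0%}")  # -> precision = 88%
```

The same calculation over a second sample of cross-dataset mappings would yield the 98% (n = 100) figure.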

  6. HC StratoMineR: A web-based tool for the rapid analysis of high content datasets

    Omta, W.; Heesbeen, R. van; Pagliero, R.; Velden, L. van der; Lelieveld, D.; Nellen, M.; Kramer, M.; Yeong, M.; Saeidi, A.; Medema, R.; Spruit, M.; Brinkkemper, S.; Klumperman, J.; Egan, D.

    2016-01-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that

  7. HC StratoMineR : A Web-Based Tool for the Rapid Analysis of High-Content Datasets

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-01-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that

  8. A novel industry grade dataset for fault prediction based on model-driven developed automotive embedded software

    Altinger, H.; Siegl, S.; Dajsuren, Y.; Wotawa, F.

    2015-01-01

In this paper, we present a novel industry dataset on static software and change metrics for Matlab/Simulink models and their corresponding auto-generated C source code. The dataset comprises data from three automotive projects developed and tested according to industry standards and restrictive

  9. Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: The case of TEM beta-lactamases

    Abriata, L.A.; Salverda, M.L.M.; Tomatis, P.E.

    2012-01-01

A dataset of TEM β-lactamase variants with different substrate and inhibition profiles was compiled and analyzed. Trends show that loops are the main evolvable regions in these enzymes, gradually accumulating mutations to generate increasingly complex functions. Notably, many mutations present in

  10. Quality-control of an hourly rainfall dataset and climatology of extremes for the UK.

    Blenkinsop, Stephen; Lewis, Elizabeth; Chan, Steven C; Fowler, Hayley J

    2017-02-01

    Sub-daily rainfall extremes may be associated with flash flooding, particularly in urban areas but, compared with extremes on daily timescales, have been relatively little studied in many regions. This paper describes a new, hourly rainfall dataset for the UK based on ∼1600 rain gauges from three different data sources. This includes tipping bucket rain gauge data from the UK Environment Agency (EA), which has been collected for operational purposes, principally flood forecasting. Significant problems in the use of such data for the analysis of extreme events include the recording of accumulated totals, high frequency bucket tips, rain gauge recording errors and the non-operation of gauges. Given the prospect of an intensification of short-duration rainfall in a warming climate, the identification of such errors is essential if sub-daily datasets are to be used to better understand extreme events. We therefore first describe a series of procedures developed to quality control this new dataset. We then analyse ∼380 gauges with near-complete hourly records for 1992-2011 and map the seasonal climatology of intense rainfall based on UK hourly extremes using annual maxima, n-largest events and fixed threshold approaches. We find that the highest frequencies and intensities of hourly extreme rainfall occur during summer when the usual orographically defined pattern of extreme rainfall is replaced by a weaker, north-south pattern. A strong diurnal cycle in hourly extremes, peaking in late afternoon to early evening, is also identified in summer and, for some areas, in spring. This likely reflects the different mechanisms that generate sub-daily rainfall, with convection dominating during summer. The resulting quality-controlled hourly rainfall dataset will provide considerable value in several contexts, including the development of standard, globally applicable quality-control procedures for sub-daily data, the validation of the new generation of very high
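Quality-control checks of the kind described, flagging suspected accumulated totals and stuck tipping buckets, can be sketched with pandas; the thresholds and toy series below are illustrative assumptions, not the paper's actual rules:

```python
import pandas as pd

# Toy hourly gauge series; QC thresholds here are illustrative only.
rain = pd.Series(
    [0.0, 0.2, 0.0, 45.0, 0.2, 0.2, 0.2, 0.2, 0.0],
    index=pd.date_range("2011-07-01", periods=9, freq="h"),
)

flags = pd.DataFrame(index=rain.index)
# 1) implausibly large hourly total (a daily accumulation dumped in one hour?)
flags["suspect_accumulation"] = rain > 40.0
# 2) long runs of identical non-zero values (a stuck tipping bucket?)
run = rain.groupby((rain != rain.shift()).cumsum()).transform("size")
flags["suspect_repeat"] = (rain > 0) & (run >= 4)

print(flags[flags.any(axis=1)])
```

Flagged hours would then be inspected or removed before computing annual maxima or threshold exceedances.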

  11. Development and field testing of satellite-linked fluorometers for marine mammals

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset includes telemetry data related to the development and testing of an animal-borne satellite-linked fluorometer tag, used on northern fur seals and...

  12. Development in fiscal 1999 of technology to put photovoltaic power generation system into practical use. Demonstrative study on photovoltaic power generation system (Research on high-density linking technology); 1999 nendo taiyoko hatsuden system jitsuyoka gijutsu kaihatsu seika hokokusho. Taiyoko hatsuden system no jissho kenkyu (komitsudo renkei gijutsu no kenkyu)

    NONE

    2000-03-01

This study examined a photovoltaic power generation system linked to distribution lines at high density, investigating electric power quality, problems affecting the operation and protection of distribution lines, corrective measures, and the enhancement of power quality by utilizing inverters. This paper summarizes the achievements in fiscal 1999. To clarify the problems associated with high-density linkage, the amount of PV capacity that could be introduced into the distribution lines was discussed from the standpoint of electrical performance, including power quality and safety. With emphasis on identifying the current status of islanding (single operation) prevention technologies, the islanding-prevention performance of commercially available inverters was demonstrated by operating 84 inverters in parallel at the Rokko testing installation. As a corrective-measure technology, a decentralized voltage-stabilizing device based on the injection of reactive power into high-voltage distribution lines was developed to suppress voltage rise in the distribution lines, and the reasonability of its fundamental characteristics was verified using the Akagi testing facilities. In addition, the design of the two-step active prevention system was improved, and verification of its fundamental characteristics and tests on the parallel operation of multiple units have commenced. (NEDO)

  13. Wind and wave dataset for Matara, Sri Lanka

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).
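Evaluating a reanalysis product such as ERA-Interim against buoy observations typically reduces to paired-series statistics; a sketch with made-up wave heights (not the Matara data):

```python
import numpy as np

# Toy significant-wave-height pairs (m); values are illustrative only.
buoy = np.array([1.2, 1.4, 1.1, 1.6, 1.3])  # in situ observations
era  = np.array([1.1, 1.5, 1.0, 1.4, 1.2])  # collocated reanalysis values

bias = np.mean(era - buoy)                     # mean error
rmse = np.sqrt(np.mean((era - buoy) ** 2))     # root-mean-square error
corr = np.corrcoef(buoy, era)[0, 1]            # linear correlation
print(f"bias={bias:+.2f} m, rmse={rmse:.2f} m, r={corr:.2f}")
```

With synchronous buoy and reanalysis series, the same three numbers summarise how well the product captures the observed wave climate.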

  14. The LANDFIRE Refresh strategy: updating the national dataset

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  15. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  16. Wind and wave dataset for Matara, Sri Lanka

    Y. Luo

    2018-01-01

We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  17. Process mining in oncology using the MIMIC-III dataset

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
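Turning one of MIMIC-III's event tables into a process-mining event log amounts to choosing a case identifier, an activity name, and a timestamp. A sketch with fabricated rows (the column names follow MIMIC-III's PRESCRIPTIONS table, but the data are invented, and real analyses would join several event tables):

```python
import pandas as pd

# Fabricated rows in the style of the MIMIC-III PRESCRIPTIONS table.
prescriptions = pd.DataFrame({
    "hadm_id": [100, 100, 101],
    "startdate": pd.to_datetime(["2130-01-02", "2130-01-05", "2131-03-01"]),
    "drug": ["cisplatin", "ondansetron", "paclitaxel"],
})

# An event log needs a case id, an activity, and a timestamp, ordered per case.
log = (prescriptions
       .rename(columns={"hadm_id": "case_id",
                        "startdate": "timestamp",
                        "drug": "activity"})
       .sort_values(["case_id", "timestamp"])
       .reset_index(drop=True))
print(log)
```

A log in this shape can be handed to a process-discovery algorithm to reconstruct pathway variants per admission.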

  18. NOAA NEXt-Generation RADar (NEXRAD) Products

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset consists of Level III weather radar products collected from Next-Generation Radar (NEXRAD) stations located in the contiguous United States, Alaska,...

  19. Recent Development on the NOAA's Global Surface Temperature Dataset

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature, or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.
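A global temperature trend of the kind compared across dataset versions is usually an ordinary least-squares slope over annual anomalies; a sketch on a synthetic series with a known, built-in trend (not NOAAGlobalTemp data):

```python
import numpy as np

# Synthetic annual global-mean anomalies (degC) with a known 0.18 degC/decade trend.
years = np.arange(2000, 2010)
anom = 0.018 * (years - 2000) + 0.45

slope, intercept = np.polyfit(years, anom, 1)  # least-squares linear fit
print(f"trend = {slope * 10:.2f} degC per decade")  # -> trend = 0.18 degC per decade
```

Recomputing this slope on two dataset versions shows directly how component updates such as ERSST v3b to v4 change the estimated trend.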

  20. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
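The simulation approach described, preserving means and covariances without reusing any original records, can be sketched with a multivariate normal draw; the summary statistics below are hypothetical stand-ins for the ALSPAC values, and the actual study may have used a different generator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "study" summaries: mean and covariance of three variables
# (e.g. height cm, weight kg, heart rate bpm). Not ALSPAC values.
mean = np.array([165.0, 65.0, 72.0])
cov = np.array([[40.0, 18.0,  2.0],
                [18.0, 55.0,  4.0],
                [ 2.0,  4.0, 60.0]])

# Draw synthetic participants reproducing the mean/covariance structure
# without containing any original records.
synthetic = rng.multivariate_normal(mean, cov, size=15000)

print(np.round(synthetic.mean(axis=0), 1))
print(np.round(np.cov(synthetic, rowvar=False), 1))
```

Only the summary statistics leave the secure environment; the synthetic rows themselves correspond to no real participant.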

  1. The OXL format for the exchange of integrated datasets

    Taubert Jan

    2007-12-01

A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  2. Dataset of transcriptional landscape of B cell early activation

    Alexander S. Garruss

    2015-09-01

Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. RNA-seq was performed at early time points (0.5 and 2 h) after BCR and TLR ligand exposure, allowing observation of rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observation of important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  3. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
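Merging estimates that carry error fields, as the combined GPCP analysis does, is often illustrated with inverse-error-variance weighting; the sketch below uses made-up numbers for one grid cell and is not the actual GPCP merging algorithm:

```python
import numpy as np

# Two estimates for one grid cell with random-error standard deviations.
# All numbers are illustrative, not GPCP constants.
sat_est, sat_err = 5.0, 1.5      # satellite estimate, mm/day
gauge_est, gauge_err = 4.0, 0.5  # gauge analysis, mm/day

# Weight each input by the inverse of its error variance.
w_sat, w_gauge = 1 / sat_err**2, 1 / gauge_err**2
merged = (w_sat * sat_est + w_gauge * gauge_est) / (w_sat + w_gauge)
merged_err = np.sqrt(1 / (w_sat + w_gauge))
print(f"merged = {merged:.2f} mm/day +/- {merged_err:.2f}")
```

The better-constrained gauge value dominates the merge, and the combined error is smaller than either input's, which is the rationale for carrying error estimates alongside each field.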

  4. A high-resolution European dataset for hydrologic modeling

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. 
The dataset variables are used as

  5. Visualization of conserved structures by fusing highly variable datasets.

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual
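Mutual-information fusion of the kind MIAMI Fuse performs rests on the joint histogram of two co-registered volumes; a generic sketch of the metric (a standard formulation, not MIAMI Fuse's actual implementation):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information between two co-registered images, estimated from
    their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                     # joint distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of a
    py = pxy.sum(axis=0, keepdims=True)           # marginal of b
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
img = rng.random((64, 64))
mi_self = mutual_information(img, img)                    # maximal: identical images
mi_noise = mutual_information(img, rng.random((64, 64)))  # near zero: independent noise
print(mi_self, mi_noise)
```

A registration or warping routine searches for the transform parameters that maximise this quantity between the moving and fixed datasets.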

  6. A cross-country Exchange Market Pressure (EMP) dataset.

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ. Using the standard errors of the estimates of ρ, we obtain one-sigma intervals around the mean estimates of EMP values. These values are also reported in the dataset.
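Given a conversion factor ρ, the EMP construction described reduces to a one-line combination of the observed exchange-rate change and the intervention converted into exchange-rate terms; the sign convention and all numbers below are illustrative assumptions, not values from the dataset:

```python
# EMP in percent: observed exchange-rate change plus the change "absorbed"
# by intervention, converted with rho (% per $1bn). Hypothetical numbers.
pct_change_exchange_rate = -1.2  # observed monthly change in the rate, %
intervention_bn_usd = 2.5        # net central-bank FX intervention, $bn
rho = 0.8                        # % exchange-rate change per $1bn intervention

emp = pct_change_exchange_rate + rho * intervention_bn_usd
print(f"EMP = {emp:.2f}%")
```

Repeating this for each country-month, with that country's estimated ρ, yields the monthly EMP time series; propagating the standard error of ρ gives the reported one-sigma intervals.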

  7. Hydrological simulation of the Brahmaputra basin using global datasets

    Bhattacharya, Biswa; Conway, Crystal; Craven, Joanne; Masih, Ilyas; Mazzolini, Maurizio; Shrestha, Shreedeepy; Ugay, Reyne; van Andel, Schalk Jan

    2017-04-01

Brahmaputra River flows through China, India and Bangladesh to the Bay of Bengal and is one of the largest rivers of the world, with a catchment size of 580K km2. The catchment is largely hilly and/or forested, with sparse population and limited urbanisation and economic activities. The catchment experiences heavy monsoon rainfall leading to very high flood discharges. Large inter-annual variations of discharge leading to flooding, erosion and morphological changes are among the major challenges. The catchment is largely ungauged; moreover, the limited availability of hydro-meteorological data limits the possibility of carrying out evidence-based research, which could provide trustworthy information for managing and, when needed, controlling the basin processes by the riparian countries for overall basin development. The paper presents initial results of a current research project on the Brahmaputra basin. A set of hydrological and hydraulic models (SWAT, HMS, RAS) are developed by employing publicly available datasets of DEM, land use and soil, and simulated using satellite-based rainfall products, evapotranspiration and temperature estimates. Remotely sensed data are compared with sporadically available ground data. The set of models is able to produce catchment-wide hydrological information that can potentially be used in the future to manage the basin's water resources. The model predictions should be used with caution due to the high level of uncertainty, because the semi-calibrated models are developed with uncertain physical representation (e.g. cross-sections) and simulated with global meteorological forcing (e.g. TRMM) with limited validation. Major scientific challenges are seen in producing robust information that can be reliably used in managing the basin.
The information generated by the models is uncertain and, as a result, instead of being used per se, it is used to improve the understanding of the catchment, and by running several scenarios with varying

  8. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that harbours a large diversity of autochthonous plants, including 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidences the role of botanical collections as an essential source of occurrence data.

  9. A Large-Scale 3D Object Recognition dataset

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  10. Traffic sign classification with dataset augmentation and convolutional neural network

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, we first transform a color image into grayscale and then normalize it in the range (-1,1) as the preprocessing step. To increase the robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
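The preprocessing step described, grayscale conversion followed by normalisation to (-1,1), can be sketched as follows; the luminance weights are a common choice and an assumption here, not necessarily the ones used in the paper:

```python
import numpy as np

def preprocess(rgb):
    """Convert an 8-bit RGB image to grayscale, then scale to (-1, 1).
    The 0.299/0.587/0.114 luminance weights are a standard assumption."""
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    return gray / 127.5 - 1.0  # map [0, 255] -> [-1, 1]

img = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3)).astype(float)
x = preprocess(img)
print(x.min() >= -1.0 and x.max() <= 1.0)  # True
```

Augmented copies (small rotations, shifts, brightness changes) would be passed through the same function before training.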

  11. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. 
This makes it easy to join, extend, and combine datasets and hence work collectively, but

  12. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply.
This makes it easy to join
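
    The descriptor-versioning idea at the core of QSAR-ML can be sketched as follows. This is a hypothetical illustration only: the element and attribute names (`qsarDataset`, `descriptor`, `ontologyId`) are placeholders, not the schema published with the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical QSAR-ML-style descriptor list. Element and attribute
# names are illustrative placeholders, not the official schema.
# Each descriptor carries an ontology identifier plus the software
# implementation and version that computed it, which is what makes
# the dataset setup reproducible.
def build_dataset(descriptors):
    root = ET.Element("qsarDataset")
    for d in descriptors:
        ET.SubElement(root, "descriptor",
                      ontologyId=d["ontologyId"],
                      implementation=d["impl"],
                      version=d["version"])
    return ET.tostring(root, encoding="unicode")

xml_doc = build_dataset([
    {"ontologyId": "cdk:XLogP", "impl": "CDK", "version": "1.3.5"},
    {"ontologyId": "cdk:TPSA", "impl": "CDK", "version": "1.3.5"},
])
print(xml_doc)
```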

  13. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  14. Experience and Its Generation

    Youqing, Chen

    2006-01-01

    Experience is an activity that arouses emotions and generates meanings based on vivid sensation and profound comprehension. It is emotional, meaningful, and personal, playing a key role in the course of forming and developing one's qualities. The psychological process of experience generation consists of such links as sensing things, arousing…

  15. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
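
    The dynamic transfer adjustment described above can be sketched, in greatly simplified form, as a feedback rule on the number of concurrent transfer streams. This is an illustrative sketch under assumed names and thresholds, not the actual BDM algorithm.

```python
# Simplified feedback rule for dynamic transfer adjustment
# (illustrative only, not the actual BDM algorithm): add a concurrent
# stream while measured throughput keeps improving, back off when it
# degrades.
def adjust_streams(current, throughput, prev_throughput,
                   min_streams=1, max_streams=16):
    if throughput >= prev_throughput:
        return min(current + 1, max_streams)
    return max(current - 1, min_streams)

streams = 4
samples = [100, 140, 170, 160]  # hypothetical MB/s measurements
for prev, cur in zip(samples, samples[1:]):
    streams = adjust_streams(streams, cur, prev)
print(streams)
```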

  16. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps:
    - Accepting the data package from the data providers and ensuring the full integrity of the data files
    - Identifying and addressing data quality issues
    - Assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of data files
    - Setting up data access mechanisms
    - Setting up the data in data tools and services for improved data dissemination and user experience
    - Registering the dataset in online search and discovery catalogues
    - Preserving the data location through Digital Object Identifiers (DOIs)
    We will describe the steps taken to automate, and realize efficiencies in, the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goals will be described. We will also share the metrics-driven value of the workflow system and discuss future steps towards the creation of a common software framework.
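
    The publication steps above can be sketched as an ordered pipeline. The step names paraphrase the abstract and the functions are hypothetical placeholders, not ORNL DAAC or GHRC code.

```python
# Illustrative sketch only: each publication step is a function applied
# in order to a dataset record, with a log of what ran.
def run_pipeline(dataset, steps):
    """Apply each publication step in order, recording what ran."""
    log = []
    for name, step in steps:
        dataset = step(dataset)
        log.append(name)
    return dataset, log

steps = [
    ("ingest",   lambda d: {**d, "files_checked": True}),
    ("qc",       lambda d: {**d, "qc_passed": True}),
    ("metadata", lambda d: {**d, "metadata": "standardized"}),
    ("access",   lambda d: {**d, "access_url": "https://example.org/ds"}),
    ("register", lambda d: {**d, "catalogued": True}),
    ("doi",      lambda d: {**d, "doi": "example-doi"}),
]
published, log = run_pipeline({"name": "campaign-data"}, steps)
print(log)
```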

  17. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.
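
    The estimation approach can be illustrated schematically: learn a mapping from co-located predictor datasets (e.g. satellite aerosol optical depth and meteorological fields) to in-situ particulate observations. A plain least-squares fit on synthetic data stands in here for the machine-learning model actually used; all values are made up.

```python
import numpy as np

# Schematic stand-in for the machine-learning step: fit a model that
# maps co-located predictors to in-situ PM observations, then predict
# where no ground measurements exist. Synthetic data throughout.
rng = np.random.default_rng(0)
n = 200
X = rng.random((n, 3))                  # toy AOD, humidity, wind features
true_w = np.array([30.0, 5.0, -3.0])
y = X @ true_w + rng.normal(0, 0.1, n)  # synthetic PM observations

# Fit on half the sites, evaluate on the held-out half.
w, *_ = np.linalg.lstsq(X[:100], y[:100], rcond=None)
pred = X[100:] @ w
rmse = float(np.sqrt(np.mean((pred - y[100:]) ** 2)))
print(round(rmse, 3))
```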

  18. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network.

    Seung Seog Han

    Full Text Available Although there have been reports of the successful diagnosis of skin disorders using deep learning, unrealistically large clinical image datasets are required for artificial intelligence (AI) training. We created datasets of standardized nail images using a region-based convolutional neural network (R-CNN) trained to distinguish the nail from the background. We used R-CNN to generate training datasets of 49,567 images, which we then used to fine-tune the ResNet-152 and VGG-19 models. The validation datasets comprised 100 and 194 images from Inje University (B1 and B2 datasets, respectively), 125 images from Hallym University (C dataset), and 939 images from Seoul National University (D dataset). The AI (ensemble model; ResNet-152 + VGG-19 + feedforward neural networks) results showed test sensitivity/specificity/area under the curve values of (96.0 / 94.7 / 0.98), (82.7 / 96.7 / 0.95), (92.3 / 79.3 / 0.93), (87.7 / 69.3 / 0.82) for the B1, B2, C, and D datasets. With a combination of the B1 and C datasets, the AI Youden index was significantly (p = 0.01) higher than that of 42 dermatologists doing the same assessment manually. For B1+C and B2+D dataset combinations, almost none of the dermatologists performed as well as the AI. By training with a dataset comprising 49,567 images, we achieved a diagnostic accuracy for onychomycosis using deep learning that was superior to that of most of the dermatologists who participated in this study.

  19. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network.

    Han, Seung Seog; Park, Gyeong Hun; Lim, Woohyung; Kim, Myoung Shin; Na, Jung Im; Park, Ilwoo; Chang, Sung Eun

    2018-01-01

    Although there have been reports of the successful diagnosis of skin disorders using deep learning, unrealistically large clinical image datasets are required for artificial intelligence (AI) training. We created datasets of standardized nail images using a region-based convolutional neural network (R-CNN) trained to distinguish the nail from the background. We used R-CNN to generate training datasets of 49,567 images, which we then used to fine-tune the ResNet-152 and VGG-19 models. The validation datasets comprised 100 and 194 images from Inje University (B1 and B2 datasets, respectively), 125 images from Hallym University (C dataset), and 939 images from Seoul National University (D dataset). The AI (ensemble model; ResNet-152 + VGG-19 + feedforward neural networks) results showed test sensitivity/specificity/area under the curve values of (96.0 / 94.7 / 0.98), (82.7 / 96.7 / 0.95), (92.3 / 79.3 / 0.93), (87.7 / 69.3 / 0.82) for the B1, B2, C, and D datasets. With a combination of the B1 and C datasets, the AI Youden index was significantly (p = 0.01) higher than that of 42 dermatologists doing the same assessment manually. For B1+C and B2+D dataset combinations, almost none of the dermatologists performed as well as the AI. By training with a dataset comprising 49,567 images, we achieved a diagnostic accuracy for onychomycosis using deep learning that was superior to that of most of the dermatologists who participated in this study.
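
    The Youden index used in the comparison is simply sensitivity + specificity − 1. Applying that definition to the per-dataset sensitivity/specificity pairs reported above gives:

```python
# Youden index J = sensitivity + specificity - 1, computed from the
# test sensitivity/specificity values reported in the abstract
# (values in percent).
def youden(sensitivity_pct, specificity_pct):
    return (sensitivity_pct + specificity_pct) / 100 - 1

reported = {"B1": (96.0, 94.7), "B2": (82.7, 96.7),
            "C": (92.3, 79.3), "D": (87.7, 69.3)}
for name, (sens, spec) in reported.items():
    print(name, round(youden(sens, spec), 3))
```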

  20. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed.
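
    At its simplest, the kind of dataset intercomparison described here reduces to computing bias and correlation between fields on a common grid. A minimal sketch on synthetic data (the fields and the injected wet bias are made up):

```python
import numpy as np

# Minimal sketch of a gridded-dataset comparison: given two
# precipitation fields on the same lat/lon grid, compute the mean
# bias and the spatial correlation. Both fields are synthetic.
rng = np.random.default_rng(1)
base = rng.gamma(2.0, 1.5, size=(40, 60))        # "reference" dataset
other = base + rng.normal(0.2, 0.3, (40, 60))    # dataset with a wet bias

bias = float(np.mean(other - base))
corr = float(np.corrcoef(base.ravel(), other.ravel())[0, 1])
print(round(bias, 2), round(corr, 2))
```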

  1. Dataset Documentation for the ERTMS-Oriented Signalling Maintenance in the Danish Railway System

    M. Pour, Shahrzad

    2017-01-01

    This documentation provides information about the dataset generated as part of a PhD thesis (Towards Signalling Maintenance Scheduling Problem for European Railway Traffic Management System) for the signalling maintenance of the Danish railway system. The data instances (M.Pour 2018a; M.Pour 2018b......) have been created for the purpose of adaptation to the newest railway signalling standard, the so-called European Railway Traffic Management System (ERTMS). The chapter provides an explanation of the different types of maintenance tasks in the ERTMS, followed by the data definition. Furthermore...

  2. Overview and Meteorological Validation of the Wind Integration National Dataset toolkit

    Draxl, C. [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Hodge, B. M. [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Clifton, A. [National Renewable Energy Laboratory (NREL), Golden, CO (United States); McCaa, J. [3TIER by VAisala, Seattle, WA (United States)

    2015-04-13

    The Wind Integration National Dataset (WIND) Toolkit described in this report fulfills the requirements of wind integration studies, and constitutes a state-of-the-art national wind resource data set covering the contiguous United States from 2007 to 2013 for use in a variety of next-generation wind integration analyses and wind power planning. The toolkit is a wind resource data set, wind forecast data set, and wind power production and forecast data set derived from the Weather Research and Forecasting (WRF) numerical weather prediction model. WIND Toolkit data are available online for over 116,000 land-based and 10,000 offshore sites representing existing and potential wind facilities.

  3. An Ex Vivo Imaging Pipeline for Producing High- Quality and High-Resolution Diffusion-Weighted Imaging Datasets

    Dyrby, Tim Bjørn; Baaré, William F.C.; Alexander, Daniel C.

    2011-01-01

    Diffusion tensor (DT) imaging and related multifiber reconstruction algorithms allow the study of in vivo microstructure and, by means of tractography, structural connectivity. Although reconstruction algorithms are promising imaging tools, high‐quality diffusion‐weighted imaging (DWI) datasets...... complexity, to establish an ex vivo imaging pipeline for generating high‐quality DWI datasets. Perfusion fixation ensured that tissue characteristics were comparable to in vivo conditions. There were three main results: (i) heat conduction and unstable tissue mechanics accounted for time‐varying artefacts...... in the DWI dataset, which were present for up to 15 h after positioning brain tissue in the scanner; (ii) using fitted DT, q‐ball, and persistent angular structure magnetic resonance imaging algorithms, any b‐value between ∼2,000 and ∼8,000 s/mm2, with an optimal value around 4,000 s/mm2, allowed...

  4. Europeana Linked Open Data -- data.europeana.eu

    Isaac, A.H.J.C.A.; Haslhofer, B.

    2013-01-01

    Europeana is a single access point to millions of books, paintings, films, museum objects and archival records that have been digitized throughout Europe. The data.europeana.eu Linked Open Data pilot dataset contains open metadata on approximately 2.4 million texts, images, videos and sounds

  5. Applying linked data approaches to pharmacology: Architectural decisions and implementation

    Gray, A.J.G; Groth, P.T.; Loizou, A.; Askjaer, S; Brenninkmeijer, C; Burger, K.; Chichester, C.; Evelo, C.T.; Goble, C.A.; Harland, L; Pettifier, S; Thompson, M.; Waagmeester, A; William, A.J

    2014-01-01

    The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that

  6. Linked data-as-a-service: The semantic web redeployed

    Rietveld, Laurens; Verborgh, Ruben; Beek, Wouter; Vander Sande, Miel; Schlobach, Stefan

    2015-01-01

    Ad-hoc querying is crucial to access information from Linked Data, yet publishing queryable RDF datasets on the Web is not a trivial exercise. The most compelling argument to support this claim is that the Web contains hundreds of thousands of data documents, while only 260 queryable SPARQL

  7. Exploring TED Talks as Linked Data for Education

    Taibi, Davide; Chawla, Saniya; Dietze, Stefan; Marenzi, Ivana; Fetahu, Besnik

    2015-01-01

    In this paper, we present the TED Talks dataset which exposes all metadata and the actual transcripts of available TED talks as structured Linked Data. The TED talks collection is composed of more than 1800 talks, along with 35,000 transcripts in over 30 languages, related to a wide range of topics. In this regard, TED talks metadata available in…

  8. SatelliteDL: a Toolkit for Analysis of Heterogeneous Satellite Datasets

    Galloy, M. D.; Fillmore, D.

    2014-12-01

    SatelliteDL is an IDL toolkit for the analysis of satellite Earth observations from a diverse set of platforms and sensors. The core function of the toolkit is the spatial and temporal alignment of satellite swath and geostationary data. The design features an abstraction layer that allows for easy inclusion of new datasets in a modular way. Our overarching objective is to create utilities that automate the mundane aspects of satellite data analysis, are extensible and maintainable, and do not place limitations on the analysis itself. IDL has a powerful suite of statistical and visualization tools that can be used in conjunction with SatelliteDL. Toward this end we have constructed SatelliteDL to include (1) HTML and LaTeX API document generation, (2) a unit test framework, (3) automatic message and error logs, (4) HTML and LaTeX plot and table generation, and (5) several real world examples with bundled datasets available for download. For ease of use, datasets, variables and optional workflows may be specified in a flexible format configuration file. Configuration statements may specify, for example, a region and date range, and the creation of images, plots and statistical summary tables for a long list of variables. SatelliteDL enforces data provenance; all data should be traceable and reproducible. The output NetCDF file metadata holds a complete history of the original datasets and their transformations, and a method exists to reconstruct a configuration file from this information. Release 0.1.0 distributes with ingest methods for GOES, MODIS, VIIRS and CERES radiance data (L1) as well as select 2D atmosphere products (L2) such as aerosol and cloud (MODIS and VIIRS) and radiant flux (CERES). Future releases will provide ingest methods for ocean and land surface products, gridded and time averaged datasets (L3 Daily, Monthly and Yearly), and support for 3D products such as temperature and water vapor profiles. Emphasis will be on NPP Sensor, Environmental and

  9. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  10. Dataset-driven research for improving recommender systems for learning

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  11. dataTEL - Datasets for Technology Enhanced Learning

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  12. A dataset of forest biomass structure for Eurasia.

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots), consisting of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930-2014 where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  13. A reanalysis dataset of the South China Sea

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
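
    The variational idea behind such an assimilation system can be illustrated with the standard (non-incremental, single-scale) 3D-Var analysis equation, x_a = x_b + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x_b). The toy example below is far simpler than the multi-scale incremental scheme used for the reanalysis; all numbers are invented.

```python
import numpy as np

# Toy 3D-Var/optimal-interpolation analysis step: blend a background
# state x_b with observations y using background and observation error
# covariances B and R. Greatly simplified, illustrative values only.
xb = np.array([20.0, 21.0, 22.0])      # background state (e.g. SST)
H = np.array([[1.0, 0.0, 0.0],         # observe the first two points
              [0.0, 1.0, 0.0]])
y = np.array([20.8, 20.6])             # observations
B = 0.5 * np.eye(3)                    # background error covariance
R = 0.25 * np.eye(2)                   # observation error covariance

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain matrix
xa = xb + K @ (y - H @ xb)                     # analysis state
print(np.round(xa, 3))
```

Note how the unobserved third grid point is untouched here because B is diagonal; in a real system, spatial correlations in B spread the observational information to neighbouring points.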

  14. Comparison of analyses of the QTLMAS XII common dataset

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  15. The LAMBADA dataset: Word prediction requiring a broad discourse context

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  16. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Winchell, D. F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  17. Cross-Cultural Concept Mapping of Standardized Datasets

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  18. Level-1 muon trigger performance with the full 2017 dataset

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  19. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  20. Dataset: Multi Sensor-Orientation Movement Data of Goats

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  1. A dataset of human decision-making in teamwork management

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  2. UK surveillance: provision of quality assured information from combined datasets.

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  3. participatory development of a minimum dataset for the khayelitsha ...

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  4. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence of jet flows. This report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, i.e., it establishes uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.
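As a rough illustration of the single-point statistics described above, the per-point mean and variance can be computed over a stack of instantaneous PIV vector fields, and axial profiles collapsed by rescaling distance with the potential core length. The array sizes, the random fields, and the core length `Xc` below are invented for the sketch, not values from the NASA dataset:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stack of instantaneous PIV axial-velocity fields:
# (snapshot, y, x), normalized by the jet exit velocity.
u = 1.0 + 0.1 * rng.standard_normal((500, 64, 128))

u_mean = u.mean(axis=0)            # single-point mean velocity
u_var = u.var(axis=0, ddof=1)      # single-point variance (turbulence)

# Collapse axial profiles by rescaling x with the potential core length Xc:
x = np.linspace(0.0, 20.0, 128)    # axial distance in jet diameters
Xc = 6.5                           # hypothetical potential core length
x_scaled = x / Xc
```

Plotting `u_mean` and `u_var` against `x_scaled` is what allows profiles from different operating points to be compared on a common axis.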

  5. A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets

    Porwal, A.; Carranza, J.; Hale, M.

    2004-12-01

    A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
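The four-layer Takagi-Sugeno network described above can be sketched as a single forward pass: membership degrees, rule firing strengths, normalization, and a weighted sum of linear consequents. The rule parameters below are toy values, not those fitted for the Aravalli study; in the real system they would be tuned by the hybrid learning procedure against the training vectors:

```python
import numpy as np

def anfis_forward(x, centers, sigmas, consequents):
    """Minimal forward pass of a Takagi-Sugeno fuzzy inference system.

    x           : (n_inputs,) encoded evidential feature vector
    centers     : (n_rules, n_inputs) Gaussian membership centers
    sigmas      : (n_rules, n_inputs) Gaussian membership widths
    consequents : (n_rules, n_inputs + 1) linear consequent params [a..., b]
    """
    # Layer 1: Gaussian membership degrees for each rule/input pair
    mu = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)
    # Layer 2: rule firing strengths (product T-norm over inputs)
    w = mu.prod(axis=1)
    # Layer 3: normalized firing strengths
    w_norm = w / w.sum()
    # Layer 4: first-order Takagi-Sugeno consequents, f_i = a_i . x + b_i
    f = consequents[:, :-1] @ x + consequents[:, -1]
    # Layer 5: weighted sum -> crisp prospectivity score
    return float(w_norm @ f)

# Toy example: two rules over two evidential inputs
score = anfis_forward(
    x=np.array([0.8, 0.3]),
    centers=np.array([[1.0, 0.0], [0.0, 1.0]]),
    sigmas=np.ones((2, 2)),
    consequents=np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 0.1]]),
)
```

Applying the pass to every unique-conditions vector and thresholding the scores is what produces the prospectivity map.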

  6. Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias.

    Thomas Pfeiffer

    2011-03-01

    Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that address similar questions. A major obstacle for meta-analyses arises from biases in reporting. In particular, it is speculated that findings which do not achieve formal statistical significance are less likely to be reported than statistically significant findings. Moreover, the patterns of bias can be complex and may also depend on the timing of the research results and their relationship with previously published work. In this paper, we present an approach that is specifically designed to analyze large-scale datasets on published results. Such datasets are currently emerging in diverse research fields, particularly in molecular medicine. We use our approach to investigate a dataset on Alzheimer's disease (AD) that covers 1167 results from case-control studies on 102 genetic markers. We observe that initial studies on a genetic marker tend to be substantially more biased than subsequent replications. The chances for initial, statistically non-significant results to be published are estimated to be about 44% (95% CI, 32% to 63%) relative to statistically significant results, while statistically non-significant replications have almost the same chance to be published as statistically significant replications (84%; 95% CI, 66% to 107%). Early replications tend to be biased against initial findings, an observation previously termed the Proteus phenomenon: the chances for non-significant studies going in the same direction as the initial result are estimated to be lower than the chances for non-significant studies opposing the initial result (73%; 95% CI, 55% to 96%). Such dynamic patterns in bias are difficult to capture by conventional methods, where typically simple publication bias is assumed to operate. Our approach captures and corrects for complex dynamic patterns of bias, and thereby helps generate conclusions from published results that are more robust.
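The directional comparison behind the Proteus phenomenon reduces to counting non-significant replications that agree with versus oppose the initial finding, with a bootstrap interval for the ratio. The effect sizes below are simulated stand-ins, not the AD marker data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical record for one marker: the sign of the initial effect and
# the signed, statistically non-significant replication effects after it.
initial_sign = 1
replication_effects = rng.normal(loc=-0.05, scale=0.2, size=200)

same = int(np.sum(np.sign(replication_effects) == initial_sign))
opposing = len(replication_effects) - same
ratio = same / opposing  # <1 suggests bias against the initial finding

# Percentile bootstrap for a rough confidence interval on the ratio
boot = []
for _ in range(1000):
    s = rng.choice(replication_effects, size=len(replication_effects), replace=True)
    k = int(np.sum(np.sign(s) == initial_sign))
    boot.append(k / (len(s) - k))
ci = np.percentile(boot, [2.5, 97.5])
```

The paper's actual estimates additionally model publication probabilities over time; this only illustrates the same-direction vs opposing count at the core of the comparison.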

  7. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even though those results contain the query terms. They may also return highly specific results that depend more on how well the metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect the need for a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
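A hybrid recommender along these lines blends a content-based score (e.g. TF-IDF cosine similarity over metadata text) with an item co-occurrence signal (e.g. datasets accessed in the same sessions). This is a generic sketch, not the CSIRO Data Access Portal implementation; the documents and co-occurrence counts are invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors for dataset metadata descriptions."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    vecs = []
    for doc in tokenized:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target, docs, co_counts, alpha=0.7):
    """Blend content similarity with normalized co-access counts."""
    vecs = tfidf_vectors(docs)
    max_co = max(co_counts.values(), default=0) or 1
    scores = {}
    for j in range(len(docs)):
        if j == target:
            continue
        content = cosine(vecs[target], vecs[j])
        co = co_counts.get(frozenset((target, j)), 0) / max_co
        scores[j] = alpha * content + (1 - alpha) * co
    return sorted(scores, key=scores.get, reverse=True)

docs = ["ocean temperature grid", "ocean salinity grid", "soil moisture survey"]
co_counts = {frozenset((0, 1)): 2, frozenset((0, 2)): 5}
ranking = recommend(0, docs, co_counts)
```

With `alpha=1.0` the ranking is purely content-based; lowering `alpha` lets frequently co-accessed but textually dissimilar datasets surface, which is where serendipity comes from.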

  8. Generating Contextual Descriptions of Virtual Reality (VR) Spaces

    Olson, D. M.; Zaman, C. H.; Sutherland, A.

    2017-12-01

    Virtual reality holds great potential for science communication, education, and research. However, interfaces for manipulating data and environments in virtual worlds are limited and idiosyncratic. Furthermore, speech and vision are the primary modalities by which humans collect information about the world, but the linking of visual and natural language domains is a relatively new pursuit in computer vision. Machine learning techniques have been shown to be effective at image and speech classification, as well as at describing images with language (Karpathy 2016), but have not yet been used to describe potential actions. We propose a technique for creating a library of possible context-specific actions associated with 3D objects in immersive virtual worlds based on a novel dataset generated natively in virtual reality containing speech, image, gaze, and acceleration data. We will discuss the design and execution of a user study in virtual reality that enabled the collection and the development of this dataset. We will also discuss the development of a hybrid machine learning algorithm linking vision data with environmental affordances in natural language. Our findings demonstrate that it is possible to develop a model which can generate interpretable verbal descriptions of possible actions associated with recognized 3D objects within immersive VR environments. This suggests promising applications for more intuitive user interfaces through voice interaction within 3D environments. It also demonstrates the potential to apply vast bodies of embodied and semantic knowledge to enrich user interaction within VR environments. This technology would allow for applications such as expert knowledge annotation of 3D environments, complex verbal data querying and object manipulation in virtual spaces, and computer-generated, dynamic 3D object affordances and functionality during simulations.

  9. Discovering New Global Climate Patterns: Curating a 21-Year High Temporal (Hourly) and Spatial (40km) Resolution Reanalysis Dataset

    Hou, C. Y.; Dattore, R.; Peng, G. S.

    2014-12-01

    The National Center for Atmospheric Research's Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40km Reanalysis dataset is a dynamically downscaled dataset with high temporal and spatial resolution. The dataset contains three-dimensional hourly analyses in netCDF format for the global atmospheric state from 1985 to 2005 on a 40km horizontal grid (0.4° grid increment) with 28 vertical levels, providing good representation of local forcing and diurnal variation of processes in the planetary boundary layer. This project aimed to make the dataset publicly available, accessible, and usable in order to provide a unique resource to allow and promote studies of new climate characteristics. When the curation project started, it had been five years since the data files were generated. Also, although the Principal Investigator (PI) had generated a user document at the end of the project in 2009, the document had not been maintained. Furthermore, the PI had moved to a new institution, and the remaining team members were reassigned to other projects. These factors made data curation especially challenging in the areas of verifying data quality, harvesting metadata descriptions, and documenting provenance information. As a result, the project's curation process found that: the data curator's skill and knowledge helped make decisions, such as on file format, structure, and workflow documentation, that had a significant, positive impact on the ease of the dataset's management and long-term preservation; use of data curation tools, such as the Data Curation Profiles Toolkit's guidelines, revealed important information for promoting the data's usability and enhancing preservation planning; and involving data curators during each stage of the data curation life cycle instead of at the end could improve the curation process' efficiency. Overall, the project showed that proper resources invested in the curation process would give datasets the best chance to fulfill their potential to

  10. Comparison of global 3-D aviation emissions datasets

    S. C. Olsen

    2013-01-01

    Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere, and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution
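Summary figures like the hemispheric or cruise-altitude shares quoted above are simple reductions over the four-dimensional emissions grids. The grid below is synthetic (random values weighted toward northern mid-latitudes), and the cruise-level indices are arbitrary stand-ins, not those of any of the five inventories:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical gridded fuelburn: (month, level, lat, lon) on a 2.5-degree
# grid, with vertical levels indexed from the surface upward.
lat = np.linspace(-88.75, 88.75, 72)
fuelburn = rng.random((12, 20, 72, 144)) \
    * np.exp(-((lat[None, None, :, None] - 40.0) / 30.0) ** 2)

total = fuelburn.sum()
nh_fraction = fuelburn[:, :, lat > 0, :].sum() / total
cruise_fraction = fuelburn[:, 9:12, :, :].sum() / total  # stand-in cruise band
```

The same reductions, applied to each inventory in turn, are what make the datasets directly comparable despite their independent construction methods.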

  11. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors, including interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this, uncertainty estimates are typically not included, or are specific to a single dataset without much general applicability across datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher-resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high-resolution regional climate model simulations. We will also present results on the new high-resolution products for Alaska and Hawaii (2 km and 250 m, respectively), completing the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.
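One simple way to generate such an observation-based ensemble is to perturb the interpolated field with noise scaled by the interpolation error, truncating precipitation at zero. This is a toy sketch of the idea (with made-up grid values and uncorrelated noise), not the dataset's actual ensemble-generation method, which accounts for spatial correlation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2x2 gridded daily precipitation: interpolated mean and an
# interpolation standard error at each cell (e.g. from kriging variance).
mean = np.array([[2.0, 1.5], [0.5, 0.0]])      # mm/day
stderr = np.array([[0.6, 0.5], [0.3, 0.1]])

def ensemble(mean, stderr, n_members=100):
    """Draw ensemble members; clip at zero since precip is non-negative."""
    noise = rng.standard_normal((n_members,) + mean.shape)
    return np.clip(mean + stderr * noise, 0.0, None)

members = ensemble(mean, stderr)
spread = members.std(axis=0)   # per-cell ensemble spread
```

The per-cell spread is what downstream users consume for data assimilation and probabilistic verification.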

  12. A collection of annotated and harmonized human breast cancer transcriptome datasets, including immunologic classification [version 2; referees: 2 approved

    Jessica Roelands

    2018-02-01

    The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune-based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables, with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.
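Ranked gene lists for a group comparison of the kind described here can be produced by scoring each gene with a two-sample statistic and sorting by it. A minimal sketch using a Welch t-statistic on a synthetic expression matrix (hypothetical gene names, two groups of 10 samples; GXB's actual ranking options may differ):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical expression matrix: rows = genes, columns = samples,
# first 10 samples in group A, last 10 in group B.
expr = rng.normal(size=(5, 20))
expr[0, 10:] += 2.0     # make gene 0 clearly up-regulated in group B
genes = ["G0", "G1", "G2", "G3", "G4"]

def rank_genes(expr, split=10):
    """Rank genes by absolute Welch t-statistic between the two groups."""
    a, b = expr[:, :split], expr[:, split:]
    t = (a.mean(1) - b.mean(1)) / np.sqrt(a.var(1, ddof=1) / a.shape[1]
                                          + b.var(1, ddof=1) / b.shape[1])
    order = np.argsort(-np.abs(t))
    return [(genes[i], float(t[i])) for i in order]

ranked = rank_genes(expr)
```

In practice a p-value and multiple-testing correction would accompany each statistic; the ranking step itself is just this sort.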

  13. On sample size and different interpretations of snow stability datasets

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: 'How capable are such stability interpretations in drawing conclusions?' There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional-scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
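For approach (i), the required per-group sample size under a normal approximation follows from the standard two-sample formula n = 2((z_{1-α/2} + z_{1-β})·σ/δ)². A sketch with illustrative numbers (the effect size and standard deviation below are made up, not values from the stability dataset):

```python
import math
from statistics import NormalDist

def required_sample_size(mean_diff, sd, alpha=0.05, power=0.8):
    """Per-group n for a two-sample test to detect `mean_diff` given a
    common standard deviation `sd` (normal approximation to the t-test)."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sd / mean_diff) ** 2
    return math.ceil(n)

# e.g. detect a one-step difference on an ordinal stability scale whose
# observed standard deviation is 1.2 steps:
n = required_sample_size(mean_diff=1.0, sd=1.2)
```

Halving the detectable difference roughly quadruples the required sample size, which is why detecting subtle aspect or elevation differences demands such large datasets.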

  14. Lysosomal ceramide generated by acid sphingomyelinase triggers cytosolic cathepsin B-mediated degradation of X-linked inhibitor of apoptosis protein in natural killer/T lymphoma cell apoptosis

    Taniguchi, M; Ogiso, H; Takeuchi, T; Kitatani, K; Umehara, H; Okazaki, T

    2015-01-01

    We previously reported that IL-2 deprivation induced acid sphingomyelinase-mediated (ASM-mediated) ceramide elevation and apoptosis in an NK/T lymphoma cell line, KHYG-1. However, the molecular mechanism of ASM/ceramide-mediated apoptosis during IL-2 deprivation is poorly understood. Here, we showed that IL-2 deprivation induces caspase-dependent apoptosis characterized by phosphatidylserine externalization; caspase-8, -9, and -3 cleavage; and degradation of X-linked inhibitor of apoptosis pro...

  15. Two Players Make a Formidable Combination: In Situ Generated Poly(acrylic anhydride-2-methyl-acrylic acid-2-oxirane-ethyl ester-methyl methacrylate) Cross-Linking Gel Polymer Electrolyte toward 5 V High-Voltage Batteries.

    Ma, Yue; Ma, Jun; Chai, Jingchao; Liu, Zhihong; Ding, Guoliang; Xu, Gaojie; Liu, Haisheng; Chen, Bingbing; Zhou, Xinhong; Cui, Guanglei; Chen, Liquan

    2017-11-29

    Electrochemical performance of high-voltage lithium batteries with high energy density is limited because of the electrolyte instability and the electrode/electrolyte interfacial reactivity. Hence, a cross-linking polymer network of poly(acrylic anhydride-2-methyl-acrylic acid-2-oxirane-ethyl ester-methyl methacrylate) (PAMM)-based electrolyte was introduced via in situ polymerization, inspired by "shuangjian hebi", a phrase from traditional Chinese Kungfu stories describing a synergetic effect of 1 + 1 > 2. A poly(acrylic anhydride)- and poly(methyl methacrylate)-based system is very promising as an electrolyte material for lithium-ion batteries, in which the anhydride and acrylate groups provide high voltage resistance and fast ionic conductivity, respectively. As a result, the cross-linking PAMM-based electrolyte possesses a significant comprehensive enhancement, including an electrochemical stability window exceeding 5 V vs Li+/Li, an ionic conductivity of 6.79 × 10^-4 S cm^-1 at room temperature, high mechanical strength (27.5 MPa), good flame resistance, and excellent interface compatibility with Li metal. It is also demonstrated that this gel polymer electrolyte suppresses the negative effect resulting from dissolution of Mn2+ ions at 25 and 55 °C. Thus, the LiNi0.5Mn1.5O4/Li and LiNi0.5Mn1.5O4/Li4Ti5O12 cells using the optimized in situ polymerized cross-linking PAMM-based gel polymer electrolyte deliver stable charging/discharging profiles and excellent rate performance at room temperature and even at 55 °C. These findings suggest that the cross-linking PAMM is an intriguing candidate for a 5 V class high-voltage gel polymer electrolyte toward high-energy lithium-ion batteries.

  16. Fully automated pipeline for detection of sex linked genes using RNA-Seq data.

    Michalovova, Monika; Kubat, Zdenek; Hobza, Roman; Vyskot, Boris; Kejnovsky, Eduard

    2015-03-11

    Sex chromosomes present a genomic region which, to some extent, differs between the genders of a single species. Reliable high-throughput methods for detection of sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door for identification of unique sequences or searching for nucleotide polymorphisms between datasets. A combination of classical genetic segregation analysis along with RNA-Seq data can present an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring. We present a pipeline for detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, tracking of nucleotide polymorphisms is carried out using a cross of preferably distant populations. For this reason, only four datasets are needed: reads from high-throughput sequencing platforms for the parent generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Given the resource-intensive nature of the computation, servers with high capacity are a requirement. Therefore, in order to keep this pipeline easily accessible and reproducible, we implemented it in Galaxy - an open, web-based platform for data-intensive biomedical research. Our tools are present in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As an output of the pipeline, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. At the same time, a BAM file with identified genes and alignment of reads is also provided. Thus, polymorphisms following the segregation pattern can be easily visualized, which significantly enhances primer design.
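The segregation logic applied to the four datasets can be illustrated with set operations on allele calls: a Y-linked candidate shows a father-specific allele in sons but not daughters (the X-linked pattern is the mirror image, with the paternal allele in daughters only). The positions and genotypes below are invented, not Rumex acetosa calls:

```python
# Genotype calls per variant site for the four RNA-Seq datasets: mother,
# father, and pooled female/male progeny. Values are observed allele sets.
snps = {
    101: {"mother": {"A"}, "father": {"A", "G"},
          "daughters": {"A"}, "sons": {"A", "G"}},
    202: {"mother": {"C"}, "father": {"C", "T"},
          "daughters": {"C", "T"}, "sons": {"C"}},
    303: {"mother": {"G"}, "father": {"G"},
          "daughters": {"G"}, "sons": {"G"}},
}

def y_linked_candidates(snps):
    """Sites where a father-specific allele segregates to sons only --
    the expected pattern for a Y-linked transcript in an XY system."""
    hits = []
    for pos, g in snps.items():
        paternal = g["father"] - g["mother"]
        if paternal and paternal <= g["sons"] and not (paternal & g["daughters"]):
            hits.append(pos)
    return hits
```

Here site 101 matches the Y-linked pattern, site 202 matches the mirrored X-linked pattern, and site 303 is uninformative (no father-specific allele).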

  17. Estimation of Missed Statin Prescription Use in an Administrative Claims Dataset.

    Wade, Rolin L; Patel, Jeetvan G; Hill, Jerrold W; De, Ajita P; Harrison, David J

    2017-09-01

    Nonadherence to statin medications is associated with increased risk of cardiovascular disease and poses a challenge to lipid management in patients who are at risk for atherosclerotic cardiovascular disease. Numerous studies have examined statin adherence based on administrative claims data; however, these data may underestimate statin use in patients who participate in generic drug discount programs or who have alternative coverage. To estimate the proportion of patients with missing statin claims in a claims database and determine how missing claims affect commonly used utilization metrics. This retrospective cohort study used pharmacy data from the PharMetrics Plus (P+) claims dataset linked to the IMS longitudinal pharmacy point-of-sale prescription database (LRx) from January 1, 2012, through December 31, 2014. Eligible patients were represented in the P+ and LRx datasets, had ≥1 claim for a statin (index claim) in either database, and had ≥24 months of continuous enrollment in P+. Patients were linked between P+ and LRx using a deterministic method. Duplicate claims between LRx and P+ were removed to produce a new dataset comprising P+ claims augmented with LRx claims. Statin use was then compared between P+ and the augmented P+ dataset. Utilization metrics that were evaluated included the percentage of patients with ≥1 missing statin claim over 12 months in P+; the number of patients misclassified as new users in P+; the number of patients misclassified as nonstatin users in P+; the change in 12-month medication possession ratio (MPR) and proportion of days covered (PDC) in P+; the comparison between P+ and LRx of classifications of statin treatment patterns (statin intensity and patients with treatment modifications); and the payment status for missing statin claims. Data from 965,785 patients with statin claims in P+ were analyzed (mean age 56.6 years; 57% male). In P+, 20.1% had ≥1 missing statin claim post-index; 13.7% were misclassified as
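The two adherence metrics evaluated before and after augmenting the claims are straightforward to compute from (fill date, days supply) pairs: MPR sums the days supplied over the window, while PDC counts distinct covered days, so overlapping fills are not double-counted. A self-contained sketch with invented claims:

```python
from datetime import date, timedelta

def pdc(claims, period_start, period_end):
    """Proportion of days covered: fraction of days in the window covered
    by at least one fill, with overlapping supplies collapsed."""
    covered = set()
    for fill_date, days_supply in claims:
        for d in range(days_supply):
            day = fill_date + timedelta(days=d)
            if period_start <= day <= period_end:
                covered.add(day)
    return len(covered) / ((period_end - period_start).days + 1)

def mpr(claims, period_start, period_end):
    """Medication possession ratio: total days supplied over the window;
    overlapping fills count twice, so MPR can exceed 1."""
    total = sum(days for _, days in claims)
    return total / ((period_end - period_start).days + 1)

start, end = date(2014, 1, 1), date(2014, 12, 31)
claims = [(date(2014, 1, 1), 90), (date(2014, 3, 20), 90),
          (date(2014, 7, 1), 90), (date(2014, 9, 15), 90)]
```

Appending missing fills recovered from a second source (as the linked LRx data does here) can only raise both metrics, which is why under-capture biases adherence estimates downward.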

  18. Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies

    Ma, X.

    2014-12-01

    Knowledge evolves in geoscience, and the evolution is reflected in datasets. In a context with distributed data sources, the evolution of knowledge may cause considerable challenges to data management and re-use. For example, a short news item published in 2009 (Mascarelli, 2009) revealed the geoscience community's concern that the International Commission on Stratigraphy's change to the definition of Quaternary may bring heavy reworking of geologic maps. Now we are in the era of the World Wide Web, and geoscience knowledge is increasingly modeled and encoded in the form of ontologies and vocabularies by using semantic technologies. Accordingly, knowledge evolution leads to a consequence called ontology dynamics. Flouris et al. (2008) summarized 10 topics of general ontology changes/dynamics, such as ontology mapping, morphism, evolution, debugging and versioning. Ontology dynamics has impacts at several stages of a data life cycle and causes challenges such as: requests for reworking of the extant data in a data center, semantic mismatch among data sources, differentiated understanding of the same dataset between data providers and data users, and error propagation in cross-discipline data discovery and re-use (Ma et al., 2014). This presentation will analyze the best practices in the geoscience community so far and summarize a few recommendations to reduce the negative impacts of ontology dynamics in a data life cycle, including: communities of practice and collaboration on ontology and vocabulary building, linking data records to standardized terms, and methods for (semi-)automatic reworking of datasets using semantic technologies. References: Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G., 2008. Ontology change: classification and survey. The Knowledge Engineering Review 23 (2), 117-152. Ma, X., Fox, P., Rozell, E., West, P., Zednik, S., 2014. Ontology dynamics in a data life cycle: Challenges and recommendations

  19. Context-Aware Generative Adversarial Privacy

    Chong Huang

    2017-12-01

    Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model; and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.
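The binary data model can be explored without any GAN machinery: a randomized-response privatizer that flips the released bit with some probability traces out the privacy-utility tradeoff against which learned mechanisms are compared. The correlation level and flip probabilities below are illustrative, not parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary data model: public variable X correlated with private variable S.
n = 100_000
s = rng.integers(0, 2, n)                      # private bit
x = np.where(rng.random(n) < 0.8, s, 1 - s)    # public bit, P(X = S) = 0.8

def privatize(x, flip_prob):
    """Randomized-response style privatizer: flip each bit w.p. flip_prob."""
    flips = rng.random(len(x)) < flip_prob
    return np.where(flips, 1 - x, x)

def adversary_accuracy(z, s):
    """Best single-guess adversary in this symmetric model guesses S = Z
    or its complement; report the better of the two."""
    agree = float(np.mean(z == s))
    return max(agree, 1 - agree)

for p in (0.0, 0.2, 0.5):
    z = privatize(x, p)
    distortion = float(np.mean(z != x))   # utility loss
    print(p, round(adversary_accuracy(z, s), 3), round(distortion, 3))
```

As the flip probability rises from 0 to 0.5, adversary accuracy falls toward chance while distortion grows; GAP's learned mechanisms aim to sit on the optimal frontier of exactly this tradeoff.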

  1. Light-induced cross-linking and post-cross-linking modification of polyglycidol.

    Marquardt, F; Bruns, M; Keul, H; Yagci, Y; Möller, M

    2018-02-08

    The photoinduced radical generation process has received renewed interest due to its economic and ecological appeal. Herein the light-induced cross-linking of functional polyglycidol and its post-cross-linking modification are presented. Linear polyglycidol was first functionalized with a tertiary amine in a two-step reaction. Dimethylaminopropyl functional polyglycidol was cross-linked in a UV-light mediated reaction with camphorquinone as a type II photoinitiator. The cross-linked polyglycidol was further functionalized by quaternization with various organoiodine compounds. Aqueous dispersions of the cross-linked polymers were investigated by means of DLS and zeta potential measurements. Polymer films were evaluated by DSC and XPS.

  2. Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations.

    Hrynaszkiewicz, Iain; Khodiyar, Varsha; Hufton, Andrew L; Sansone, Susanna-Assunta

    2016-01-01

    Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories, in part due to the need to protect research participant privacy. This approach to data sharing makes it difficult to connect journal articles with their underlying datasets and is often insufficient for ensuring access to data in the long term. Voluntary data sharing services, such as the Yale Open Data Access (YODA) and Clinical Study Data Request (CSDR) projects, have increased accessibility of clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses, but these resources are generally disconnected from journal articles, where researchers typically search for reliable information to inform future research. New scholarly journal and article types dedicated to increasing the accessibility of research data have emerged in recent years and, in general, journals are developing stronger links with data repositories. There is a need for increased collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services to increase the visibility and reliability of clinical research. Using the journal Scientific Data as a case study, we propose and show examples of changes to the format and peer-review process for journal articles to more robustly link them to data that are only available on request. We also propose additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs).

  3. Auxiliary Deep Generative Models

    Maaløe, Lars; Sønderby, Casper Kaae; Sønderby, Søren Kaae

    2016-01-01

    Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables, which improve the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variable, we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on the MNIST (0.96%), SVHN (16.61%) and NORB (9.40%) datasets.

  4. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research [version 2; referees: 2 approved]

    Darawan Rinchai

    2016-04-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.
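A ranked gene list derived from a group comparison, as described above, can be sketched as a sort by effect size. This is a hypothetical illustration: the gene names, the log2-scale expression values, and the use of a plain mean difference as the ranking statistic are all invented for the example.

```python
# Rank genes by the absolute difference of group means (log2 scale).
# All gene names and expression values below are invented.
from statistics import mean

expression = {
    "IL6":  {"case": [9.1, 8.7, 9.5], "control": [6.0, 6.2, 5.8]},
    "TNF":  {"case": [7.2, 7.5, 7.1], "control": [7.0, 7.1, 6.9]},
    "CD14": {"case": [5.0, 5.2, 4.9], "control": [8.1, 7.9, 8.3]},
}

def log2fc(gene: str) -> float:
    """Difference of group means; on log2 data this is a log fold change."""
    g = expression[gene]
    return mean(g["case"]) - mean(g["control"])

ranked = sorted(expression, key=lambda g: abs(log2fc(g)), reverse=True)
print(ranked)
```

Real compendia would use a moderated statistic rather than a raw mean difference, but the ranking step itself has this shape.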

  5. A dataset of multiresolution functional brain parcellations in an elderly population with no or mild cognitive impairment

    Angela Tam

    2016-12-01

    We present brain parcellations at eight resolutions, with clusters generated from resting-state functional magnetic resonance images of 99 cognitively normal elderly persons and 129 patients with mild cognitive impairment, pooled from four independent datasets. This dataset was generated as part of the following study: Common Effects of Amnestic Mild Cognitive Impairment on Resting-State Connectivity Across Four Independent Studies (Tam et al., 2015 [1]). The brain parcellations have been registered to both symmetric and asymmetric MNI brain templates and generated using a method called bootstrap analysis of stable clusters (BASC) (Bellec et al., 2010 [2]). We present two variants of these parcellations. One variant contains bihemispheric parcels (4, 6, 12, 22, 33, 65, 111, and 208 total parcels across eight resolutions). The second variant contains spatially connected regions of interest (ROIs) that span only one hemisphere (10, 17, 30, 51, 77, 199, and 322 total ROIs across eight resolutions). We also present maps illustrating functional connectivity differences between patients and controls for four regions of interest (striatum, dorsal prefrontal cortex, middle temporal lobe, and medial frontal cortex). The brain parcels and associated statistical maps have been publicly released as 3D volumes, available in .mnc and .nii file formats on figshare and on Neurovault. Finally, the code used to generate this dataset is available on Github.

  6. Meta-Analysis of High-Throughput Datasets Reveals Cellular Responses Following Hemorrhagic Fever Virus Infection

    Gavin C. Bowick

    2011-05-01

    The continuing use of high-throughput assays to investigate cellular responses to infection is providing a large repository of information. Due to the large number of differentially expressed transcripts, often running into the thousands, the majority of these data have not been thoroughly investigated. Advances in techniques for the downstream analysis of high-throughput datasets are providing new methods for generating hypotheses for further investigation. The large number of experimental observations, combined with databases that correlate particular genes and proteins with canonical pathways, functions and diseases, allows for the bioinformatic exploration of functional networks that may be implicated in replication or pathogenesis. Herein, we provide an example of how analysis of published high-throughput datasets of cellular responses to hemorrhagic fever virus infection can generate additional functional data. We describe enrichment of genes involved in metabolism, post-translational modification and cardiac damage; potential roles for specific transcription factors; and a conserved involvement of a pathway based around cyclooxygenase-2. We believe that these types of analyses can provide virologists with additional hypotheses for continued investigation.

  7. A multimodal MRI dataset of professional chess players.

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model for studying high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for investigating the neural mechanisms underlying chess play. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new material to further explore high-level human brain functions.

  8. Knowledge discovery with classification rules in a cardiovascular dataset.

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, combining evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes that should possibly reveal the presence of specific cardiovascular problems in young patients. A case study shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  9. An integrated dataset for in silico drug discovery

    Cockell Simon J

    2010-12-01

    Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition to treat a second disease than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications for existing compounds.

  10. Application of Density Estimation Methods to Datasets from a Glider

    2014-09-30

    The objective of this research is to extend existing methods for cetacean density estimation from single-sensor datasets, covering humpback and sperm whales as well as different dolphin species. Required steps are described for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011).
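Cue counting, mentioned above, converts a count of detected acoustic cues into an animal density. The arithmetic below is a back-of-the-envelope sketch of the generic cue-counting form (density = cues / (monitored area × time × cue rate × detection probability)); every number in the example is invented, and a real analysis would also correct for false positives and estimate the detection probability from data.

```python
# Generic cue-counting density estimate; all parameter values invented.

def cue_density(n_cues: int, area_km2: float, hours: float,
                cue_rate_per_hour: float, det_prob: float) -> float:
    """Animals per km^2 implied by a count of detected acoustic cues."""
    return n_cues / (area_km2 * hours * cue_rate_per_hour * det_prob)

# 1200 clicks heard over 24 h in a 50 km^2 monitored area, assuming an
# average of 40 clicks per animal per hour and 50% detection probability:
d = cue_density(n_cues=1200, area_km2=50.0, hours=24.0,
                cue_rate_per_hour=40.0, det_prob=0.5)
print(d)   # -> 0.05 animals per km^2
```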

  11. A review of continent scale hydrological datasets available for Africa

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable under predicted climate change, the ability to assess spatial and seasonal variations in groundwater availability on a large scale (catchment and continent) is becoming increasingly important (Bates et al., 2007; MacDonald et al., 2009). The scarcity of observed hydrological data within Africa, or the difficulty in obtaining such data, means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  12. Dataset of mitochondrial genome variants in oncocytic tumors

    Lihua Lyu

    2018-04-01

    This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants were identified after comparison with the revised Cambridge reference sequence, excluding those defining the haplogroups of our patients. Pathogenicity prediction for the novel missense variants found in this study was performed with the Mitimpact 2 program.

  13. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
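The quartz-free transformation described above is a simple renormalization: each element's concentration is divided by the non-quartz fraction of the sample. The sketch below uses invented concentrations to show how two soils with identical absolute CaO differ once the quartz dilution is removed.

```python
# Recompute a concentration on a quartz-free basis (sample values invented).

def quartz_free(concentration_wt_pct: float, quartz_wt_pct: float) -> float:
    """Concentration renormalized to the non-quartz fraction of the soil."""
    return concentration_wt_pct / (1.0 - quartz_wt_pct / 100.0)

# Two soils with the same absolute CaO but different quartz dilution:
print(quartz_free(2.0, 80.0))  # high-quartz soil -> 10.0 wt.% quartz-free
print(quartz_free(2.0, 20.0))  # low-quartz soil  ->  2.5 wt.% quartz-free
```

The artificial correlations discussed in the abstract arise precisely because the raw 2.0 wt.% values look identical while the quartz-free values differ fourfold.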

  14. Dataset on records of Hericium erinaceus in Slovakia

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled ?Habitat preferences of Hericium erinaceus in Slovakia? (Kunca and ?iliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  15. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    A common approach for the analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping (LDDMM) is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  16. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  17. An Analysis on Better Testing than Training Performances on the Iris Dataset

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well-known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so, the dataset is separated into a set used
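The train/test protocol the abstract refers to can be sketched without any external libraries. This is a minimal illustration under stated assumptions: the 2-D points below are invented stand-ins for Iris measurements (not the real dataset), and a hand-rolled nearest-centroid classifier stands in for the SVM.

```python
# Split invented, well-separated class clusters into train/test sets and
# score a nearest-centroid classifier on the held-out portion.
import random

random.seed(0)
# Three invented "species", each clustered around its own centre.
data = [((random.gauss(cx, 0.3), random.gauss(cy, 0.3)), label)
        for label, (cx, cy) in enumerate([(1, 1), (4, 1), (1, 4)])
        for _ in range(20)]
random.shuffle(data)
train, test = data[:40], data[40:]

def centroid(points):
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

centroids = {lbl: centroid([p for p, l in train if l == lbl])
             for lbl in {l for _, l in train}}

def predict(p):
    """Assign the label of the nearest training-class centroid."""
    return min(centroids, key=lambda l: (p[0] - centroids[l][0]) ** 2
                                        + (p[1] - centroids[l][1]) ** 2)

accuracy = sum(predict(p) == l for p, l in test) / len(test)
print(accuracy)
```

With clusters this well separated, the held-out accuracy is essentially perfect; on noisier data the gap between training and testing performance becomes the interesting quantity the abstract analyzes.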

  18. Parton Distributions based on a Maximally Consistent Dataset

    Rojo, Juan

    2016-04-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this is the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be in mutual agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, again finding good consistency with the global fit. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not substantially affect the global fit PDFs.
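The Bayesian reweighting step mentioned above assigns each Monte Carlo replica a weight based on its chi-squared against new data; a commonly quoted form (due to Giele and Keller) is w_k ∝ (χ²_k)^((n-1)/2) · exp(−χ²_k/2) for n data points. The sketch below implements only that weight formula with invented χ² values; it is not the NNPDF code, and the log-space evaluation is just a standard numerical-stability device.

```python
# Giele-Keller-style replica weights, normalized to sum to the number of
# replicas. The chi^2 values are invented for illustration.
import math

def reweight(chi2_list, n_data):
    """Weights w_k ~ chi2^((n-1)/2) * exp(-chi2/2), computed in log space."""
    logs = [0.5 * (n_data - 1) * math.log(c) - 0.5 * c for c in chi2_list]
    m = max(logs)                       # subtract max to avoid overflow
    raw = [math.exp(x - m) for x in logs]
    s = sum(raw)
    return [len(chi2_list) * r / s for r in raw]

weights = reweight([8.0, 10.0, 15.0, 30.0], n_data=10)
print(weights)
```

Replicas that describe the data poorly (large χ²) receive weights near zero, which is how a maximally consistent subset effectively emerges from the full replica ensemble.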

  19. New public dataset for spotting patterns in medieval document images

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools that allow us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making pattern spotting a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore, dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of submitted approaches. We also provide first results obtained with our baseline system on this new dataset, which show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.

  20. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
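The property-matched decoy selection discussed above can be sketched as a filter that keeps candidate molecules whose simple descriptors are close to an active's. This is a hypothetical illustration: the molecule names, descriptor values, and tolerance thresholds are all invented, and real benchmarking sets match on more descriptors and additionally enforce topological dissimilarity.

```python
# Keep candidates whose molecular weight and logP are close to an
# active compound's (all names, values, and tolerances invented).

active = {"mol_weight": 320.0, "logp": 2.5}

candidates = [
    {"name": "cand_A", "mol_weight": 315.0, "logp": 2.7},
    {"name": "cand_B", "mol_weight": 480.0, "logp": 5.1},
    {"name": "cand_C", "mol_weight": 330.0, "logp": 2.2},
]

def property_matched(cand, ref, mw_tol=25.0, logp_tol=0.5):
    """True when both descriptors fall within tolerance of the active's."""
    return (abs(cand["mol_weight"] - ref["mol_weight"]) <= mw_tol
            and abs(cand["logp"] - ref["logp"]) <= logp_tol)

decoys = [c["name"] for c in candidates if property_matched(c, active)]
print(decoys)   # cand_A and cand_C pass; cand_B is too dissimilar
```

Matching decoys to actives on physicochemical properties is what limits the "artificial enrichment" bias the review describes: a VS method should not be able to separate actives from decoys on trivial descriptors alone.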