WorldWideScience

Sample records for evaluating geographic imputation

  1. Estimating the accuracy of geographical imputation

    Directory of Open Access Journals (Sweden)

    Boscoe Francis P

    2008-01-01

    Full Text Available Abstract Background To reduce the number of non-geocoded cases researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed using either a postal centroid or by a geographical imputation method which assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation. Methods Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address. Results Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density. Conclusion Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate

  2. Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

    Directory of Open Access Journals (Sweden)

    Puett Robin C

    2009-10-01

    Full Text Available Abstract Background There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). Methods We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value Conclusion Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies
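
    The contrast between fixed (deterministic) and random (stochastic) allocation described in this record can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "fixed" means assigning every case in a ZIP code to the tract with the largest weight, and that "random" means drawing a tract with probability proportional to its weight. The tract names and population figures are hypothetical.

```python
import random

def fixed_allocation(tract_weights):
    """Deterministic: return the tract with the largest weight (assumed reading of 'fixed')."""
    return max(tract_weights, key=tract_weights.get)

def random_allocation(tract_weights, rng):
    """Stochastic: draw one tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]

# Hypothetical ZIP code overlapping three census tracts, weighted by total population.
weights = {"tract_A": 5200, "tract_B": 3100, "tract_C": 1700}
rng = random.Random(0)  # seeded so the sketch is reproducible

fixed_cases = [fixed_allocation(weights) for _ in range(10)]
random_cases = [random_allocation(weights, rng) for _ in range(10)]

print(fixed_cases)   # every case lands in the same tract
print(random_cases)  # cases are spread across tracts in proportion to weight
```

    Under these assumptions, the fixed rule puts all ten cases in one tract, which matches the abstract's remark about artificial clusters, while the random rule spreads them out.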

  3. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    DEFF Research Database (Denmark)

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographics attributes, and to determine whether imputation is an accurate...

  4. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  5. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    Science.gov (United States)

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

    Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each software includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the amount of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.

  6. Multiple imputation strategies for zero-inflated cost data in economic evaluations : which method works best?

    NARCIS (Netherlands)

    MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

    2016-01-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing

  7. Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

    Science.gov (United States)

    Baker, Jannah; White, Nicole; Mengersen, Kerrie

    2014-11-20

    Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.

  8. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    Science.gov (United States)

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error, and better distinguishes true associations from null ones: We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian

  9. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Directory of Open Access Journals (Sweden)

    Ariel W Chan

    Full Text Available Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and

  10. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Science.gov (United States)

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

  11. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    Science.gov (United States)

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^{-1}X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^{-1}X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β) = E[β] - β, Bias(θ) = E[θ] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial.
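
    The two linear imputation rules named in this record, LOCF and BOCF, can be written in a few lines of Python. This is a minimal sketch and is not taken from the cited article. It assumes one subject's visits are stored as a list with `None` marking post-dropout values and with the baseline in the first position. The pain scores are hypothetical.

```python
def locf(series):
    """Last Observation Carried Forward: fill each None with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

def bocf(series):
    """Baseline Observation Carried Forward: fill each None with the first (baseline) value."""
    baseline = series[0]  # assumption: the baseline visit is always observed
    return [baseline if value is None else value for value in series]

# Hypothetical pain scores for one subject who dropped out after the third visit.
pain_scores = [7.0, 5.5, 4.0, None, None]

print(locf(pain_scores))  # [7.0, 5.5, 4.0, 4.0, 4.0]
print(bocf(pain_scores))  # [7.0, 5.5, 4.0, 7.0, 7.0]
```

    Both rules build each imputed value only from the subject's own earlier data, which is why the abstract groups them as linear imputation methods.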

  12. Using imputation to provide location information for nongeocoded addresses.

    Directory of Open Access Journals (Sweden)

    Frank C Curriero

    2010-02-01

    Full Text Available The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map which is predicated by some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses which cannot be converted successfully (nongeocodable). This raises concerns regarding bias since traditionally the practice has been to exclude nongeocoded data records from analysis. In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group level as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on using available population-based age, gender, and race information performed the best overall at the county, tract, and block group levels. The procedure allows for the potentially biased and likely under reported outcome, case enumerations based on only the geocoded records, to be presented with a statistically adjusted count (imputed count) with a measure of uncertainty that are based on all the case data, the geocodes and imputed

  13. Combining Land Capability Evaluation, Geographic Information ...

    African Journals Online (AJOL)

    Combining Land Capability Evaluation, Geographic Information Systems, and Indigenous Technologies for Soil Conservation in Northern Ethiopia. ... Land capability and land use status were established following the procedures of a modified treatment-oriented capability classification using GIS. The case study ...

  14. Missing data imputation: focusing on single imputation.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from analysis. Therefore, many imputation methods have been developed to bridge this gap. The present article focuses on single imputation. Imputations with mean, median and mode are simple but, like complete case analysis, can introduce bias on mean and deviation. Furthermore, they ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
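
    The cited article works in R. As an illustration only, the same mean, median and mode single imputation can be sketched in Python using the standard library. The function name `single_impute` and the sample ages are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics

def single_impute(values, strategy="mean"):
    """Replace every None with one summary statistic of the observed values."""
    observed = [v for v in values if v is not None]
    summary = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = summary(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ages with one missing entry.
ages = [30, 40, None, 40, 90]
observed = [a for a in ages if a is not None]

mean_filled = single_impute(ages, "mean")
print(mean_filled)                       # missing entry replaced by the observed mean
print(statistics.pvariance(observed))    # spread of the observed values
print(statistics.pvariance(mean_filled)) # smaller spread after mean imputation
```

    The drop in variance after filling with the mean is one concrete form of the bias on deviation that the abstract mentions.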

  15. Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

    Science.gov (United States)

    Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

    2009-01-01

    Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...

  16. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    Science.gov (United States)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31900 km2). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix

  17. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    Science.gov (United States)

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to ongoing drought in California, increased reliance on local water supply highlights the need for better understanding of regional groundwater dynamics and estimating sustainable groundwater supply. However, in an urban setting, such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data is irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data is interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data is evaluated at a monthly time step over the last several decades to analyze the effect of land cover and identify other influencing factors on groundwater levels spatially and temporally. Preliminary results show irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes, exceeding changes in spreading grounds. It is assumed that these fluctuations are caused by watering practices required to sustain non-native vegetation. Conversely, high intensity urbanized areas resulted in muted groundwater fluctuations and behavior decoupling from climate patterns. Results provides improved understanding of anthropogenic effects on groundwater levels in addition to providing high quality datasets for validation of regional groundwater models.

  18. Avoid Filling Swiss Cheese with Whipped Cream; Imputation Techniques and Evaluation Procedures for Cross-Country Time Series

    OpenAIRE

    Michael Weber; Michaela Denk

    2011-01-01

    International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries with not yet well-established statistical systems may be incomplete, the bridging of data gaps is a crucial challenge. This paper investigates data structures and missing data patterns in the cross-sectional time series framework, reviews missing value imputation techniques used for micro data in official statistics, and discusses the...

  19. Gaussian mixture clustering and imputation of microarray data.

    Science.gov (United States)

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
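
    The first evaluation metric in this record, root mean squared error between true and imputed values, is usually computed by hiding known values, imputing them, and scoring only the hidden positions. The Python sketch below shows that general procedure under stated assumptions. It is not the authors' pipeline, it uses plain mean imputation as a stand-in imputer, and all function names and data are hypothetical.

```python
import math
import random

def rmse(true_values, imputed_values):
    """Root mean squared error between paired true and imputed values."""
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))

def mean_impute(values):
    """Stand-in imputer: fill each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def mask_and_score(values, impute, fraction, rng):
    """Hide a random fraction of known values, impute them, and score with RMSE."""
    n_mask = max(1, int(len(values) * fraction))
    masked_idx = rng.sample(range(len(values)), n_mask)
    masked = list(values)
    for i in masked_idx:
        masked[i] = None
    imputed = impute(masked)
    return rmse([values[i] for i in masked_idx], [imputed[i] for i in masked_idx])

# Hypothetical expression values for one gene across ten arrays.
expression = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6]
score = mask_and_score(expression, mean_impute, fraction=0.3, rng=random.Random(1))
print(round(score, 3))
```

    Swapping `mean_impute` for another imputer with the same list-in, list-out signature allows methods to be compared on identical masked positions, provided the random generator is seeded the same way.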

  20. Objective Evaluation in an Online Geographic Information System Certificate Program

    OpenAIRE

    Scott L. WALKER

    2005-01-01

    Asst. Professor Dr. Scott L. WALKER, Texas State University-San Marcos, San Marcos, Texas, USA. ABSTRACT: Departmental decisions regarding distance education programs can be subject to subjective decision-making processes influenced by external factors such as strong faculty opinions or pressure to increase student enrolment. This paper outlines an evaluation of a departmental distance-education program....

  1. Public Undertakings and Imputability

    DEFF Research Database (Denmark)

    Ølykke, Grith Skovgaard

    2013-01-01

    In this article, the issue of imputability to the State of public undertakings’ decision-making is analysed and discussed in the context of the DSBFirst case. DSBFirst is owned by the independent public undertaking DSB and the private undertaking FirstGroup plc and won the contracts in the 2008...... Oeresund tender for the provision of passenger transport by railway. From the start, the services were provided at a loss, and in the end a part of DSBFirst was wound up. In order to frame the problems illustrated by this case, the jurisprudence-based imputability requirement in the definition of State aid...... in Article 107(1) TFEU is analysed. It is concluded that where the public undertaking transgresses the control system put in place by the State, conditions for imputability are not fulfilled, and it is argued that in the current state of law, there is no conditional link between the level of control...

  2. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity of these keywords and the data complexity on the Web, different queries may retrieve different answers to the same absent field value. To decide the most probable right answer to each absent field value, existing methods issue quite a few available imputation queries for each absent value, and then vote to decide the most probable right answer. As a result, we have to issue a large number of imputation queries for filling all absent values in an incomplete data set, which brings a large overhead. In this paper, we work on reducing the cost of Web-based Data Imputation in two aspects: First, we propose a query execution scheme which can secure the most probable right answer to an absent field value by issuing as few imputation queries as possible. Second, we recognize and prune queries that probably will fail to return any answers a priori. Our extensive experimental evaluation shows that our proposed techniques substantially reduce the cost of Web-based Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.

  3. Missing value imputation for epistatic MAPs

    LENUS (Irish Health Repository)

    Ryan, Colm

    2010-04-20

    Abstract Background Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (Nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. 
    Therefore our method potentially

  4. Double sampling with multiple imputation to answer large sample meta-research questions: Introduction and illustration by evaluating adherence to two simple CONSORT guidelines

    Directory of Open Access Journals (Sweden)

    Patrice L. Capers

    2015-03-01

    Full Text Available BACKGROUND: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. Creation of high throughput methods (e.g., search heuristics, crowdsourcing) has improved feasibility of large meta-research questions, but possibly at the cost of accuracy. OBJECTIVE: To evaluate the use of double sampling combined with multiple imputation (DS+MI) to address meta-research questions, using as an example adherence of PubMed entries to two simple Consolidated Standards of Reporting Trials (CONSORT) guidelines for titles and abstracts. METHODS: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT; human; abstract available; and English language (n=322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. RESULTS: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title=1.00, abstract=0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS+MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by Year: subsample RHITLO 1.050-1.174 vs. DS+MI 1.082-1.151). As evidence of improved accuracy, DS+MI coefficient estimates were closer to RHITLO than the large sample RLOTHI. CONCLUSIONS: Our results support our hypothesis that DS+MI would result in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of

  5. Objective Evaluation in an Online Geographic Information System Certificate Program

    Directory of Open Access Journals (Sweden)

    Scott L. WALKER

    2005-01-01

    Full Text Available Objective Evaluation in an Online Geographic Information System Certificate Program Asst. Professor. Dr. Scott L. WALKER Texas State University-San Marcos San Marcos, Texas, USA ABSTRACT Departmental decisions regarding distance education programs can be subject to subjective decision-making processes influenced by external factors such as strong faculty opinions or pressure to increase student enrolment. This paper outlines an evaluation of a departmental distance-education program. The evaluation utilized several methods that strived to inject objectivity in evaluation and subsequent decision-making. A rapid multi-modal approach included evaluation methods of (1) considering the online psychosocial learning environment, (2) content analyses comparing the online version of classes to face-to-face versions, (3) cost comparisons in online vs. face-to-face classes, (4) student outcomes, (5) student retention, and (6) benchmarking. These approaches offer opportunities for departmental administrators and decision-making committees to make judgments informed by facts rather than being influenced by the emotions, beliefs, or opinions of organizational dynamics.

  6. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

    Science.gov (United States)

    Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

    2017-06-01

    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF)≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations against that of a population-specific high-coverage (30 ×) whole-genome sequencing (WGS) based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.

  7. GIS in Evaluation: Utilizing the Power of Geographic Information Systems to Represent Evaluation Data

    Science.gov (United States)

    Azzam, Tarek; Robinson, David

    2013-01-01

    This article provides an introduction to geographic information systems (GIS) and how the technology can be used to enhance evaluation practice. As a tool, GIS enables evaluators to incorporate contextual features (such as accessibility of program sites or community health needs) into evaluation designs and highlights the interactions between…

  8. Multiply-Imputed Synthetic Data: Advice to the Imputer

    Directory of Open Access Journals (Sweden)

    Loong Bronwyn

    2017-12-01

    Full Text Available Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the confidential synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.

  9. Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

    Science.gov (United States)

    Ritz, Cecilia; Edén, Patrik

    2008-01-19

    For 2-dye microarray platforms, some missing values may arise from an un-measurably low RNA expression in one channel only. Information about such "one-channel depletion" is so far not included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates was reduced by up to 51%, depending on dataset. By including more information in the imputation step, we more accurately estimate missing expression values.

  10. Multiple imputation and its application

    CERN Document Server

    Carpenter, James

    2013-01-01

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...
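
    As a brief illustration of how MI turns several completed datasets into one inference, the sketch below pools per-imputation estimates with the standard combining rules usually attributed to Rubin. It is not taken from the book; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool m point estimates and their squared standard errors.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)       # average of the m estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from m = 3 imputed datasets.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

    The between-imputation term is what carries the extra uncertainty due to the missing data; with a single imputation it cannot be estimated, which is why the sketch rejects m < 2.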

  11. Flexible Imputation of Missing Data

    CERN Document Server

    van Buuren, Stef

    2012-01-01

    Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde

  12. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    Full Text Available The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
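
    To make the two per-variant quantities concrete, the sketch below computes a squared correlation between imputed allele dosages and known genotypes, together with the minor allele frequency. It assumes the true genotypes are available (for example, because they were masked before imputation); imputation software instead estimates allelic R2 without access to the truth. The function names and the coding of genotypes as 0, 1 or 2 copies of an allele are assumptions made for illustration.

```python
from typing import Sequence


def squared_correlation(
    imputed_dosage: Sequence[float], true_genotype: Sequence[float]
) -> float:
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(imputed_dosage)
    if n < 2 or n != len(true_genotype):
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(imputed_dosage) / n
    mean_y = sum(true_genotype) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosage, true_genotype))
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosage)
    syy = sum((y - mean_y) ** 2 for y in true_genotype)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic variant
    return sxy * sxy / (sxx * syy)


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid genotypes coded as 0, 1 or 2 copies of one allele."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)
```

    Computing both values per variant and grouping the squared correlations by frequency bin is one simple way to examine how accuracy relates to minor allele frequency.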

  13. A spatial haplotype copying model with applications to genotype imputation.

    Science.gov (United States)

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.

  14. R package imputeTestbench to compare imputations methods for univariate time series

    OpenAIRE

    Bokde, Neeraj; Kulat, Kishore; Beck, Marcus W; Asencio-Cortés, Gualberto

    2016-01-01

    This paper describes the R package imputeTestbench that provides a testbench for comparing imputation methods for missing data in univariate time series. The imputeTestbench package can be used to simulate the amount and type of missing data in a complete dataset and compare filled data using different imputation methods. The user has the option to simulate missing data by removing observations completely at random or in blocks of different sizes. Several default imputation methods are includ...
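
    The workflow described here (remove values from a complete series, fill them back in, and score each method) can be sketched in a few lines. The example below is written in Python rather than R and does not use or reproduce the imputeTestbench API; all function names are hypothetical, and the two fill methods (series mean and linear interpolation) are chosen only for illustration. Both removal functions keep the first and last observations so that interpolation is always defined.

```python
import math
import random
from typing import List, Optional

Series = List[Optional[float]]


def remove_mcar(series: List[float], fraction: float, rng: random.Random) -> Series:
    """Remove a fraction of interior points completely at random."""
    n_missing = round(fraction * len(series))
    gaps = set(rng.sample(range(1, len(series) - 1), n_missing))
    return [None if i in gaps else v for i, v in enumerate(series)]


def remove_block(series: List[float], block_size: int, rng: random.Random) -> Series:
    """Remove one contiguous block of interior points."""
    start = rng.randrange(1, len(series) - block_size)
    return [None if start <= i < start + block_size else v
            for i, v in enumerate(series)]


def impute_mean(series: Series) -> List[float]:
    """Fill every gap with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]


def impute_linear(series: Series) -> List[float]:
    """Fill gaps by linear interpolation (assumes both endpoints are observed)."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse_on_gaps(truth: List[float], masked: Series, filled: List[float]) -> float:
    """Root-mean-square error over the positions that were removed."""
    gaps = [i for i, v in enumerate(masked) if v is None]
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in gaps) / len(gaps))


rng = random.Random(0)
truth = [math.sin(i / 5) for i in range(100)]
for masked in (remove_mcar(truth, 0.2, rng), remove_block(truth, 20, rng)):
    scores = {f.__name__: rmse_on_gaps(truth, masked, f(masked))
              for f in (impute_mean, impute_linear)}
    print(scores)
```

    Scoring only the removed positions keeps the comparison from being diluted by values that were never missing.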

  15. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Science.gov (United States)

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  16. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

    Directory of Open Access Journals (Sweden)

    Rosa Aghdam

    2017-12-01

    Full Text Available Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% genes are considered in random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/.

  17. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

    Science.gov (United States)

    Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% genes are considered in random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.

  18. Missing value imputation: with application to handwriting data

    Science.gov (United States)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  19. Geographic variation in lumbar diskectomy: a protocol for evaluation.

    Science.gov (United States)

    Barron, M; Kazandjian, V A

    1992-03-01

    In 1989 the Maryland Hospital Association (MHA) began developing a protocol related to lumbar diskectomy, a procedure with widely reported geographic variation in its use. The MHA's Laminectomy Advisory Committee drafted three criteria for performance of lumbar diskectomy and also developed a data-collection instrument with which the eight hospitals participating in a pilot study could abstract the necessary data from medical records. Both individual hospital and aggregate results showed wide variation in compliance with the criteria. These findings suggest research and development activities such as refinement of the data-collection instrument, use of the protocol for bench-marking, further investigation of clinical and other determinants of rate variation, and study of the effect of new diagnostic technology on utilization rates for this procedure.

  20. Data imputation analysis for Cosmic Rays time series

    Science.gov (United States)

    Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

    2017-05-01

    The occurrence of missing data in Galactic Cosmic Rays (GCR) time series is inevitable, since data are lost through mechanical and human failure, technical problems and differing periods of operation of GCR stations. The aim of this study was to perform multiple dataset imputation in order to depict the observational dataset. The study used the monthly time series of GCR Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of missing data compared to the observed ROME series, with 50 replicates. The CLMX station was then used as a proxy for allocation of these scenarios. Three different methods for monthly dataset imputation were selected: AMÉLIA II, which runs the bootstrap Expectation Maximization algorithm; MICE, which runs an algorithm via Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization algorithm-based method for imputation of missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series and evaluated using several skill measures such as RMSE, NRMSE, Agreement Index, R, R2, F-test and t-test. The results showed that for CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. It was observed that increases in the number of gaps generate loss of quality of the time series. Data imputation was more efficient with the MTSDI method, with negligible errors and the best skill coefficients. The results suggest a limit of about 60% of missing data for imputation of monthly averages. It is noteworthy that CLMX, ROME and KIEL stations present no missing data in the target period. This methodology allowed reconstructing 43 time series.

  1. Evaluating the accuracy and effectiveness of criminal geographic profiling methods: The case of Dandora, Kenya

    NARCIS (Netherlands)

    Mburu, L; Helbich, M

    2015-01-01

    Criminal geographic profiling (CGP) prioritizes offender search, extensively reducing the resources expended in criminal investigations. The utility of CGP has, however, remained unclear when variations in environmental characteristics and offense type are introduced. This study evaluates several

  2. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

    Science.gov (United States)

    Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

    2018-06-01

    A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.

  3. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Science.gov (United States)

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.

  4. Evaluation of National Geographic School Publishing Nonfiction Literacy Materials. Summary Report.

    Science.gov (United States)

    Metcalf, Kim K.; Smith, Carl B.; Legan, Natalie A.

    During the 2001-02 academic year, a purposive, national evaluation was undertaken of "Windows on Literacy" and "Reading Expeditions," two new school-based programs produced by the School Publishing Division of the National Geographic Society (NGS). The evaluation sought to determine the efficacy of the new materials for…

  5. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Directory of Open Access Journals (Sweden)

    Xiaoyi Gao

    2012-06-01

    Full Text Available Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  6. An Imputation Model for Dropouts in Unemployment Data

    Directory of Open Access Journals (Sweden)

    Nilsson Petra

    2016-09-01

    Full Text Available Incomplete unemployment data is a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service’s register as many as 20 percent. This leads to an ambiguity regarding destination states (employment, unemployment, retired, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 the employment rate has dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions. The models have relatively high predictive power.

  7. Imputation methods for filling missing data in urban air pollution data for Malaysia

    Directory of Open Access Journals (Sweden)

    Nur Afiqah Zakaria

    2018-06-01

    Full Text Available The air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing data. The missing observations usually occur due to machine failure, routine maintenance and human error. In this study, the hourly monitoring data of CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods (Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour). Missing data were simulated in the air pollutant observations at four percentages, i.e. 5%, 10%, 15% and 20%. Performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. From the results of the performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in the missing values in air pollutant data.
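
    The four performance measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the coefficient of determination is taken as 1 - SSE/SST (the study may instead use the squared correlation), and the index of agreement is assumed to be Willmott's d.

```python
import math

def imputation_scores(observed, imputed):
    """Compare imputed values with the true values that were masked out.

    Assumed definitions (not confirmed by the abstract):
    R2 = 1 - SSE/SST, IoA = Willmott's index of agreement.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, imputed)]

    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)

    # Total variation of the observed values around their mean
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst

    # Willmott's d: 1 - SSE / sum((|P - mean(O)| + |O - mean(O)|)^2)
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, imputed)
    )
    ioa = 1.0 - sse / potential
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "IoA": ioa}

# Hypothetical masked observations and the values an imputation method produced
scores = imputation_scores([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0])
```

    In a simulation of the kind described, the scores would be computed separately for each missingness level (5%, 10%, 15%, 20%) and each method, using only the values that were deliberately removed.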

  8. Multi criteria evaluation for universal soil loss equation based on geographic information system

    Science.gov (United States)

    Purwaamijaya, I. M.

    2018-05-01

    The purposes of this research were to produce (1) a conceptual and functional model design and implementation for the universal soil loss equation (USLE), (2) a standard operating procedure for multi-criteria evaluation of the USLE using a geographic information system, (3) an overlay of land cover, slope, soil and rainfall layers to obtain the USLE using multi-criteria evaluation, (4) a thematic map of the USLE in the watershed, and (5) an attribute table of the USLE in the watershed. Descriptive and formal correlation methods were used for this research. The study location was the Cikapundung Watershed, Bandung, West Java, Indonesia. The research was conducted from January 2016 to May 2016. A spatial analysis was used to superimpose the land cover, slope, soil and rainfall layers to obtain the USLE. Multi-criteria evaluation of the USLE using a geographic information system could be used for conservation programs.

  9. PlanetLab Europe as Geographically-Distributed Testbed for Software Development and Evaluation

    Directory of Open Access Journals (Sweden)

    Dan Komosny

    2015-01-01

    Full Text Available In this paper, we analyse the use of PlanetLab Europe for development and evaluation of geographically-oriented Internet services. PlanetLab is a global research network with the main purpose to support development of new Internet services and protocols. PlanetLab is divided into several branches; one of them is PlanetLab Europe. PlanetLab Europe consists of about 350 nodes at 150 geographically different sites. The nodes are accessible by remote login, and the users can run their software on the nodes. In the paper, we study the PlanetLab's properties that are significant for its use as a geographically distributed testbed. This includes node position accuracy, services availability and stability. We find a considerable number of location inaccuracies and a number of services that cannot be considered as reliable. Based on the results we propose a simple approach to nodes selection in testbeds for geographically-oriented Internet services development and evaluation.

  10. Data driven estimation of imputation error-a strategy for imputation with a reject option

    DEFF Research Database (Denmark)

    Bak, Nikolaj; Hansen, Lars Kai

    2016-01-01

    Missing data is a common problem in many research fields and is a challenge that always needs careful considerations. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values i...

  11. Improving accuracy of rare variant imputation with a two-step imputation approach

    DEFF Research Database (Denmark)

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

    2015-01-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning less frequent and rare variants not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) ... reference sample genotyped on a dense array and hereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly...

  12. Geographic exposure risk of variant Creutzfeldt-Jakob disease in US blood donors: a risk-ranking model to evaluate alternative donor-deferral policies.

    Science.gov (United States)

    Yang, Hong; Huang, Yin; Gregori, Luisa; Asher, David M; Bui, Travis; Forshee, Richard A; Anderson, Steven A

    2017-04-01

    Variant Creutzfeldt-Jakob disease (vCJD) has been transmitted by blood transfusion (TTvCJD). The US Food and Drug Administration (FDA) recommends deferring blood donors who resided in or traveled to 30 European countries where they may have been exposed to bovine spongiform encephalopathy (BSE) through beef consumption. Those recommendations warrant re-evaluation, because new cases of BSE and vCJD have markedly abated. The FDA developed a risk-ranking model to calculate the geographic vCJD risk using country-specific case rates and person-years of exposure of US blood donors. We used the reported country vCJD case rates, when available, or imputed vCJD case rates from reported BSE and UK beef exports during the risk period. We estimated the risk reduction and donor loss should the deferral be restricted to a few high-risk countries. We also estimated additional risk reduction by leukocyte reduction (LR) of red blood cells (RBCs). The United Kingdom, Ireland, and France had the greatest vCJD risk, contributing approximately 95% of the total risk. The model estimated that deferring US donors who spent extended periods of time in these three countries, combined with currently voluntary LR (95% of RBC units), would reduce the vCJD risk by 89.3%, a reduction similar to that achieved under the current policy (89.8%). Limiting deferrals to exposure in these three countries would potentially allow donations from an additional 100,000 donors who are currently deferred. Our analysis suggests that a deferral option focusing on the three highest risk countries would achieve a level of blood safety similar to that achieved by the current policy. © 2016 AABB.

  13. A nonparametric multiple imputation approach for missing categorical data

    Directory of Open Access Journals (Sweden)

    Muhan Zhou

    2017-06-01

    Full Text Available Abstract Background Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method.
Conclusions We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with
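
    The donor-selection step described in the Methods can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the two predictive scores per record are assumed to be precomputed scalars (the paper derives them from a multinomial and a logistic working model), and the function name, the weighted Euclidean distance and the parameters `weight` and `k` are hypothetical.

```python
import math
import random

def nn_impute(scores, outcomes, weight=0.5, k=3, rng=None):
    """Fill each missing outcome with a random draw from its k nearest donors.

    scores   -- list of (outcome_score, missingness_score) pairs, one per record
                (assumed to come from the two working models)
    outcomes -- list of categories, with None marking a missing value
    weight   -- share of the distance given to the outcome-model score
    """
    rng = rng or random.Random()
    donors = [i for i, y in enumerate(outcomes) if y is not None]
    completed = list(outcomes)

    for i, y in enumerate(outcomes):
        if y is not None:
            continue

        def distance(j):
            # Weighted distance between record i and donor j on both scores
            d_outcome = scores[i][0] - scores[j][0]
            d_missing = scores[i][1] - scores[j][1]
            return math.sqrt(weight * d_outcome ** 2 + (1 - weight) * d_missing ** 2)

        nearest = sorted(donors, key=distance)[:k]
        completed[i] = outcomes[rng.choice(nearest)]
    return completed

# Multiple imputation: repeat the random draw to obtain several completed data sets
example_scores = [(0.10, 0.1), (0.12, 0.1), (0.90, 0.9), (0.11, 0.1)]
example_outcomes = ["a", "a", "b", None]
imputed_sets = [
    nn_impute(example_scores, example_outcomes, k=2, rng=random.Random(seed))
    for seed in range(5)
]
```

    Category proportions would then be estimated in each completed data set and combined across the sets.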

  14. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu; Shang, Shuo; Xie, Qing; Zhang, Xiangliang

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity

  15. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    Science.gov (United States)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.
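
    Two of the simple methods mentioned here, the arithmetic average and the normal ratio (NR), can be sketched as follows. The NR form used is the common textbook one, in which each neighbouring station's value is scaled by the ratio of long-term normals; the study's exact variants (including the correlation-weighted NR) may differ, and the function names and numbers are hypothetical.

```python
def arithmetic_average(neighbour_values):
    """Estimate a missing value as the plain mean of neighbouring stations."""
    return sum(neighbour_values) / len(neighbour_values)

def normal_ratio(target_normal, neighbours):
    """Normal ratio estimate for a missing value at the target station.

    target_normal -- long-term mean (normal) at the station with the gap
    neighbours    -- list of (observed_value, station_normal) pairs
    """
    if not neighbours:
        raise ValueError("at least one neighbouring station is required")
    scaled = [target_normal / normal * value for value, normal in neighbours]
    return sum(scaled) / len(scaled)

# Hypothetical monthly precipitation (mm): value observed and long-term normal
neighbours = [(50.0, 100.0), (40.0, 80.0), (75.0, 150.0)]
nr_estimate = normal_ratio(100.0, neighbours)
mean_estimate = arithmetic_average([value for value, _ in neighbours])
```

    The NR estimate corrects for stations that are systematically wetter or drier than the target, which the plain average ignores.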

  16. Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

    Science.gov (United States)

    Toghiani, S; Aggrey, S E; Rekaya, R

    2016-07-01

    Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated, which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval

  17. Fully conditional specification in multivariate imputation

    NARCIS (Netherlands)

    van Buuren, S.; Brand, J. P.L.; Groothuis-Oudshoorn, C. G.M.; Rubin, D. B.

    2006-01-01

    The use of the Gibbs sampler with fully conditionally specified models, where the distribution of each variable given the other variables is the starting point, has become a popular method to create imputations in incomplete multivariate data. The theoretical weakness of this approach is that the

  18. [Ecosystem services evaluation based on geographic information system and remote sensing technology: a review].

    Science.gov (United States)

    Li, Wen-Jie; Zhang, Shi-Huang; Wang, Hui-Min

    2011-12-01

    Ecosystem services evaluation is a hot topic in current ecosystem management and is closely linked to human welfare. This paper summarizes research progress on the evaluation of ecosystem services based on geographic information system (GIS) and remote sensing (RS) technology, which can be condensed into three characteristics: ecological economics theory is widely applied as a key method in quantifying ecosystem services; GIS and RS technology play a key role in multi-source data acquisition, spatiotemporal analysis, and platform integration; and ecosystem mechanism models have become a powerful tool for understanding the relationships between natural phenomena and human activities. Addressing the present research status and its inadequacies, this paper puts forward an "Assembly Line" framework, a distributed framework with scalable characteristics, and discusses the future development trend of integrated research on ecosystem services evaluation based on GIS and RS technologies.

  19. Combining geographic information system, multicriteria evaluation techniques and fuzzy logic in siting MSW landfills

    Science.gov (United States)

    Gemitzi, Alexandra; Tsihrintzis, Vassilios A.; Voudrias, Evangelos; Petalas, Christos; Stravodimos, George

    2007-01-01

    This study presents a methodology for siting municipal solid waste landfills, coupling geographic information systems (GIS), fuzzy logic, and multicriteria evaluation techniques. Both exclusionary and non-exclusionary criteria are used. Factors, i.e., non-exclusionary criteria, are divided in two distinct groups which do not have the same level of trade off. The first group comprises factors related to the physical environment, which cannot be expressed in terms of monetary cost and, therefore, they do not easily trade off. The second group includes those factors related to human activities, i.e., socioeconomic factors, which can be expressed as financial cost, thus showing a high level of trade off. GIS are used for geographic data acquisition and processing. The analytical hierarchy process (AHP) is the multicriteria evaluation technique used, enhanced with fuzzy factor standardization. Besides assigning weights to factors through the AHP, control over the level of risk and trade off in the siting process is achieved through a second set of weights, i.e., order weights, applied to factors in each factor group, on a pixel-by-pixel basis, thus taking into account the local site characteristics. The method has been applied to Evros prefecture (NE Greece), an area of approximately 4,000 km². The siting methodology results in two intermediate suitability maps, one related to environmental and the other to socioeconomic criteria. Combination of the two intermediate maps results in the final composite suitability map for landfill siting.
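
    The role of order weights can be illustrated with a plain ordered weighted average (OWA) for a single pixel. This is a simplified sketch: it leaves out the AHP factor weights and the fuzzy standardization used in the study, and the function name and scores are hypothetical.

```python
def ordered_weighted_average(factor_scores, order_weights):
    """Combine standardized factor scores (0..1) for one pixel.

    Order weights attach to rank positions, not to particular factors:
    the first weight applies to the lowest score at this pixel, the last
    weight to the highest.
    """
    if len(factor_scores) != len(order_weights):
        raise ValueError("one order weight per factor is required")
    if abs(sum(order_weights) - 1.0) > 1e-9:
        raise ValueError("order weights must sum to 1")
    ranked = sorted(factor_scores)  # ascending: worst score first
    return sum(w * s for w, s in zip(order_weights, ranked))

pixel = [0.2, 0.9, 0.5]  # hypothetical suitability scores of three factors

risk_averse = ordered_weighted_average(pixel, [1.0, 0.0, 0.0])    # worst factor decides
risk_taking = ordered_weighted_average(pixel, [0.0, 0.0, 1.0])    # best factor decides
full_tradeoff = ordered_weighted_average(pixel, [1/3, 1/3, 1/3])  # plain average
```

    Shifting weight toward the first positions lowers risk and limits trade off, which matches the treatment one would expect for the physical-environment factor group; more even weights allow the trade off described for socioeconomic factors.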

  20. GIS-Based Evaluation of Spatial Interactions by Geographic Disproportionality of Industrial Diversity

    Directory of Open Access Journals (Sweden)

    Jemyung Lee

    2017-11-01

    Full Text Available Diversity of regional industry is regarded as a key factor for regional development, as it has a positive relationship with economic stability, which attracts population. This paper focuses on how the spatial imbalance of industrial diversity contributes to the population change caused by inter-regional migration. This paper introduces a spatial interaction model for the Geographic Information System (GIS)-based simulation of the spatial interactions to evaluate the demographic attraction force. The proposed model adopts the notions of gravity, entropy, and virtual work. An industrial classification by profit level is introduced and its diversity is quantified with the entropy of information theory. The introduced model is applied to the cases of 207 regions in South Korea. Spatial interactions are simulated with an optimized model and their resultant forces, the demographic attraction forces, are compared with observed net migration for verification. The results show that the evaluated attraction forces from industrial diversity have a very significant, positive, and moderate relationship with net migration, while other conventional factors of industry, population, economy, and the job market do not. This paper concludes that the geographical quality of industrial diversity has positive and significant effects on population change by migration.
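
    Quantifying diversity with the entropy of information theory can be sketched as below. The abstract does not give the exact formulation, so this assumes the standard Shannon entropy over the shares of each industry class in a region; the function name and the counts are hypothetical.

```python
import math

def shannon_entropy(class_counts):
    """Shannon entropy (in nats) of a region's industry-class distribution.

    class_counts -- number of firms (or employees) in each profit-level class.
    Empty classes contribute nothing to the sum.
    """
    total = sum(class_counts)
    shares = [count / total for count in class_counts if count > 0]
    return -sum(p * math.log(p) for p in shares)

# Hypothetical regions with four profit-level classes
balanced_region = shannon_entropy([10, 10, 10, 10])   # maximally diverse
specialised_region = shannon_entropy([40, 0, 0, 0])   # a single class only
```

    Under this definition the balanced region reaches the maximum, log(4), and the fully specialised region scores zero, so the difference in entropy between regions gives one possible measure of the spatial imbalance the model works with.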

  1. LinkImputeR: user-guided genotype calling and imputation for non-model organisms.

    Science.gov (United States)

    Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean

    2017-07-10

    Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from

  2. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  3. VIGAN: Missing View Imputation with Generative Adversarial Networks.

    Science.gov (United States)

    Shang, Chao; Palmer, Aaron; Sun, Jiangwen; Chen, Ko-Shin; Lu, Jin; Bi, Jinbo

    2017-01-01

    In an era when big data are becoming the norm, there is less concern with the quantity but more with the quality and completeness of the data. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples miss an entire view of data, it creates the missing view problem. Classic multiple imputation or matrix completion methods are hardly effective here, because the specific view offers no information on which to base imputed data for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model enables the knowledge integration for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further proves the effectiveness and usability of this approach in life science.

  4. [Multicriteria evaluation of environmental risk exposure using a geographic information system in Argentina].

    Science.gov (United States)

    Pietri, Diana De; Dietrich, Patricia; Mayo, Patricia; Carcagno, Alejandro

    2011-10-01

    Develop a spatial model that includes environmental factors posing a health hazard, for application in the Matanza-Riachuelo River Basin (MRB) in Argentina. Multicriteria evaluation procedures were used with geographic information systems to obtain territorial zoning based on the degree of suitability for residence. Variables that characterize the habitability of housing and potential sources of basin pollution were geographically referenced. Health information was taken from the Risk Factor Survey (RFS) to measure the relative risk of living in unsuitable areas (exposed population) compared with suitable areas (unexposed population). Sixty percent of the MRB area is in suitable condition, a situation that affects 40% of residents. The rest of the population lives in unsuitable territory, and 6% live in the basin's most unsuitable conditions. Environmental conditions that are detrimental to health in the unsuitable areas became evident during the interviews through three of the pathologies considered: diarrheal diseases, respiratory diseases, and cancer. A regional analysis that provides valid information to support decisionmaking was obtained. Considering the basin as a unit of analysis allowed the use of a single protocol to undertake comprehensive measurement of the magnitude of risk and, thus, set priorities.

  5. Evaluating the ecotourism potentials of Naharkhoran area in Gorgan using remote sensing and geographic information system

    Science.gov (United States)

    Oladi, Jafar; Bozorgnia, Delavar

    2010-10-01

    Ecotourism may be defined as voluntary travel to intact natural areas in order to enjoy the natural attractions as well as to get familiar with the culture of local communities. The main factor contributing to inappropriate land uses and natural resource destruction is overaggregation of ecotourists in some specific natural areas such as forests and rangelands, while other parts remain unvisited due to the lack of proper publicity about those areas. Evaluating the ecotourism potentials of each area would lead to a wider participation of local people in natural resource conservation activities. In order to properly introduce the ecotourism potential areas, we first carried out land preparation practices using Geographic Information System (GIS) and Remote Sensing (RS) techniques; then, the maps of height, slope and orientation were produced using the digital elevation model (DEM) of the study area. Afterwards, we overlaid these maps and the ecotourism potential areas were identified on the map. These specified areas were classified into two land uses of mass and alternative ecotourism, with three subclasses (including class1, class2 and an inappropriate class) considered for each land use. To classify the image, the training areas determined on the ground using a GPS (Global Positioning System) device were transferred onto the RS image. Subsequently, the ecotourism potential areas were determined using a hybrid method. At the final phase, these areas were compared with the areas determined on the ecotourism potential map; as a result of this comparison, the overlaid ecotourism potential areas were distinguished in the Geographic Information System.

  6. Clustering with Missing Values: No Imputation Required

    Science.gov (United States)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  7. BRITS: Bidirectional Recurrent Imputation for Time Series

    OpenAIRE

    Cao, Wei; Wang, Dong; Li, Jian; Zhou, Hao; Li, Lei; Li, Yitan

    2018-01-01

    Time series are widely used as signals in many classification/regression tasks. It is ubiquitous that time series contains many missing values. Given multiple correlated time series data, how to fill in missing values and to predict their class labels? Existing imputation methods often impose strong assumptions of the underlying data generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing va...

  8. Bootstrap inference when using multiple imputation.

    Science.gov (United States)

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.

  9. Evaluation of Wind Energy Production in Texas using Geographic Information Systems (GIS)

    Science.gov (United States)

    Ferrer, L. M.

    2017-12-01

    Texas has the highest installed wind capacity in the United States. The purpose of this research was to estimate the theoretical wind turbine energy production and the utilization ratio of wind turbines in Texas. Windfarm data were combined using Geographic Information System (GIS) methodology to create an updated GIS wind turbine database, including location and technical specifications. Using a range of GIS tools, the windfarm data were spatially joined with National Renewable Energy Laboratory (NREL) wind data to calculate the wind speed at each turbine hub. The power output of each turbine at the hub wind speed was evaluated in the GIS according to the power curve of the respective turbine model. In total, over 11,700 turbines are installed in Texas, with an estimated energy output of 60 GWh per year and an average utilization ratio of 0.32. This research indicates that applying GIS methodologies will be crucial in the growth of wind energy and efficiency in Texas.
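
    The power-curve step in this abstract can be illustrated with a minimal sketch. The curve values, the function names, and the definition of the utilization ratio (mean output divided by rated power) are assumptions made for illustration; the abstract does not give the authors' curves or their exact definition.

```python
from bisect import bisect_right

# Hypothetical power curve: (wind speed in m/s, output in kW), sorted by speed.
# The first point is the cut-in speed and the last point is the cut-out speed.
POWER_CURVE = [(3.0, 0.0), (6.0, 400.0), (9.0, 1200.0), (12.0, 2000.0), (25.0, 2000.0)]


def power_at(speed, curve=POWER_CURVE):
    """Linearly interpolate turbine output (kW) at a hub-height wind speed."""
    speeds = [s for s, _ in curve]
    if speed < speeds[0] or speed > speeds[-1]:
        return 0.0  # below cut-in or above cut-out: no output
    i = bisect_right(speeds, speed)
    if i == len(curve):
        return curve[-1][1]  # exactly at the last tabulated speed
    (s0, p0), (s1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (speed - s0) / (s1 - s0)


def utilization_ratio(mean_power_kw, rated_power_kw):
    """Assumed definition: mean output as a fraction of rated power."""
    return mean_power_kw / rated_power_kw
```

    In a GIS workflow the wind speed passed to `power_at` would come from the spatial join with the wind data set; here it is simply a number.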

  10. Development and evaluation of a geographic information retrieval system using fine grained toponyms

    Directory of Open Access Journals (Sweden)

    Damien Palacio

    2015-12-01

    Full Text Available Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine grained toponyms. To allow evaluation, we use user generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.

  11. Evaluating fuel poverty policy in Northern Ireland using a geographic approach

    International Nuclear Information System (INIS)

    Walker, Ryan; Liddell, Christine; McKenzie, Paul; Morris, Chris

    2013-01-01

    Recent audits have shown that anti-fuel poverty policies in the UK depend on loosely defined targeting and cannot accurately identify fuel poor households. New methods of targeting are necessary to improve fuel poverty policy. This paper uses Geographic Information System (GIS) techniques to evaluate the targeting of a home energy efficiency scheme at the small area level in Northern Ireland, based on the level of need. The concept of need is modelled using an area-based, multi-dimensional fuel poverty risk index. The characteristics and spatial distribution of household retrofits are explored. Policy activity and expenditure are compared with the level of need in an area. Results indicate that policy activity is only weakly associated with the level of need in an area, although policy appears to be well targeted in a few areas. Contrary to existing evidence, rural areas appear to be well served by policy, receiving above average numbers of retrofits and expenditure. There are typically two types of retrofit (major and minor). Most retrofits are minor and may not reduce fuel poverty. These results evidence the limitations of the current targeting system and suggest that there may be scope for improved policy implemented via a more proactive, area-based approach. - Highlights: • We analyse the spatial distribution of home energy efficiency installations. • Significant geographic disparity exists in the rate and cost of home retrofits. • Targeting is only weakly associated with the level of need. • Many interventions are small-scale and are unlikely to reduce fuel poverty. • Results suggest scope for more proactive policy delivered from area-based platforms.

  12. Missing value imputation in DNA microarrays based on conjugate gradient method.

    Science.gov (United States)

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
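
    The neighbor-selection step described in this abstract (ranking genes by the absolute value of their Pearson correlation with the gene that has a missing entry) can be sketched as follows. This is only an illustration of that first step, not the CGimpute algorithm itself: the function names, the use of `None` for a missing value, and the assumption that candidate genes are fully observed and non-constant are all hypothetical choices.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nearest_genes(target, candidates, k):
    """Return the k candidate names with the largest |r| against the target gene.

    target     -- expression profile with None marking missing entries
    candidates -- dict mapping gene name to a fully observed profile (assumption)
    """
    observed = [j for j, v in enumerate(target) if v is not None]
    scored = []
    for name, row in candidates.items():
        r = pearson([target[j] for j in observed], [row[j] for j in observed])
        scored.append((abs(r), name))  # absolute value keeps anti-correlated genes
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

    Using the absolute value means a strongly anti-correlated gene ranks as high as a strongly correlated one, which is the point of the selection rule stated in the abstract.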

  13. Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression

    DEFF Research Database (Denmark)

    Bini, L. M.; Diniz-Filho, J. A. F.; Rangel, T. F. L. V. B.

    2009-01-01

    A major focus of geographical ecology and macroecology is to understand the causes of spatially structured ecological patterns. However, achieving this understanding can be complicated when using multiple regression, because the relative importance of explanatory variables, as measured by regress...

  14. Traffic Speed Data Imputation Method Based on Tensor Completion

    Directory of Open Access Journals (Sweden)

    Bin Ran

    2015-01-01

    Full Text Available Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  15. Traffic speed data imputation method based on tensor completion.

    Science.gov (United States)

    Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  16. EnerGis: A geographical information based system for the evaluation of integrated energy conversion systems in urban areas

    International Nuclear Information System (INIS)

    Girardin, Luc; Marechal, Francois; Dubuis, Matthias; Calame-Darbellay, Nicole; Favrat, Daniel

    2010-01-01

    A geographical information system has been developed to model the energy requirements of an urban area. The purpose of the platform is to model with sufficient detail the energy services requirements of a given geographical area in order to allow the evaluation of the integration of advanced integrated energy conversion systems. This tool is used to study the emergence of more efficient cities that realize energy efficiency measures, integrate energy efficient conversion technologies and promote the use of endogenous renewable energy. The model is illustrated with case studies for the energetic planning of the Geneva district (Switzerland).

  17. Eco-Environment Status Evaluation and Change Analysis of Qinghai Based on National Geographic Conditions Census Data

    Science.gov (United States)

    Zheng, M.; Zhu, M.; Wang, Y.; Xu, C.; Yang, H.

    2018-04-01

    As the headstream of the Yellow River, the Yangtze River and the Lantsang River, located in the hinterland of the Qinghai-Tibet Plateau, Qinghai province is hugely significant for ecosystems as well as for ecological security and sustainable development in China. With the completion of the first national geographic conditions census, frequent monitoring has begun. The classification indicators of the census and monitoring data are highly correlated with the Technical Criterion for Ecosystem Status Evaluation released by the Ministry of Environmental Protection in 2015. Based on three years of geographic conditions data (2014-2016), Landsat-8 images and thematic data (water resources, pollution emissions, meteorological data, soil erosion, etc.), this paper carries out a multi-year, high-precision eco-environment status evaluation and spatiotemporal change analysis of Qinghai province on the basis of the Technical Criterion for Ecosystem Status Evaluation. Unlike the evaluation implemented by the environmental protection department, the evaluation unit in this paper is the town rather than the county. The evaluation results show that the eco-environment status in Qinghai is generally in fine condition, with significant regional differences. Eco-environment status evaluation based on national geographic conditions census and monitoring data can improve both temporal and spatial precision. An eco-environment status with high spatial precision and multiple indices is a key basis for environmental protection decision-making.

  18. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    Science.gov (United States)

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
    The software also provides for a choice between three different
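
    The Dynamic Time Warping (DTW) distance that this abstract uses to compare expression profiles can be sketched with the textbook dynamic-programming recurrence. This is a minimal illustration of plain DTW with an absolute-difference local cost, assumed here for simplicity; it is not the authors' DTWimpute code, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    d[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local cost (an assumption)
            # extend the cheapest of: step in a only, step in b only, step in both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

    Because the alignment may stretch either sequence, two profiles with the same shape but shifted or repeated time points can still have a distance of zero, which is what makes DTW attractive for time series profiles.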

  19. Multiple imputation in the presence of non-normal data.

    Science.gov (United States)

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Geographic Information Systems using CODES linked data (Crash outcome data evaluation system)

    Science.gov (United States)

    2001-04-01

    This report presents information about geographic information systems (GIS) and CODES linked data. Section one provides an overview of a GIS and the benefits of linking to CODES. Section two outlines the basic issues relative to the types of map data...

  1. A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

    Science.gov (United States)

    Mistler, Stephen A.; Enders, Craig K.

    2017-01-01

    Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…

  2. Multiple Imputation of Predictor Variables Using Generalized Additive Models

    NARCIS (Netherlands)

    de Jong, Roel; van Buuren, Stef; Spiess, Martin

    2016-01-01

    The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The

  3. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
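
    For orientation, the complete-data relationship between the coefficient of determination and the F statistic can be sketched as below. These are the standard textbook formulas for a linear model with an intercept; how the authors pool the coefficient of determination across imputed data sets is not described in the abstract and is not shown here. Function and parameter names are hypothetical.

```python
def global_f(r2, n, p):
    """F statistic for H0: all p slopes are zero, given R^2 from n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def partial_f(r2_full, r2_reduced, n, p_full, q):
    """F statistic for dropping q of the p_full predictors from the full model."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator
```

    The global test is the special case of the partial test in which the reduced model contains only the intercept, so `partial_f(r2, 0.0, n, p, p)` equals `global_f(r2, n, p)`.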

  4. Comparison of different Methods for Univariate Time Series Imputation in R

    OpenAIRE

    Moritz, Steffen; Sardá, Alexis; Bartz-Beielstein, Thomas; Zaefferer, Martin; Stork, Jörg

    2015-01-01

    Missing values in datasets are a well-known problem and there are quite a lot of R packages offering imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputation of univariate time series. The problem is, most standard imputation techniques can not be applied directly. Most algorithms rely on inter-attribute correlations, while univariate time series imputation needs to employ time dependencies. This paper provides an overview of ...

  5. Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

    OpenAIRE

    Chan, Kin Wai; Meng, Xiao-Li

    2017-01-01

    Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...
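
    The "impute $m$ times, analyse, then combine" workflow in this abstract can be made concrete for the simplest case, a single scalar estimate, using Rubin's rules. This sketch shows only that standard scalar combination; the likelihood ratio test combination that the paper improves is a different and more involved rule and is not reproduced here. The function name is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors (Rubin's rules).

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

    The `(1 + 1 / m)` factor inflates the between-imputation variance to account for using a finite number of imputations.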

  6. A web-based approach to data imputation

    KAUST Repository

    Li, Zhixu

    2013-10-24

    In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.

  7. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the

  8. Geographic information systems - tool for evaluation of the hydro-energy performance of water supply systems

    OpenAIRE

    Aline Christian Pimentel Almeida Santos; José Almir Rodrigues Pereira; Augusto da Gama Rego; Rogério da Silva Santos

    2017-01-01

    The most relevant challenges in water supply systems (WSSs) are high water losses and the waste of electric energy. This paper aimed to assess the capacity of Geographic Information Systems (GIS) in the analysis of the hydro-energy performance of WSSs. In Stage 1, the data are selected and the respective hydro-energy indexes are defined; cartographic data are defined in Stage 2 and a geo-referenced database is constructed in Stage 3. In Stage 4, the data of the Central Wate...

  9. Geographic Names

    Data.gov (United States)

    Minnesota Department of Natural Resources — The Geographic Names Information System (GNIS), developed by the United States Geological Survey in cooperation with the U.S. Board of Geographic Names, provides...

  10. The Crash Intensity Evaluation Using General Centrality Criterions and a Geographically Weighted Regression

    Science.gov (United States)

    Ghadiriyan Arani, M.; Pahlavani, P.; Effati, M.; Noori Alamooti, F.

    2017-09-01

    Road traffic crashes, especially highway crashes, are a social problem that affects the lives of many people. This paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. The study uses geographically weighted regression and general centrality criteria. In the first step, in order to estimate crash intensity, a dual graph is extracted from the street and highway network so that the general centrality criteria can be computed. The criteria computed on this graph are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. The criteria and crash intensities were then normalized, and the correlations between them were calculated to identify the criteria that are not dependent on each other. The proposed hybrid approach suits regression problems because these effective measures lead to a more desirable output. The R2 value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling the crash intensity.
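
    Two quantities in this abstract lend themselves to a short sketch: the crash intensity of a highway (its share of all crashes) and the distance-decay weight that a Gaussian kernel assigns in geographically weighted regression. The Gaussian form below, exp(-0.5 (d/h)^2), is one common convention and is an assumption; the abstract does not state the exact kernel formula or bandwidth the authors used. The road names and function names are hypothetical.

```python
from math import exp


def crash_intensity(crash_counts):
    """Share of all crashes that occurred on each highway."""
    total = sum(crash_counts.values())
    return {road: count / total for road, count in crash_counts.items()}


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight for an observation at the given distance.

    Assumed form: exp(-0.5 * (distance / bandwidth) ** 2).
    """
    return exp(-0.5 * (distance / bandwidth) ** 2)
```

    In a geographically weighted regression, each local fit weights the surrounding observations with such a kernel, so nearby highways influence the local coefficients more than distant ones.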

  11. THE CRASH INTENSITY EVALUATION USING GENERAL CENTRALITY CRITERIONS AND A GEOGRAPHICALLY WEIGHTED REGRESSION

    Directory of Open Access Journals (Sweden)

    M. Ghadiriyan Arani

    2017-09-01

    Full Text Available Road traffic crashes, especially highway crashes, are a social problem that affects the lives of many people. This paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. The study uses geographically weighted regression and general centrality criteria. In the first step, in order to estimate crash intensity, a dual graph is extracted from the street and highway network so that the general centrality criteria can be computed. The criteria computed on this graph are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. The criteria and crash intensities were then normalized, and the correlations between them were calculated to identify the criteria that are not dependent on each other. The proposed hybrid approach suits regression problems because these effective measures lead to a more desirable output. The R2 value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling the crash intensity.

  12. Age at menopause: imputing age at menopause for women with a hysterectomy with application to risk of postmenopausal breast cancer

    Science.gov (United States)

    Rosner, Bernard; Colditz, Graham A.

    2011-01-01

    Purpose Age at menopause, a major marker in the reproductive life, may bias results for evaluation of breast cancer risk after menopause. Methods We follow 38,948 premenopausal women in 1980 and identify 2,586 who reported hysterectomy without bilateral oophorectomy, and 31,626 who reported natural menopause during 22 years of follow-up. We evaluate risk factors for natural menopause, impute age at natural menopause for women reporting hysterectomy without bilateral oophorectomy and estimate the hazard of reaching natural menopause in the next 2 years. We apply this imputed age at menopause to both increase sample size and to evaluate the relation between postmenopausal exposures and risk of breast cancer. Results Age, cigarette smoking, age at menarche, pregnancy history, body mass index, history of benign breast disease, and history of breast cancer were each significantly related to age at natural menopause; duration of oral contraceptive use and family history of breast cancer were not. The imputation increased sample size substantially and although some risk factors after menopause were weaker in the expanded model (height, and alcohol use), use of hormone therapy is less biased. Conclusions Imputing age at menopause increases sample size, broadens generalizability making it applicable to women with hysterectomy, and reduces bias. PMID:21441037

  13. Dose estimation and evaluation of protector measures for a power plant's accidents scenario, using geographical information system

    International Nuclear Information System (INIS)

    Costa, E.M.; Biagio, R.M.S.; Alves, R.N.

    1999-01-01

    Since the initial phase of a project of a nuclear plant several environmental studies are carried out, and a considerable amount of relevant information is generated. Therefore, there is an increasing need of an integrated analysis of this information in order to better evaluate the potential impact associated to hypothetical accident scenarios of such plants. This paper presents a case-study, in which a hypothetical accident scenario is analysed taking into account the environmental and populational information of the Brazilian nuclear power plants region by using a geographical information system. Important areas for planning of protective measures are identified to provide a basis for further analysis. (author)

  14. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    Directory of Open Access Journals (Sweden)

    Lotz Meredith J

    2008-01-01

    Full Text Available Abstract Background Gene expression data frequently contain missing values, however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remains largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrixes and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA

  15. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

    Science.gov (United States)

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-10

    Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity

  16. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Science.gov (United States)

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644
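    The comparison in this abstract rests on two simple ingredients: an interpolation baseline (IDW) and an error score (RMSE). The sketch below illustrates both under stated assumptions. The station coordinates, readings, and the `power` exponent are hypothetical, and the abstract reports RMSE as a percentage without stating the normalisation, so the plain RMSE here is not necessarily the authors' exact metric.

```python
import math

def idw_estimate(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at target_xy.

    neighbours: list of ((x, y), value) pairs from the other stations.
    power: distance exponent (2.0 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbours:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # co-located station: use its reading directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical layout (km) and irradiance readings (W/m^2), for illustration only.
neighbours = [((0.0, 1.0), 500.0), ((0.0, 2.0), 600.0)]
estimate = idw_estimate((0.0, 0.0), neighbours)
```

    The nearer station dominates the estimate because its weight is four times that of the station twice as far away.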

  17. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2014-10-01

    Full Text Available Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.

  18. Missing data imputation of solar radiation data under different atmospheric conditions.

    Science.gov (United States)

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-10-29

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.

  19. Assessing accuracy of genotype imputation in American Indians.

    Directory of Open Access Journals (Sweden)

    Alka Malhotra

    Full Text Available Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians. Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of SNPs available for a given chromosome). The masked genotypes were then imputed using the software Markov Chain Haplotyping Algorithm. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%. Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.
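    The mask-then-impute evaluation described in this abstract can be sketched in a few lines. This is a simplified illustration, not the study's pipeline: genotypes are coded 0/1/2 in a single list, masking is applied to individual entries rather than whole SNPs, and `impute_most_common` is a hypothetical stand-in for the LD-based software the authors used. Only the error-rate bookkeeping is meant to carry over.

```python
import random
from collections import Counter

def mask_genotypes(genotypes, fraction, seed=0):
    """Hide a random fraction of entries; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    n_masked = max(1, round(fraction * len(genotypes)))
    masked_idx = sorted(rng.sample(range(len(genotypes)), n_masked))
    masked = list(genotypes)
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx

def impute_most_common(masked):
    """Placeholder imputer (assumption): fill gaps with the most frequent observed genotype."""
    observed = [g for g in masked if g is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if g is None else g for g in masked]

def error_rate(truth, imputed, masked_idx):
    """Share of masked positions where the imputed genotype differs from the truth."""
    wrong = sum(1 for i in masked_idx if truth[i] != imputed[i])
    return wrong / len(masked_idx)

# Illustrative data only: ten genotypes, 20% masked.
truth = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
masked, masked_idx = mask_genotypes(truth, 0.20)
imputed = impute_most_common(masked)
rate = error_rate(truth, imputed, masked_idx)
```

    Scoring only the masked positions matters: including the untouched entries would dilute the error rate and make any imputer look better than it is.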

  20. Geographic information systems - tool for evaluation of the hydro-energy performance of water supply systems

    Directory of Open Access Journals (Sweden)

    Aline Christian Pimentel Almeida Santos

    2017-05-01

    Full Text Available The most relevant challenges in the water supply system (WSS) are high water losses and the waste of electric energy. This paper aimed to assess the capacity of the Geographic Information System (GIS) in the analysis of the hydro-energy performance of WSSs. In Stage 1, the data are selected and the respective hydro-energy indexes are defined; cartographic data are defined in Stage 2, and a geo-referenced database is constructed in Stage 3. In Stage 4, data from the Central Water Supply Zone administered by the Water Works Company of the state of Pará in Belém, Brazil, were employed to assess the applicability of the approach, and the sectors with the worst hydro-energy performance were identified, such as Sector 9, with the highest water loss rate (59.11%) and electric energy consumption per m³ of water produced (1.57 kWh m⁻³). The results show that geo-referenced assessment of the hydro-energy performance of WSSs provided accurate information for decision-making related to the rational use of water and electricity in the systems.

  1. The multiple imputation method: a case study involving secondary data analysis.

    Science.gov (United States)

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    This paper illustrates, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data source was the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to apply this technique to large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
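    When a regression is run on each of several imputed datasets, the per-dataset results are usually combined with Rubin's rules. The abstract does not say which pooling step the authors used, so the sketch below is an assumption about the standard procedure, and the five coefficient estimates and variances are invented for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-dataset estimates and their squared standard errors (Rubin's rules).

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical coefficient estimates from five imputed datasets.
estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
variances = [0.5, 0.5, 0.5, 0.5, 0.5]
pooled, total = pool_rubin(estimates, variances)
```

    The between-imputation term is what keeps the pooled standard error honest: it grows when the imputed datasets disagree, reflecting the uncertainty added by the missing values.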

  2. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu

    2016-06-25

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to nonquantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing ones from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall, but on the other hand, issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  3. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu; Qin, Lu; Cheng, Hong; Zhang, Xiangliang; Zhou, Xiaofang

    2016-01-01

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to nonquantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing ones from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall, but on the other hand, issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  4. Imputed prices of greenhouse gases and land forests

    International Nuclear Information System (INIS)

    Uzawa, Hirofumi

    1993-01-01

    The theory of dynamic optimum formulated by Maeler gives us the basic theoretical framework within which it is possible to analyse the economic and, possibly, political circumstances under which the phenomenon of global warming occurs, and to search for the policy and institutional arrangements whereby it would be effectively arrested. The analysis developed here is an application of Maeler's theory to atmospheric quality. In the analysis a central role is played by the concept of imputed price in the dynamic context. Our determination of imputed prices of atmospheric carbon dioxide and land forests takes into account the difference in the stages of economic development. Indeed, the ratios of the imputed prices of atmospheric carbon dioxide and land forests over the per capita level of real national income are identical for all countries involved. (3 figures, 2 tables) (Author)

  5. IMPROVEMENT EVALUATION ON CERAMIC ROOF EXTRACTION USING WORLDVIEW-2 IMAGERY AND GEOGRAPHIC DATA MINING APPROACH

    Directory of Open Access Journals (Sweden)

    V. S. Brum-Bastos

    2016-06-01

    Full Text Available Advances in geotechnologies and in remote sensing have improved analysis of urban environments. The new sensors are increasingly suited to urban studies, due to the enhancement in spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard “Blue-Green-Red-Near Infrared” bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: (1) eight multispectral and panchromatic bands, and (2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice of a big or small tree relies on the user’s skills to implement it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: (1) the common user (smaller trees) or (2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.

  6. Evaluation of trap capture in a geographically closed population of brown treesnakes on Guam

    Science.gov (United States)

    Tyrrell, C.L.; Christy, M.T.; Rodda, G.H.; Yackel Adams, A.A.; Ellingson, A.R.; Savidge, J.A.; Dean-Bradley, K.; Bischof, R.

    2009-01-01

    1. Open population mark-recapture analysis of unbounded populations accommodates some types of closure violations (e.g. emigration, immigration). In contrast, closed population analysis of such populations readily allows estimation of capture heterogeneity and behavioural response, but requires crucial assumptions about closure (e.g. no permanent emigration) that are suspect and rarely tested empirically. 2. In 2003, we erected a double-sided barrier to prevent movement of snakes in or out of a 5-ha semi-forested study site in northern Guam. This geographically closed population of >100 snakes was monitored using a series of transects for visual searches and a 13 × 13 trapping array, with the aim of marking all snakes within the site. Forty-five marked snakes were also supplemented into the resident population to quantify the efficacy of our sampling methods. We used the program MARK to analyse trap captures (101 occasions), referenced to census data from visual surveys, and quantified heterogeneity, behavioural response, and size bias in trappability. Analytical inclusion of untrapped individuals greatly improved precision in the estimation of some covariate effects. 3. A novel discovery was that trap captures for individual snakes consisted of asynchronous bouts of high capture probability lasting about 7 days (ephemeral behavioural effect). There was modest behavioural response (trap happiness) and significant latent (unexplained) heterogeneity, with small influences on capture success of date, gender, residency status (translocated or not), and body condition. 4. Trapping was shown to be an effective tool for eradicating large brown treesnakes Boiga irregularis (>900 mm snout-vent length, SVL). 5. Synthesis and applications. Mark-recapture modelling is commonly used by ecological managers to estimate populations. However, existing models involve making assumptions about either closure violations or response to capture. 
Physical closure of our population on a

  7. Evaluation of the 3d Urban Modelling Capabilities in Geographical Information Systems

    Science.gov (United States)

    Dogru, A. O.; Seker, D. Z.

    2010-12-01

    Geographical Information System (GIS) technology, which provides successful solutions to basic spatial problems, is now widely used in three-dimensional (3D) modeling of physical reality with its developing visualization tools. Modeling large and complicated phenomena is a challenging problem for the computer graphics currently in use. However, it is possible to visualize such phenomena in 3D by using computer systems. 3D models are used in developing computer games, military training, urban planning, tourism, etc. The use of 3D models for planning and management of urban areas is a very popular topic among city administrations. In this context, 3D city models are produced and used for various purposes. However, the requirements of the models vary depending on the type and scope of the application. While a high level of visualization, in which photorealistic visualization techniques are widely used, is required for tourism and recreational purposes, an abstract visualization of the physical reality is generally sufficient for the communication of thematic information. The visual variables, which are the principal components of cartographic visualization, such as color, shape, pattern, orientation, size, position, and saturation, are used for communicating the thematic information. These kinds of 3D city models are called abstract models. Standardization of the technologies used for 3D modeling is now available through CityGML. CityGML implements several novel concepts to support interoperability, consistency and functionality. For example, it supports different Levels-of-Detail (LoD), which may arise from independent data collection processes and are used for efficient visualization and efficient data analysis. In one CityGML data set, the same object may be represented in different LoD simultaneously, enabling the analysis and visualization of the same object with regard to different degrees of resolution. Furthermore, two CityGML data sets

  8. Multiple imputation of missing passenger boarding data in the national census of ferry operators

    Science.gov (United States)

    2008-08-01

    This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corresp...

  9. Evaluating the effects of urban congestion pricing: Geographical accessibility versus social surplus

    NARCIS (Netherlands)

    Tillema, T.; Verhoef, E.T.; van Wee, G.P.; van Amelsfort, D.

    2011-01-01

    In urbanised areas around the world, road pricing policies are considered more and more frequently, the aim often being to alleviate (some of the) external traffic-related costs. To assess the effects of a proposed road pricing measure, several evaluation measures can be used, coming from different

  10. Evaluating the effects of urban congestion pricing : geographical accessibility versus social surplus

    NARCIS (Netherlands)

    Tillema, Taede; Verhoef, Erik; van Wee, Bert; van Amelsfort, Dirk; van Wee, G.P

    2011-01-01

    In urbanised areas around the world, road pricing policies are considered more and more frequently, the aim often being to alleviate (some of the) external traffic-related costs. To assess the effects of a proposed road pricing measure, several evaluation measures can be used, coming from different

  11. Evaluation Models for E-Learning Platform in Riyadh City Universities (RCU) with Applied of Geographical Information System (GIS)

    Directory of Open Access Journals (Sweden)

    Abdulaziz I. Alharrah

    2014-12-01

    Full Text Available E-learning that integrates digital knowledge content, network and information technology has become an emerging learning method. The e-learning platform approach is becoming an important tool for providing the flexibility and quality that this kind of learning process requires. Organizations therefore face a new kind of problem: selecting the most suitable e-learning platform. This paper proposes an evaluation model for e-learning platforms in Riyadh City universities (RCU) with an applied Geographic Information System (GIS). The e-learning platform selection is a multiple criteria decision-making problem that needs to be addressed objectively, taking into consideration the relative weights of the criteria for any organization. We formulate this multi-criteria problem as a decision hierarchy to be solved using GIS. A GIS-based evaluation index system and a web-based evaluating platform were established. In this paper we show the general evaluation strategy and some results obtained using our model to evaluate some existing commercial platforms. The results of the evaluation model are as follows: the total weight of the proposed framework in the management feature is 20.25/25, in the collaborative feature 9.2/10, in the adaption learning path 6.8/10, and in the interactive learning object 5/5. The total weight of all features is 41.25/50. In this study the evaluation model was applied to Riyadh City universities such as KSU, IMAMU, NAUSS, YU and FU, and the results were compared with each other. The total weight of KSU was 41, while the total weights of FU, IMAMU, YU and NAUSS were 40, 37, 36 and 32, respectively. The evaluation process shows that the proposed framework satisfied the objectives with applied GIS.
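    The scores quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures given above; the dictionary layout and variable names are illustrative choices, not part of the authors' model.

```python
# Feature scores as (achieved, maximum), taken from the abstract.
feature_scores = {
    "management": (20.25, 25),
    "collaborative": (9.2, 10),
    "adaption learning path": (6.8, 10),
    "interactive learning object": (5, 5),
}

# Sum the achieved and maximum weights across all features.
total = sum(achieved for achieved, _ in feature_scores.values())
maximum = sum(limit for _, limit in feature_scores.values())

# Per-university totals as reported, ranked from highest to lowest.
university_totals = {"KSU": 41, "IMAMU": 37, "NAUSS": 32, "YU": 36, "FU": 40}
ranking = sorted(university_totals, key=university_totals.get, reverse=True)

print(f"framework total: {total:.2f}/{maximum}")
print("ranking:", ", ".join(ranking))
```

    The four feature scores do add up to the reported 41.25 out of 50, and the ranking reproduces the order in which the abstract lists the universities' totals.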

  12. Evaluation of complex geographic excursion - Vinařická horka using a tablet

    OpenAIRE

    Hájková, Kateřina

    2015-01-01

    The thesis analyses data from both qualitative and quantitative research on the positives and negatives of field education with tablets. An indispensable part is an assessment of the premise that education with tablets affects the attitudes of pupils to geography (whether positively or negatively) and to natural sciences in general. The thesis further summarises important ideas of students and teachers in primary and secondary schools. The conclusion evaluates whether the current emphasis of ...

  13. Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

    Directory of Open Access Journals (Sweden)

    Zhou Hanzhi

    2016-03-01

    Full Text Available Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.

  14. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

    Science.gov (United States)

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of missing data being at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random data (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations to the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.

  15. Geographical evaluation of the impact of nuclear power plants on settlement structures

    International Nuclear Information System (INIS)

    Divinsky, B.

    1992-01-01

    The effects of nuclear power plants are classed with respect to their character (one-sided or many-sided), order (primary or secondary), quality (positive or adverse), duration (temporary or permanent), and space (microregional or macroregional). The following topics must be included in the methodology of evaluation of the impacts of a nuclear power plant on the region: characteristics of the present settlement network, relationships within the settlement system, spatial transformation of settlements, development of urbanization, population density, town and village sizes, functional types of settlements, migration, age and social structure of the population, economic activity, town and village facilities, technical infrastructure, transport and traffic, psycho-social impacts of the occurrence of the nuclear power plant, microecology (microenvironment). (M.D.). 5 refs

  16. Sequence imputation of HPV16 genomes for genetic association studies.

    Directory of Open Access Journals (Sweden)

    Benjamin Smith

    Full Text Available Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16

  17. Imputing amino acid polymorphisms in human leukocyte antigens.

    Directory of Open Access Journals (Sweden)

    Xiaoming Jia

    Full Text Available DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.

  18. [Data fusion and multi-components quantitative analysis for identification and quality evaluation of Gentiana rigescens from different geographical origins].

    Science.gov (United States)

    Wang, Qin-Qin; Shen, Tao; Zuo, Zhi-Tian; Huang, Heng-Yu; Wang, Yuan-Zhong

    2018-03-01

    The accumulation of secondary metabolites of traditional Chinese medicine (TCM) is closely related to its origins. The identification of origins and multi-component quantitative evaluation are of great significance to ensure the quality of medicinal materials. In this study, the identification of Gentiana rigescens from different geographical origins was conducted by data fusion of Fourier transform infrared (FTIR) spectroscopy and high performance liquid chromatography (HPLC) in combination with partial least squares discriminant analysis; meanwhile, quantitative analysis of index components was conducted to provide an accurate and comprehensive identification and quality evaluation strategy for selecting the best production areas of G. rigescens. FTIR and HPLC data for 169 G. rigescens samples from Yunnan, Sichuan, Guangxi and Guizhou Provinces were collected. The raw infrared spectra were pre-treated by multiplicative scatter correction, standard normal variate (SNV) and Savitzky-Golay (SG) derivative. Then the performances of FTIR, HPLC, low-level data fusion and mid-level data fusion for identification were compared, and the contents of gentiopicroside, swertiamarin, loganic acid and sweroside were determined by HPLC. The results showed that the FTIR spectra of G. rigescens from different geographical origins were different, and the best pre-treatment method was SNV+SG-derivative (second derivative, 15 as the window parameter, and 2 as the polynomial order). The accuracy rate of low- and mid-level data fusion in the prediction set (96.43%) was higher than that of FTIR and HPLC (94.64%). In addition, the accuracy of low-level data fusion in the training set (100%) was higher than that of mid-level data fusion (99.12%). The contents of the iridoid glycosides in Yunnan were the highest among different provinces.
The average content of gentiopicroside, as a bioactive marker in Chinese

  19. Evaluation of the Systematic Status of Geographical Variations in Arcuphantes hibanus (Arachnida: Araneae: Linyphiidae), with Descriptions of Two New Species.

    Science.gov (United States)

    Nakano, Takafumi; Ihara, Yoh; Kumasaki, Yusuke; Baba, Yuki G; Tomikawa, Ko

    2017-08-01

    The systematic status of geographical variants of Arcuphantes hibanus Saito, 1992 belonging to the A. longiscapus species group, indigenous to western Honshu and Shikoku, Japan, was evaluated using morphological and molecular data. Two species, A. enmusubi Ihara, Nakano and Tomikawa, sp. nov. and A. occidentalis Ihara, Nakano and Tomikawa, sp. nov., are described, and A. hibanus is redescribed with redefinition of its taxonomic status. These three species are diagnosed by the characteristics of paracymbium, pseudolamella, and epigynal basal part. Phylogenetic trees obtained with mitochondrial cytochrome c oxidase subunit I and 16S rRNA markers showed that the variants are mutually genetically highly diverged. However, the mtDNA phylogenies failed to recover the monophyly of A. hibanus redefined herein. Contrary to the mtDNA phylogenetic analyses, a neighbor-network analysis of nuclear internal transcribed spacer 1 sequences of A. hibanus, A. enmusubi and A. occidentalis spiders showed that each of them forms a cluster. The results of mitochondrial and nuclear DNA analyses in each of the three species are briefly discussed, along with their taxonomic identities.

  20. Towards a more efficient representation of imputation operators in TPOT

    OpenAIRE

    Garciarena, Unai; Mendiburu, Alexander; Santana, Roberto

    2018-01-01

    Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, model assessment, etc.). TPOT, a software tool for optimizing machine learning pipelines based on genetic programming (GP), is a novel example of this kind of application. Recently we have proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can prod...

  1. DTW-APPROACH FOR UNCORRELATED MULTIVARIATE TIME SERIES IMPUTATION

    OpenAIRE

    Phan, Thi-Thu-Hong; Poisson Caillault, Emilie; Bigand, André; Lefebvre, Alain

    2017-01-01

    International audience; Missing data are inevitable in almost all domains of applied sciences. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low/un-correlated multivariate time series under an assumption of...

  2. Which DTW Method Applied to Marine Univariate Time Series Imputation

    OpenAIRE

    Phan, Thi-Thu-Hong; Caillault, Émilie; Lefebvre, Alain; Bigand, André

    2017-01-01

    International audience; Missing data are ubiquitous in all domains of applied sciences. Processing datasets containing missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Therefore, the aim of this paper is to build a framework for filling missing values in univariate time series and to perform a comparison of different similarity metrics used for the imputation task. This allows us to suggest the most suitable methods for the imp...

  3. Imputation of missing data in time series for air pollutants

    Science.gov (United States)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.

  4. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    Science.gov (United States)

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.

  5. Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution.

    Science.gov (United States)

    Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

    2018-06-01

    Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  6. Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

    Science.gov (United States)

    Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

    2015-01-01

    Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most of the imputations have been run using HapMap samples as reference, imputation of low frequency and rare variants (minor allele frequency (MAF) [...] 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variants imputation, we imputed 153 individuals, each of whom had 3 different genotype array data including 317k, 610k and 1 million SNPs, to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
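
As a rough illustration of the per-MAF-bin concordance comparison reported in this abstract, here is a minimal Python sketch. The function names, the 0/1/2 genotype coding, and the toy variants are assumptions made for illustration; the study itself used IMPUTE version 2 and also reported info scores, which this sketch does not compute.

```python
from collections import defaultdict


def maf(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1 or 2 alternate alleles."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def concordance_by_maf_bin(true_gt, imputed_gt, bin_edges):
    """Share of genotypes imputed correctly, grouped by MAF bin.

    true_gt, imputed_gt: dict mapping variant id -> list of genotypes (0/1/2),
        one entry per individual, in the same order in both dicts.
    bin_edges: ascending upper bounds of the MAF bins; the last one should be 0.5.
    Returns a dict mapping each used bin's upper bound -> concordance rate.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for variant_id, truth in true_gt.items():
        imputed = imputed_gt[variant_id]
        freq = maf(truth)
        # First bin whose upper bound is at least the variant's MAF.
        upper = next(edge for edge in bin_edges if freq <= edge)
        matches[upper] += sum(t == i for t, i in zip(truth, imputed))
        totals[upper] += len(truth)
    return {edge: matches[edge] / totals[edge] for edge in totals}


# Hypothetical toy data: one lower-frequency variant and one common variant.
true_genotypes = {"var_a": [0, 0, 0, 0, 1], "var_b": [0, 1, 2, 1, 0]}
imputed_genotypes = {"var_a": [0, 0, 0, 0, 0], "var_b": [0, 1, 2, 1, 0]}
rates = concordance_by_maf_bin(true_genotypes, imputed_genotypes, [0.2, 0.5])
```

In this toy input the lower-frequency variant loses its only minor-allele carrier, which is the kind of error that a raw concordance rate can understate for rare variants.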

  7. Geographical Tatoos

    Directory of Open Access Journals (Sweden)

    Valéria Cazetta

    2014-08-01

    Full Text Available The article deals with maps tattooed on bodies. My interest in studying corporeality is part of a broader project entitled Geographies and (in) Bodies. There are several published studies on tattoos, but none specifically about tattooed maps. However, some of these works interested me because they present important contemporary discussions about body modification that helped me locate body modifications more within culture than within nature. At this stage, I looked at pictures of geographical tattoos available on several internet sites.

  8. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to inferring missing genotypes. In this research the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of the imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods. The TotalBoost performed slightly worse than the other methods. The running times differed between methods. The ELM was always the fastest algorithm. As the sample size increases, the RBF requires a long imputation time. The tested methods in this research can be an alternative for imputation of un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Imputation of genotypes in Danish two-way crossbred pigs using low density panels

    DEFF Research Database (Denmark)

    Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

    Genotype imputation is commonly used as an initial step of genomic selection. Studies on humans, plants and ruminants suggested many factors would affect the performance of imputation. However, studies rarely investigated pigs, especially crossbred pigs. In this study, different scenarios...... of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation on crossbreds performs equally well as in purebreds, when parental breeds are used as the reference panel. When the size of reference is considerably large...... SNPs. This dataset will be analyzed for genomic selection in a future study...

  10. Improved Correction of Misclassification Bias With Bootstrap Imputation.

    Science.gov (United States)

    van Walraven, Carl

    2018-07-01

    Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias, requires only code sensitivity and specificity, but may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates was determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can importantly decrease misclassification bias when accurate disease probability estimates are used.
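
To make concrete why a correction based only on code sensitivity and specificity can return invalid results, here is a minimal Python sketch of one common form of quantitative bias analysis for prevalence (the Rogan-Gladen style correction). The abstract does not give the exact formula used in the study, so treating QBA as this formula is an assumption, and the function name and the observed prevalences below are hypothetical; only the sensitivity (71.3%) and specificity (96.2%) come from the abstract.

```python
def qba_corrected_prevalence(observed_prev, sensitivity, specificity):
    """Correct a code-based prevalence for misclassification.

    Uses corrected = (observed + specificity - 1) / (sensitivity + specificity - 1).
    Returns (corrected, is_valid), where is_valid is False when the corrected
    value falls outside the interval [0, 1].
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (observed_prev + specificity - 1) / denom
    return corrected, 0.0 <= corrected <= 1.0


# Sensitivity and specificity as reported in the abstract.
SENS, SPEC = 0.713, 0.962

# Hypothetical stratum where codes flag 10% of patients: a valid correction.
ok_value, ok_valid = qba_corrected_prevalence(0.10, SENS, SPEC)

# Hypothetical stratum where codes flag only 2% of patients. This is below the
# false-positive rate (1 - specificity = 3.8%), so the corrected prevalence
# is negative and therefore invalid.
bad_value, bad_valid = qba_corrected_prevalence(0.02, SENS, SPEC)
```

Under this formula, any stratum whose observed prevalence is below 1 minus the specificity yields a negative estimate, which is one way invalid results of the kind the abstract mentions can arise.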

  11. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation

    Directory of Open Access Journals (Sweden)

    CARLOS ALBERTO SILVA

    Full Text Available Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy for TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.

  12. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

    Science.gov (United States)

    Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

    2018-01-01

    Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy for TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.
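
The k-NN imputation idea described in this abstract can be sketched as follows: a location without field measurements borrows the attribute of the reference plots whose LiDAR metrics are closest to its own. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name, the Euclidean distance on unscaled metrics, the averaging over k neighbours, and all numbers are hypothetical.

```python
import math


def knn_impute(reference, target_metrics, k=1):
    """Impute a forest attribute from the k most similar reference plots.

    reference: list of (metrics, attribute) pairs for plots with field data,
        where metrics is a tuple of LiDAR-derived values.
    target_metrics: tuple of the same LiDAR metrics for the unmeasured location.
    Returns the mean attribute of the k nearest plots (Euclidean distance).
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference plots")
    ranked = sorted(reference, key=lambda pair: math.dist(pair[0], target_metrics))
    nearest = ranked[:k]
    return sum(attribute for _, attribute in nearest) / k


# Hypothetical reference plots: (two LiDAR height metrics), dominant height in m.
plots = [
    ((20.0, 2.0), 18.0),
    ((25.0, 3.0), 22.0),
    ((30.0, 4.0), 27.0),
]

height_k1 = knn_impute(plots, (24.0, 3.0), k=1)  # nearest plot only
height_k2 = knn_impute(plots, (24.0, 3.0), k=2)  # mean of the two nearest plots
```

In practice the metrics would usually be put on a common scale before distances are computed, since a metric with a large numeric range would otherwise dominate the neighbour search.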

  13. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    2018-01-01

    Full Text Available Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find out the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.

  14. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    NARCIS (Netherlands)

    I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein ); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. 
Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

    2016-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the

  15. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  16. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Science.gov (United States)

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...

  17. Rapid discrimination of geographical origin and evaluation of antioxidant activity of Salvia miltiorrhiza var. alba by Fourier transform near infrared spectroscopy.

    Science.gov (United States)

    Duan, Xiaoju; Zhang, Danlu; Nie, Lei; Zang, Hengchang

    2014-03-25

    Radix Salvia miltiorrhiza Bge. var. alba C.Y. Wu and H.W. Li and Radix S. miltiorrhiza belong to the same genus. S. miltiorrhiza var. alba is uniquely effective against thromboangiitis, in addition to having the therapeutic efficacy of S. miltiorrhiza. It exhibits antioxidant activity (AA), while its quality and efficacy also vary with geographic location. Therefore, a rapid and nondestructive method based on Fourier transform near infrared spectroscopy (FT-NIRS) was developed for discrimination of geographical origin and evaluation of AA of S. miltiorrhiza var. alba. The discrimination of geographical origin was achieved by using discriminant analysis and the accuracy was 100%. Partial least squares (PLS) regression was employed to establish the model for evaluation of AA by NIRS. The spectral regions were selected by the interval PLS (i-PLS) method. Different pre-treatment methods were compared for the spectral pre-processing. The final optimal results of the PLS model showed that correlation coefficients in the calibration set (Rc) and the prediction set (Rp), root mean square error of prediction (RMSEP) and residual prediction deviation (RPD) were 0.974, 0.950, 0.163 mg mL(-1) and 2.66, respectively. The results demonstrated that NIRS combined with chemometric methods could be a rapid and nondestructive tool to discriminate geographical origin and evaluate AA of S. miltiorrhiza var. alba. The developed NIRS method might have a potential application to high-throughput screening of a great number of raw S. miltiorrhiza var. alba samples for AA. Copyright © 2013 Elsevier B.V. All rights reserved.

  18. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    Science.gov (United States)

    Hernández, Gilma; Moriña, David; Navarro, Albert

    The presence of missing data in collected variables is common in health surveys, but the subsequent imputation thereof at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still little understood by many non-statisticians, who view this process as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths. All of this is set in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example in which missing data is handled by means of simple and multiple imputation is introduced. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
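
For readers who want a concrete picture of multiple imputation for a dichotomous variable, here is a minimal Python sketch. It is not the example from the note itself: the function names, the uniform Beta prior, the assumption that values are missing completely at random, and the toy data are all choices made for illustration. Each imputed data set gives one prevalence estimate, and the estimates are combined with Rubin's rules.

```python
import random
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where the total variance is the
    mean within-imputation variance plus (1 + 1/m) times the between-imputation
    variance. Requires at least two imputations.
    """
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance across imputations
    return mean(estimates), within + (1 + 1 / m) * between


def multiply_impute_proportion(values, m=20, seed=0):
    """Estimate a proportion from 0/1 data in which None marks a missing value."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    ones = sum(observed)
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        # Draw a plausible proportion from its posterior (uniform prior), so
        # that uncertainty about the proportion carries into the imputations.
        p_draw = rng.betavariate(1 + ones, 1 + len(observed) - ones)
        completed = [v if v is not None else int(rng.random() < p_draw) for v in values]
        p_hat = sum(completed) / n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)
    return pool_rubin(estimates, variances)


# Hypothetical survey item with two missing answers.
answers = [1, 0, 1, None, 0, 1, None, 1]
pooled_prevalence, total_variance = multiply_impute_proportion(answers)
```

Simple (single) imputation would fill each gap once and analyse the result as if it were fully observed; the between-imputation term in `pool_rubin` is what lets multiple imputation reflect the extra uncertainty caused by the missing answers.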

  19. Imputing data that are missing at high rates using a boosting algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Cauthen, Katherine Regina [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lambert, Gregory [Apple Inc., Cupertino, CA (United States); Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lefantzi, Sophia [Sandia National Lab. (SNL-CA), Livermore, CA (United States)

    2016-09-01

    Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless many m imputations are used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.

  20. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    Science.gov (United States)

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.

  1. Differential network analysis with multiply imputed lipidomic data.

    Directory of Open Access Journals (Sweden)

    Maiju Kujala

    Full Text Available The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, yet not huge, number of mutually correlated variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: the patients who had a fatal CVD event and the ones who remained stable during a two-year follow-up.

  2. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    Science.gov (United States)

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, which was better than that of the generalized ridge regression heteroscedastic effect model for the traits evaluated.

  3. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Ding, X

    2015-01-01

    This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows...... was improved slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy...... to increase reference population rather than increasing marker density...

  4. Introductory comments on the USGS geographic applications program

    Science.gov (United States)

    Gerlach, A. C.

    1970-01-01

    The third phase of remote sensing technologies and potentials applied to the operations of the U.S. Geological Survey is introduced. Remote sensing data are combined with multidisciplinary spatial data from traditional sources, geographic theory, and techniques of environmental modeling. These combined inputs are subject to four sequential activities that involve: (1) thematic mapping of land use and environmental factors; (2) the dynamics of change detection; (3) environmental surveillance to identify sudden changes and general trends; and (4) preparation of statistical models and analytical reports. Geography program functions, products, clients, and goals are presented in graphical form, along with aircraft photo missions, geography test sites, and FY-70.

  5. Evaluation of the Geographical and Family Background of Student Nurses and Midwives and their Knowledge of Cancer and Nutrition.

    Science.gov (United States)

    Turkistanli, Esin Ceber; Ergun, Fisun Enuzun; Sari, Dilek; Dalli, Dilek; Aydemir, Gulsun

    2002-01-01

    Plant foods are the custodians of numerous dietary constituents, including vitamins, minerals, fibre, and other potentially anticarcinogenic agents. Eating habits are influenced by many biological, social, psychological, and cultural factors. Despite the relative paucity of definite evidence relevant to cancer prevention and of tools available for early detection of cancer, people should be continuously informed about the protective factors (dietary influence, life-style and exercise) so that they develop new habits which will protect against cancer. A descriptive study was designed to examine the effects of geographical and family background on the nutrition of nursing students and their knowledge of recommended dietary guidelines for health promotion and cancer prevention. Most of the students and their families lived in the Aegean and Marmara regions, and in general they regularly consumed vegetables, fruits and cereals. Fresh vegetable and fruit consumption is rather high in the Thrace, Aegean, Marmara and Mediterranean regions of Turkey. Students were found to be well informed during courses on dietary guidelines for health promotion and cancer prevention. The greatest promise for cancer prevention rests on our ability to change multiple and often interrelated behaviours that have been shown to increase the risk of cancer.

  6. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    KAUST Repository

    Kim, Ji-Sung; Gao, Xin; Rzhetsky, Andrey

    2018-01-01

    are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race

  7. Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation.

    Science.gov (United States)

    Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen

    2014-01-01

    Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally-efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently-conducted GenIMS study.

  8. Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

    Directory of Open Access Journals (Sweden)

    Abbas Mikhchi

    2016-01-01

    Full Text Available Abstract. Background: Genotype imputation is an important process of predicting unknown genotypes, which uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variations at a low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and build models capable of predicting missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared from lower-density SNP panels (5 K) to a high-density (10 K) SNP panel using three different boosting methods, namely TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were applied to simulated data to impute the un-typed SNPs in parent-offspring trios. Four different datasets, G1 (100 trios with 5 K SNPs), G2 (100 trios with 10 K SNPs), G3 (500 trios with 5 K SNPs), and G4 (500 trios with 10 K SNPs), were simulated. In all four datasets, all parents were genotyped completely, and offspring were genotyped with a lower-density panel. Results: Comparison of the three methods for imputation showed that LB outperformed AB and TB in imputation accuracy. Computation time differed between methods; AB was the fastest algorithm. Higher SNP densities increased the accuracy of imputation. A larger number of trios (i.e. 500) improved the performance of LB and TB. Conclusions: The three methods do well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.

  9. Simple nuclear norm based algorithms for imputing missing data and forecasting in time series

    OpenAIRE

    Butcher, Holly Louise; Gillard, Jonathan William

    2017-01-01

    There has been much recent progress on the use of the nuclear norm for the so-called matrix completion problem (the problem of imputing missing values of a matrix). In this paper we investigate the use of the nuclear norm for modelling time series, with particular attention to imputing missing data and forecasting. We introduce a simple alternating projections type algorithm based on the nuclear norm for these tasks, and consider a number of practical examples.

  10. Missing value imputation for microarray gene expression data using histone acetylation information

    Directory of Open Access Journals (Sweden)

    Feng Jihua

    2008-05-01

    Full Text Available Abstract. Background: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. Results: The paper explores the feasibility of missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, one of the existing related methods that use functional similarity as the external information. Conclusion: We demonstrated that the use of histone acetylation information could greatly improve the performance of the imputation, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms in addition to histone acetylation, the performance of our approach can be further improved and verified.

  11. The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting.

    Science.gov (United States)

    Thomas, A M; Cook, L J; Dean, J M; Olson, L M

    2014-01-01

    To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information. A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status. High probability and imputed matched sets were not significantly different from occupant age, MVC county, and MVC hour in high information settings (p > 0.999). In low information settings, high probability matched sets were significantly different from occupant age and MVC county (p < …), whereas imputed matched sets were not (p > 0.493). High information settings saw no significant differences in inference of simulated log hospital charges and hospitalization status between the two methods. High probability and imputed matched sets were significantly different from the outcomes in low information settings; however, imputed matched sets were more robust. The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.

  12. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Directory of Open Access Journals (Sweden)

    Xiaobo Yan

    2015-01-01

    Full Text Available This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation, logistics, and healthcare. However, missing values are very common in the IoT for a variety of reasons, so the experimental data are often incomplete. As a result, some work that relies on IoT data cannot be carried out normally, and the accuracy and reliability of the data analysis results are reduced. Based on the characteristics of the data and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. Then, we propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on a Gaussian mixture model (MGI). Experimental results showed that the three models can greatly and effectively improve the accuracy, reliability, and stability of missing value imputation.

  13. Outdoor pilot-scale cultivation of Spirulina sp. LEB-18 in different geographic locations for evaluating its growth and chemical composition.

    Science.gov (United States)

    de Jesus, Cristiane Santos; da Silva Uebel, Lívia; Costa, Samantha Serra; Miranda, Andréa Lobo; de Morais, Etiele Greque; de Morais, Michele Greque; Costa, Jorge Alberto Vieira; Nunes, Itaciara Larroza; de Souza Ferreira, Ederlan; Druzian, Janice Izabel

    2018-05-01

    This study evaluated whether outdoor cultivation of Spirulina sp. in different geographical locations affected its growth and biomass quality, with respect to the chemical composition, volatile compound and heavy metal content, and thermal stability. The positive effect of solar radiation and temperature on biomass productivity in Spirulina sp. cultivated in the northeast was directly related to its improved nutritional characteristics, which occurred with an increase in protein, phycocyanin, and polyunsaturated fatty acid (mainly γ-linolenic) content. The biomass produced in Northeast and South Brazil showed high thermal stability and had volatile compounds that could be used as biomarkers of Spirulina, and their parameters were within the limits of internationally recognized standards for food additives; hence, they have been considered safe foods. However, the growth of crops in south Brazil occurred at lower rates due to low temperatures and luminous intensities, indicative of the robustness of microalgae in relation to these parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.

  14. A preliminary evaluation of environmental indexes of great hydropower plants localized in various Brazilian states and geographical aspects

    International Nuclear Information System (INIS)

    Caetano de Souza, Antonio Carlos

    2009-01-01

    The predominantly tropical climate and the predominance of plateaus in Brazil contributed to the development of a high hydroelectric potential, which determined the choice of hydropower plants as the main technology of electricity generation. Though this is a renewable source, dams must be established to guarantee a high amount of water all year round, generating high environmental and social impact. One way to evaluate the environmental impacts caused by large hydropower plants is the adoption of environmental indexes, which are formed by the ratio of installed or firm power to the dam area of a hydropower plant. The objective of this study is to evaluate the impact caused by these dams through the use of environmental indexes. Statistical instruments were utilized to evaluate environmental indexes in the five Brazilian regions, twenty-six states, and fifteen main rivers (where at least three large hydropower plants are found). The periods when each hydropower plant began operation were also considered. In this study, the greatest mean values were found in the South, Southeast, and Northeast regions, respectively, and the smallest mean values were found in the North and Mid-West regions, respectively. Moreover, the greatest mean indexes were found in dams established in the 1950s. (author)
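
    The index described in this abstract is a simple ratio of power to dam area, averaged within groups such as regions. The sketch below illustrates that arithmetic only; the function names, the units (MW and km^2), and the plant figures are hypothetical and are not taken from the study.

```python
from statistics import mean


def environmental_index(power_mw: float, dam_area_km2: float) -> float:
    """Ratio of installed (or firm) power to dam area, here in MW per km^2."""
    if dam_area_km2 <= 0:
        raise ValueError("dam area must be positive")
    return power_mw / dam_area_km2


def mean_index_by_region(records):
    """Average the index over all plants that share a region label."""
    by_region = {}
    for region, power_mw, dam_area_km2 in records:
        index = environmental_index(power_mw, dam_area_km2)
        by_region.setdefault(region, []).append(index)
    return {region: mean(values) for region, values in by_region.items()}


# Hypothetical plants: (region, installed power in MW, dam area in km^2).
plants = [
    ("South", 1200.0, 100.0),
    ("South", 900.0, 150.0),
    ("North", 250.0, 500.0),
]

regional_means = mean_index_by_region(plants)
```

    A higher value means more power per unit of flooded area, which is the sense in which the abstract compares regions and decades.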

  15. Evaluating the long-term consequences of air pollution in early life: geographical correlations between coal consumption in 1951/1952 and current mortality in England and Wales.

    Science.gov (United States)

    Phillips, David I W; Osmond, Clive; Southall, Humphrey; Aucott, Paula; Jones, Alexander; Holgate, Stephen T

    2018-04-27

    To evaluate associations between early life air pollution and subsequent mortality. Geographical study. Local government districts within England and Wales. Routinely collected geographical data on the use of coal and related solid fuels in 1951-1952 were used as an index of air pollution. We evaluated the relationship between these data and both all-cause and disease-specific mortality among men and women aged 35-74 years in local government districts between 1993 and 2012. Domestic (household) coal consumption had the most powerful associations with mortality. There were strong correlations between domestic coal use and all-cause mortality (relative risk per SD increase in fuel use 1.124, 95% CI 1.123 to 1.126), and respiratory (1.238, 95% CI 1.234 to 1.242), cardiovascular (1.138, 95% CI 1.136 to 1.140) and cancer mortality (1.073, 95% CI 1.071 to 1.075). These effects persisted after adjustment for socioeconomic indicators in 1951, current socioeconomic indicators and current pollution levels. Coal was the major cause of pollution in the UK until the Clean Air Act of 1956 led to a rapid decline in consumption. These data suggest that coal-based pollution, experienced over 60 years ago in early life, affects human health now by increasing mortality from a wide variety of diseases. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  16. Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

    DEFF Research Database (Denmark)

    Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

    2013-01-01

    This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence...... coefficient was lower when the minor allele frequency was lower. The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation accuracies, but considering computing time and memory usage, FImpute is another alternative method....

  17. Nearest neighbor imputation using spatial-temporal correlations in wireless sensor networks.

    Science.gov (United States)

    Li, YuanYuan; Parker, Lynne E

    2014-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a k-d tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using the traditional mean and variance of each dimension for k-d tree construction, and Euclidean distance for k-d tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental
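
    The idea of filling a gap from the closest past observation under a weighted Euclidean distance can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it uses a brute-force search rather than a k-d tree, and the weighting rule (one minus each sensor's missing fraction) and all function names are hypothetical.

```python
import math


def missing_weights(history):
    """Assumed rule: weight each sensor by 1 minus its fraction of missing readings."""
    n = len(history)
    dims = len(history[0])
    return [1.0 - sum(row[d] is None for row in history) / n for d in range(dims)]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the sensors observed in both vectors."""
    total = 0.0
    shared = 0
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue
        total += w * (x - y) ** 2
        shared += 1
    return math.sqrt(total) if shared else math.inf


def impute_nearest(reading, history):
    """Fill None entries of `reading` from the nearest usable row in `history`."""
    weights = missing_weights(history)
    gaps = [d for d, value in enumerate(reading) if value is None]
    # Only rows that actually observed the missing sensors can donate values.
    donors = [row for row in history if all(row[d] is not None for d in gaps)]
    if not donors:
        return list(reading)
    best = min(donors, key=lambda row: weighted_distance(reading, row, weights))
    return [best[d] if value is None else value for d, value in enumerate(reading)]


# Hypothetical readings from three sensors; None marks a lost packet.
history = [
    [20.0, 21.0, 19.5],
    [25.0, 26.0, 24.0],
    [30.0, None, 29.0],
]
filled = impute_nearest([24.5, None, 24.2], history)
```

    In this toy data the second historical row is closest on the two observed sensors, so its value for the middle sensor is used to fill the gap.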

  18. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

    Directory of Open Access Journals (Sweden)

    Stuart Heather

    2006-12-01

    Full Text Available Abstract. Background: Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). Methods: 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The sum of the responses is calculated and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Additionally, a missing at random and a missing not at random simulation were completed. Six imputation methods were then considered: (1) multiple imputation, (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results: When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percent of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression method produced Kappas in the 'substantial agreement' range
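
    Of the six methods listed, individual mean imputation is the easiest to show concretely. The sketch below assumes it means replacing each unanswered item with the mean of that respondent's answered items, then applying the total-score cutoff of 40 described above; the function names and the example respondents are hypothetical.

```python
def impute_individual_mean(responses):
    """Replace missing items (None) with the mean of the respondent's answered items."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered items to impute from")
    item_mean = sum(answered) / len(answered)
    return [item_mean if r is None else r for r in responses]


def sds_total(responses):
    """Total score over all items after individual mean imputation."""
    return sum(impute_individual_mean(responses))


def has_depressive_symptoms(responses, cutoff=40):
    """Classify as exhibiting depressive symptoms when the total is over the cutoff."""
    return sds_total(responses) > cutoff


# Hypothetical respondents with 2 of 20 items missing.
low_scorer = [2] * 18 + [None, None]
high_scorer = [3] * 10 + [2] * 8 + [None, None]

low_total = sds_total(low_scorer)
high_total = sds_total(high_scorer)
```

    Because the imputed value is an average, totals need not be integers, so a respondent can sit exactly on or just beside the cutoff; that boundary behaviour is one place where imputation methods can disagree on classification.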

  19. Evaluation of heavy metals level (arsenic, nickel, mercury and lead) effecting on health in drinking water resource of Kohgiluyeh county using geographic information system (GIS)

    Directory of Open Access Journals (Sweden)

    Abdolazim Alinejad

    2016-08-01

    Full Text Available This study was conducted to determine the amount of heavy metals (arsenic, nickel, mercury, and lead) in the drinking water resources of Kohgiluyeh County using a Geographic Information System (GIS). This cross-sectional study was conducted on the drinking water resources of Kohgiluyeh County (33 water supplies and 4 heavy metals) in 2013. A total of 264 samples were analyzed. The experiments were performed at the laboratory of the Water and Wastewater Company based on the Standard Method. Atomic absorption was used to evaluate the amount of heavy metals. After the parameters were processed, the results were mapped with Geographic Information System software (GIS 9.3). Finally, the data were analyzed with SPSS 16 and Excel 2007. The maximum amount of each heavy metal and its source were as follows: nickel, Ni (source w12), 124 ppb; arsenic, As (w33), 42 ppb; mercury, Hg (w22 and w30), 96 ppb; lead, Pb (w21), 1553 ppb. The GIS maps also showed that lead was very high in the central region, mercury and arsenic were high in the northern region, and nickel was high in the eastern and western regions. The Kriging method and the Gauss model were identified as the best approach for interpolating these metals. Since the concentration of these heavy metals was higher than standard levels in most drinking water supplies in Kohgiluyeh County, and such high levels can have adverse effects on human health, environmental and geological studies are necessary to identify the pollution sources and to remove the heavy metals

  20. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Full Text Available Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  1. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    Science.gov (United States)

    Resche-Rigon, Matthieu; White, Ian R

    2018-06-01

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.

  2. Geographical information systems

    DEFF Research Database (Denmark)

    Möller, Bernd

    2004-01-01

    The chapter gives an introduction to Geographical Information Systems (GIS) with particular focus on their application within environmental management....

  3. Evaluation of Refuge Life Risk using Geographical and Social Grid-Models with Satellite-Based House Ratio and Flood Depth by Tsunami Simulation

    Science.gov (United States)

    Kaneko, D.; Hosoyamada, T.

    2017-12-01

    The authors have developed social and geographical models for evaluating and applying life risk to the Kamakura coast near the south-western part of the metropolitan areas of Tokyo. The coastline close to the seismic center of the South Kanto earthquake is in the riskiest belt in the metropolitan area with a high possibility of house collapse and tsunami run-up. Kamakura is an important historical city, visited by many tourists who are not familiar with seismic dangers. There is a high probability of loss of human life during an evacuation of the city during tsunami waves. To evaluate the distribution of life risk characteristics in the area, models for citizens and sightseers are developed that include social data such as population density, wooden-house ratio, and geographical evacuation distance and tsunami-flooding depth. The population of Kamakura City is 174,050 and the risk of tsunami evacuation is high in the area from the southern part of Kamakura Station to Zaimokuza block, where the population is approximately 15,310 people. There are about 26,000 tourists visiting this area on weekdays and about 100,000 sightseers visiting the area on Saturdays and Sundays. On weekdays the population per mesh will increase by half of the 2,000 inhabitants. On Saturdays and Sundays the population density will be 4 thousand who will double those of the inhabitants. A disaster prevention hill is proposed as a tsunami countermeasure on the coast of Kamakura City. The hill is covered by pine forest with a high-standard road, evacuation center, and sightseeing parking lots embedded in the hilly bank. In normal times, tourists and citizens use this area as a seaside pine park. Long concrete box structures strengthen the hill inside the mound, which has two levels, the lower equipped with high-standard-width roads on the ground level. The parking areas will resolve daily traffic congestion issues along the Kamakura main streets. The evaluation of over-flooding tsunamis and

  4. Geographic Media Literacy

    Science.gov (United States)

    Lukinbeal, Chris

    2014-01-01

    While the use of media permeates geographic research and pedagogic practice, the underlying literacies that link geography and media remain uncharted. This article argues that geographic media literacy incorporates visual literacy, information technology literacy, information literacy, and media literacy. Geographic media literacy is the ability…

  5. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

    Science.gov (United States)

    Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P

    2013-05-01

    A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2 % of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30 % of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
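
    The union-versus-intersection distinction in the abstract above can be illustrated with plain set operations. This is only a minimal sketch: the SNP identifiers and the helper name `imputation_inputs` are hypothetical, and no actual imputation is performed. The last line simply restates the abstract's arithmetic (0.2 % of one million imputed SNPs).

```python
# Hypothetical SNP identifiers; real arrays carry hundreds of thousands of markers.
array_a = {"rs1", "rs2", "rs3", "rs4", "rs5"}
array_b = {"rs3", "rs4", "rs5", "rs6"}


def imputation_inputs(*arrays):
    """Return (union, intersection) of the SNP sets genotyped on each array."""
    union = set().union(*arrays)              # SNPs on one or more arrays
    intersection = set.intersection(*arrays)  # SNPs on all arrays
    return union, intersection


union, intersection = imputation_inputs(array_a, array_b)

# Share of all genotyped SNPs that every array has in common.
share = len(intersection) / len(union)

# Arithmetic from the abstract: 0.2 % spurious associations per million SNPs.
false_positives_per_million = round(0.002 * 1_000_000)
```

    In this toy case the intersection keeps half of the genotyped SNPs; the abstract reports that intersections as small as 30 % still gave good imputation quality.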

  6. Evaluation of location and number of aid post for sustainable humanitarian relief using agent based modeling (ABM) and geographic information system (GIS)

    Science.gov (United States)

    Khair, Fauzi; Sopha, Bertha Maya

    2017-12-01

    One of the crucial phases in disaster management is the response phase, or emergency response phase. It requires a sustainable and well-integrated management system. Any errors in the system during this phase will significantly increase the number of victims as well as the material damage caused. Policies related to the location of aid posts are important decisions. The facts show that many failures in providing assistance to refugees are due to a lack of preparation and poor determination of facilities and aid post locations. Therefore, this study aims to evaluate the number and location of aid posts during the Merapi eruption in 2010. The study integrates Agent Based Modeling (ABM) and a Geographic Information System (GIS) to evaluate the number and location of aid posts under several scenarios. The ABM approach describes the behaviour of the agents (refugees and volunteers) in the event of a disaster, each with their respective characteristics, while the GIS spatial data describe the real condition of the Sleman regency roads. The simulation results show that alternative scenarios that combine the DERU UGM post, Maguwoharjo Stadium, Tagana Post and Pakem Main Post handle and distribute aid to the evacuation barracks better than the initial scenario. The alternative scenarios also leave less unmet demand than the initial scenario.

  7. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2015-12-01

    Full Text Available Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out in order to substitute estimated values for the missing data. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm.

  8. Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

    Science.gov (United States)

    Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal

    Missing values in time series data are a well-known and important problem which many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods. The comparison is done by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation in comparison to other methods.

  9. On multivariate imputation and forecasting of decadal wind speed missing data.

    Science.gov (United States)

    Wesonga, Ronald

    2015-01-01

    This paper demonstrates the application of multiple imputations by chained equations and time series forecasting of wind speed data. The study was motivated by the high prevalence of missing wind speed historic data. Findings based on the fully conditional specification under multiple imputations by chained equations provided reliable wind speed missing data imputations. Further, the forecasting model shows a smoothing parameter, alpha (0.014), close to zero, confirming that recent past observations are more suitable for use to forecast wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimations confirms the dynamic tendencies of wind speed at the airport under study.

  10. A suggested approach for imputation of missing dietary data for young children in daycare.

    Science.gov (United States)

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P; Zeng, Donglin; Vaughn, Amber E; Pratt, Charlotte; Ward, Dianne S

    2015-01-01

    Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.
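
    The ratio step described in the abstract above can be sketched as follows. This is a simplified illustration, not the study's procedure: it omits the mixed models, assumes the "mean ratio" is the mean of per-child L/(B+D+ES) ratios, and uses made-up kcal values. The function names are hypothetical.

```python
from statistics import mean


def lunch_ratio(reference):
    """Mean L/(B+D+ES) ratio over reference children with complete recalls.

    `reference` is a list of (lunch_kcal, bdes_kcal) pairs, where bdes_kcal is
    the sum of breakfast, dinner and evening snacks.
    """
    return mean(lunch / bdes for lunch, bdes in reference)


def impute_lunch(bdes_kcal, ratio):
    """Impute a missing lunch as ratio times the child's own B+D+ES intake."""
    return ratio * bdes_kcal


# Made-up reference data for non-daycare children on weekdays.
non_daycare_weekday = [(300.0, 600.0), (400.0, 800.0)]
ratio = lunch_ratio(non_daycare_weekday)

# A daycare child whose parent reported 700 kcal for B+D+ES but no lunch.
imputed_lunch = impute_lunch(700.0, ratio)
```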

  11. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Directory of Open Access Journals (Sweden)

    Ward Judson A

    2013-01-01

    Full Text Available Abstract Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

  12. A suggested approach for imputation of missing dietary data for young children in daycare

    Directory of Open Access Journals (Sweden)

    June Stevens

    2015-12-01

    Full Text Available Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results: The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion: This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  13. Analyzing the changing gender wage gap based on multiply imputed right censored wages

    OpenAIRE

    Gartner, Hermann; Rässler, Susanne

    2005-01-01

    "In order to analyze the gender wage gap with the German IAB-employment register we have to solve the problem of censored wages at the upper limit of the social security system. We treat this problem as a missing data problem. We regard the missingness mechanism as not missing at random (NMAR, according to Little and Rubin, 1987, 2002) as well as missing by design. The censored wages are multiply imputed by draws of a random variable from a truncated distribution. The multiple imputation is b...

  14. UniFIeD Univariate Frequency-based Imputation for Time Series Data

    OpenAIRE

    Friese, Martina; Stork, Jörg; Ramos Guerra, Ricardo; Bartz-Beielstein, Thomas; Thaker, Soham; Flasch, Oliver; Zaefferer, Martin

    2013-01-01

    This paper introduces UniFIeD, a new data preprocessing method for time series. UniFIeD can cope with large intervals of missing data. A scalable test function generator, which allows the simulation of time series with different gap sizes, is presented additionally. An experimental study demonstrates that (i) UniFIeD shows a significantly better performance than simple imputation methods and (ii) UniFIeD is able to handle situations where advanced imputation methods fail. The results are indep...

  15. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels

  16. Geographic information systems as a tool for environmental evaluation of hydropower potential; Sistemas de informacoes geograficas como ferramenta para avaliacao ambiental de potenciais hidreletricos

    Energy Technology Data Exchange (ETDEWEB)

    Dzedzej, Maira; Correa, Fabio; Malta, Joao [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Flauzino, Barbara Karoline [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil); Santos, Afonso Henriques Moreira [MS Consultoria Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil)

    2010-07-01

    Hydropower plants are responsible for much of the energy generated in the country, and Brazilian rivers also hold a large hydro potential. This form of power generation is considered renewable and fits the concept of sustainable development; however, the social and environmental impacts of implementing hydropower projects are known and widely discussed, especially for large plants. In this context, environmental analysis of hydropower potential has been incorporated at various stages of project studies in order to identify environmental factors that will restrict or impede construction, obtain the best option for the environment, evaluate the role of social and environmental impacts, contribute to improving the design and functionality of the enterprises so as to reduce overall costs, minimize conflicts, and assist in preserving the environment. To fulfill these functions at a satisfactory and reliable level, such studies have increasingly used the techniques, tools and applications of Geographic Information Systems in the environmental assessment process, since they provide acquisition, integration, visualization and analysis of data on natural resources, their uses and protection, offering greater security and speed in decision making. This paper presents some applications of GIS in environmental assessment processes, developed mainly in the steps of estimating hydropower potential, hydropower inventory, basic design and environmental licensing. (author)

  17. Developing an analytical tool for evaluating EMS system design changes and their impact on cardiac arrest outcomes: combining geographic information systems with register data on survival rates

    Directory of Open Access Journals (Sweden)

    Sund Björn

    2013-02-01

    Full Text Available Abstract Background Out-of-hospital cardiac arrest (OHCA) is a frequent and acute medical condition that requires immediate care. We estimate survival rates from OHCA in the area of Stockholm, through developing an analytical tool for evaluating Emergency Medical Services (EMS) system design changes. The study also is an attempt to validate the proposed model used to generate the outcome measures for the study. Methods and results This was done by combining a geographic information systems (GIS) simulation of driving times with register data on survival rates. The emergency resources comprised ambulance alone and ambulance plus fire services. The simulation model predicted a baseline survival rate of 3.9 per cent, and reducing the ambulance response time by one minute increased survival to 4.6 per cent. Adding the fire services as first responders (dual dispatch) increased survival to 6.2 per cent from the baseline level. The model predictions were validated using empirical data. Conclusion We have presented an analytical tool that easily can be generalized to other regions or countries. The model can be used to predict outcomes of cardiac arrest prior to investment in EMS design changes that affect the alarm process, e.g. (1) static changes such as trimming the emergency call handling time or (2) dynamic changes such as location of emergency resources or which resources should carry a defibrillator.

  18. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.

    Science.gov (United States)

    Seaman, Shaun R; Hughes, Rachael A

    2018-06-01

    Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust than joint model multiple imputation using the restricted general location model to misspecification of that model when there is substantial missingness in the outcome variable.

  19. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

    Directory of Open Access Journals (Sweden)

    McElwee Joshua

    2009-06-01

    Full Text Available Abstract Background Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion. Methods 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release22 SNPs, and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500 K array plus a custom 164 K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. Results MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms Affx500K or Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations in this population. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximate 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis

  20. Airports Geographic Information System -

    Data.gov (United States)

    Department of Transportation — The Airports Geographic Information System maintains the airport and aeronautical data required to meet the demands of the Next Generation National Airspace System....

  1. Applying an efficient K-nearest neighbor search to forest attribute imputation

    Science.gov (United States)

    Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

    2006-01-01

    This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...
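
    For orientation, nearest-neighbor imputation itself can be sketched as below. This is a brute-force baseline, not the efficient search the paper describes: it computes every target-to-reference distance, which is exactly the cost such a search aims to reduce. The function name, the Euclidean metric, the unweighted mean, and the toy data are all assumptions made for illustration.

```python
import math


def knn_impute(target, references, k=3):
    """Impute an attribute for `target` from its k nearest reference vectors.

    `references` is a list of (feature_vector, attribute_value) pairs.
    Brute force: one Euclidean distance per reference vector.
    """
    nearest = sorted(references, key=lambda ref: math.dist(target, ref[0]))[:k]
    # Unweighted mean of the neighbors' attribute values (an assumption).
    return sum(value for _, value in nearest) / len(nearest)


# Toy reference set: (spectral features, observed forest attribute).
references = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((10.0, 10.0), 100.0)]
estimate = knn_impute((0.1, 0.0), references, k=2)
```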

  2. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    Science.gov (United States)

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2018-02-01

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  3. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  4. Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

    Science.gov (United States)

    Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang

    2012-01-01

    The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...

  5. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    Science.gov (United States)

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis.

    Science.gov (United States)

    Kmetic, Andrew; Joseph, Lawrence; Berger, Claudie; Tenenhouse, Alan

    2002-07-01

    Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. We reviewed multiple imputation, a widely applicable and easy to implement Bayesian methodology to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians, designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for those invitees who declined further participation in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status for nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models are considered. Our results suggest that selection bias in the study is of concern, but only slightly, in very elderly (age 80+ years), both women and men. Epidemiologists should consider using multiple imputation more often than is current practice.
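
    After multiple imputation, the per-dataset estimates are usually combined with Rubin's rules. The abstract above does not say which pooling formulas were used, so the following is only a sketch of the standard rules with made-up prevalence estimates; the function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m imputed-data estimates with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up prevalence estimates and their variances from m = 3 imputed datasets.
pooled, total = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
```

    The between-imputation term is what carries the extra uncertainty due to the missing data into the final variance.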

  7. Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

    Directory of Open Access Journals (Sweden)

    Minkyung Kim

    2017-10-01

    This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the pattern that appears in the collected data. Here, in order to capture the patterns from past power data, we model a new feature vector by using past data and its variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, which are significant hyperparameters of the proposed method, by utilizing intentionally missing data. Based on a weighted distance between feature vectors representing a missing situation and a past situation, missing power data are estimated by referring to the k most similar past situations in the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects a method between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, from a simulation under various energy consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system, compared to the existing imputation methods.
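
    The nearest-neighbour idea described above can be sketched as follows. This is a simplified illustration, not the authors' LAI: the feature vector (preceding values plus their first differences), the fixed `length` and `k`, and the function names are assumptions, whereas the paper learns the feature length and historical length from the data.

```python
def features(series, t, length):
    """Feature vector for time t: the `length` preceding values and their differences."""
    window = series[t - length:t]
    diffs = [b - a for a, b in zip(window, window[1:])]
    return window + diffs

def knn_impute(series, t, length=3, k=2, weights=None):
    """Estimate the missing series[t] from the k most similar past situations."""
    target = features(series, t, length)
    weights = weights or [1.0] * len(target)
    candidates = []
    for s in range(length, t):
        # Skip past situations that themselves contain missing values.
        if any(v is None for v in series[s - length:s + 1]):
            continue
        cand = features(series, s, length)
        dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, target, cand)) ** 0.5
        candidates.append((dist, series[s]))
    if not candidates:
        raise ValueError("no complete past situation to compare against")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical periodic load profile with the last reading missing.
power = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, None]
estimate = knn_impute(power, 11)
```

    In this toy profile the two closest past situations both precede a reading of 4, so the estimate is their average.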

  8. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    Science.gov (United States)

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  9. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    Science.gov (United States)

    Poyatos, Rafael; Sus, Oliver; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km2). We simulated gaps at different missingness levels (10-80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. 
MICE informed by relevant ecological variables allowed us to fill the gaps in

  10. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    Directory of Open Access Journals (Sweden)

    R. Poyatos

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km2). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables

  11. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    Science.gov (United States)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
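
    Accuracy figures of this kind are typically reported as RMSE and bias relative to the observed mean. The sketch below shows one common way to compute them. The abstract does not give the exact formulas, so normalising by the mean of the field observations is an assumption, and the plot volumes are made-up numbers.

```python
from math import sqrt
from statistics import mean

def relative_rmse(observed, predicted):
    """RMSE as a percentage of the observed mean (assumed normalisation)."""
    rmse = sqrt(mean((p - o) ** 2 for o, p in zip(observed, predicted)))
    return 100 * rmse / mean(observed)

def relative_bias(observed, predicted):
    """Mean error as a percentage of the observed mean; negative means underestimation."""
    return 100 * mean(p - o for o, p in zip(observed, predicted)) / mean(observed)

# Hypothetical plot-level stem volumes (m3/ha): field values vs. imputed values.
field = [200.0, 100.0, 300.0]
imputed = [180.0, 110.0, 270.0]
rmse_pct = relative_rmse(field, imputed)
bias_pct = relative_bias(field, imputed)
```

    A negative relative bias corresponds to the kind of systematic underestimation the abstract reports for ITD-trained models.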

  12. Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

    Directory of Open Access Journals (Sweden)

    Jörgen Wallerman

    2013-04-01

    Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error or RMSE 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%) with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.

  13. The use of multiple imputation for the accurate measurements of individual feed intake by electronic feeders.

    Science.gov (United States)

    Jiao, S; Tiezzi, F; Huang, Y; Gray, K A; Maltecca, C

    2016-02-01

    Obtaining accurate individual feed intake records is the key first step in achieving genetic progress toward more efficient nutrient utilization in pigs. Feed intake records collected by electronic feeding systems contain errors (erroneous and abnormal values exceeding certain cutoff criteria), which are due to feeder malfunction or animal-feeder interaction. In this study, we examined the use of a novel data-editing strategy involving multiple imputation to minimize the impact of errors and missing values on the quality of feed intake data collected by an electronic feeding system. Accuracy of feed intake data adjustment obtained from the conventional linear mixed model (LMM) approach was compared with 2 alternative implementations of multiple imputation by chained equation, denoted as MI (multiple imputation) and MICE (multiple imputation by chained equation). The 3 methods were compared under 3 scenarios, where 5, 10, and 20% feed intake error rates were simulated. Each of the scenarios was replicated 5 times. Accuracy of the alternative error adjustment was measured as the correlation between the true daily feed intake (DFI; daily feed intake in the testing period) or true ADFI (the mean DFI across testing period) and the adjusted DFI or adjusted ADFI. In the editing process, error cutoff criteria are used to define if a feed intake visit contains errors. To investigate the possibility that the error cutoff criteria may affect any of the 3 methods, the simulation was repeated with 2 alternative error cutoff values. Multiple imputation methods outperformed the LMM approach in all scenarios with mean accuracies of 96.7, 93.5, and 90.2% obtained with MI and 96.8, 94.4, and 90.1% obtained with MICE compared with 91.0, 82.6, and 68.7% using LMM for DFI. Similar results were obtained for ADFI. Furthermore, multiple imputation methods consistently performed better than LMM regardless of the cutoff criteria applied to define errors. 
In conclusion, multiple imputation

  14. The Effect of Geographic Units of Analysis on Measuring Geographic Variation in Medical Services Utilization

    Directory of Open Access Journals (Sweden)

    Agnus M. Kim

    2016-07-01

    Objectives: We aimed to evaluate the effect of geographic units of analysis on measuring geographic variation in medical services utilization. For this purpose, we compared geographic variations in the rates of eight major procedures in administrative units (districts) and new areal units organized based on the actual health care use of the population in Korea. Methods: To compare geographic variation in geographic units of analysis, we calculated the age–sex standardized rates of eight major procedures (coronary artery bypass graft surgery, percutaneous transluminal coronary angioplasty, surgery after hip fracture, knee-replacement surgery, caesarean section, hysterectomy, computed tomography scan, and magnetic resonance imaging scan) from the National Health Insurance database in Korea for the 2013 period. Using the coefficient of variation, the extremal quotient, and the systematic component of variation, we measured geographic variation for these eight procedures in districts and new areal units. Results: Compared with districts, new areal units showed a reduction in geographic variation. Extremal quotients and inter-decile ratios for the eight procedures were lower in new areal units. While the coefficient of variation was lower for most procedures in new areal units, the pattern of change of the systematic component of variation between districts and new areal units differed among procedures. Conclusions: Geographic variation in medical service utilization could vary according to the geographic unit of analysis. To determine how geographic characteristics such as population size and number of geographic units affect geographic variation, further studies are needed.
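
    Two of the variation measures named above are simple to compute from a list of area-level rates. The sketch below is a minimal, unweighted version. The study may have used population-weighted variants, so the use of the sample standard deviation and the example rates are assumptions.

```python
from statistics import mean, stdev

def coefficient_of_variation(rates):
    """Standard deviation of area rates divided by their mean (unweighted, assumed)."""
    return stdev(rates) / mean(rates)

def extremal_quotient(rates):
    """Ratio of the highest area rate to the lowest."""
    return max(rates) / min(rates)

# Hypothetical standardized procedure rates per 10,000 population in five areas.
rates = [10.0, 12.0, 8.0, 15.0, 5.0]
cv = coefficient_of_variation(rates)
eq = extremal_quotient(rates)
```

    The extremal quotient depends only on the two most extreme areas, which is one reason it is usually reported alongside measures such as the coefficient of variation.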

  15. Living at a Geographically Higher Elevation Is Associated with Lower Risk of Metabolic Syndrome: Prospective Analysis of the SUN Cohort

    Directory of Open Access Journals (Sweden)

    Amaya Lopez-Pascual

    2017-01-01

    Living at a geographically higher altitude affects oxygen availability. The possible connection between environmental factors and the development of metabolic syndrome (MetS) features is not fully understood, and the available epidemiological evidence is still very limited. The aim of the present study was to evaluate the longitudinal association between altitude and incidence of MetS and each of its components in a prospective Spanish cohort, the Seguimiento Universidad de Navarra (SUN) project. Our study included 6860 highly educated subjects (university graduates) free from any MetS criteria at baseline. The altitude of residence was imputed from the postal code of each subject's residence according to the data of the Spanish National Cartographic Institute, and participants were categorized into tertiles. MetS was defined according to the harmonized definition. Cox proportional hazards models were used to assess the association between the altitude of residence and the risk of MetS during follow-up. After a median follow-up period of 10 years, 462 incident cases of MetS were identified. When adjusting for potential confounders, subjects in the highest category of altitude (>456 m) exhibited a significantly lower risk of developing MetS compared to those in the lowest tertile (<122 m) of altitude of residence [Model 2: Hazard ratio = 0.75 (95% Confidence interval: 0.58–0.97); p for trend = 0.029]. Living at geographically higher altitude was associated with a lower risk of developing MetS in the SUN project. Our findings suggest that geographical elevation may be an important factor linked to metabolic diseases.

  16. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    Science.gov (United States)

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation capability of missing values compared with other methods for both series types of data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE

  17. Factors associated with low birth weight in Nepal using multiple imputation

    Directory of Open Access Journals (Sweden)

    Usha Singh

    2017-02-01

    Background: Survey data from low income countries on birth weight usually pose a persistent problem. The studies conducted on birth weight have acknowledged missing data on birth weight, but they are not included in the analysis. Furthermore, other missing data presented on determinants of birth weight are not addressed. Thus, this study tries to identify determinants that are associated with low birth weight (LBW) using multiple imputation to handle missing data on birth weight and its determinants. Methods: The child dataset from the Nepal Demographic and Health Survey (NDHS, 2011) was utilized in this study. A total of 5,240 children were born between 2006 and 2011, out of which 87% had at least one measured variable missing and 21% had no recorded birth weight. All the analyses were carried out in R version 3.1.3. The transform-then-impute method was applied to check for interaction between explanatory variables and imputed missing data. The survey package was applied to each imputed dataset to account for survey design and sampling method. Survey logistic regression was applied to identify the determinants associated with LBW. Results: The prevalence of LBW was 15.4% after imputation. Women with the highest autonomy on their own health compared to those with health decisions involving husband or others (adjusted odds ratio (OR) 1.87, 95% confidence interval (95% CI) = 1.31, 2.67), and husband and women together (adjusted OR 1.57, 95% CI = 1.05, 2.35) were less likely to give birth to LBW infants. Mothers using highly polluting cooking fuels (adjusted OR 1.49, 95% CI = 1.03, 2.22) were more likely to give birth to LBW infants than mothers using non-polluting cooking fuels. Conclusion: The findings of this study suggested that obtaining the prevalence of LBW from only the sample of measured birth weight and ignoring missing data results in underestimation.

  18. Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

    Science.gov (United States)

    Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

    2017-07-27

    Imputation is a computational method based on the principle of haplotype sharing allowing enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel, making it a computationally less intensive job. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.

  19. Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

    Directory of Open Access Journals (Sweden)

    Stanley Xu

    2014-05-01

    In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset using an incident diabetes population that has no missing values in A1c, fasting and random plasma glucose (FPG and RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We then imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG (1) estimated a continuous A1c within ± 1.88% of the true A1c 68.3% of the time; (2) estimated a categorical A1c within ± one category from the true A1c about 50% of the time. Including RPG in imputation slightly improved the precision but did not improve the accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not categorical A1c. Moreover, imputation of up to 33% of missing A1c values did not change the accuracy and precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When using A1c values as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.
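
    The difference between the two missingness assumptions can be made concrete with a small simulation. The sketch below deletes A1c values from synthetic records either with a constant probability (MCAR) or with a probability that depends on the observed FPG (MAR). The probabilities, the 126 mg/dL cut-off, and the record layout are illustrative assumptions, not the mechanism used in the study.

```python
import random

def make_missing(records, mechanism, rng, base_rate=0.3):
    """Return copies of records with 'a1c' set to None under MCAR or MAR."""
    out = []
    for rec in records:
        if mechanism == "MCAR":
            p = base_rate                            # same chance for everyone
        elif mechanism == "MAR":
            p = 0.1 if rec["fpg"] >= 126 else 0.5    # depends on observed FPG (assumed rule)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        a1c = None if rng.random() < p else rec["a1c"]
        out.append({**rec, "a1c": a1c})
    return out

def missing_rate(rows):
    """Fraction of rows whose A1c value is missing."""
    return sum(r["a1c"] is None for r in rows) / len(rows)

rng = random.Random(0)
# Synthetic placeholder records; these are not study data.
records = [{"fpg": rng.uniform(80, 200), "a1c": round(rng.uniform(5, 10), 1)}
           for _ in range(2000)]
mcar = make_missing(records, "MCAR", rng)
mar = make_missing(records, "MAR", rng)
high = [r for r in mar if r["fpg"] >= 126]
low = [r for r in mar if r["fpg"] < 126]
```

    Under MAR the missing rate differs between the low-FPG and high-FPG groups, which is why including FPG in the imputation model matters in that setting.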

  20. Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

    DEFF Research Database (Denmark)

    Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

    2014-01-01

    BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data have to be imputed. Different imputation techniques may give different results and some may lead to bias...... of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate. METHODS: We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI...
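
    The two single-imputation rules named in this record, LOCF and BOCF, are easy to state in code. The sketch below is a minimal illustration on one participant's series of body weight measurements. The function names and example values are hypothetical, and the first visit is assumed to be an observed baseline.

```python
def locf(weights):
    """Last observation carried forward: fill each gap with the latest observed value."""
    filled, last = [], None
    for w in weights:
        if w is not None:
            last = w
        filled.append(last)
    return filled

def bocf(weights):
    """Baseline observation carried forward: fill each gap with the first (baseline) value."""
    baseline = weights[0]
    return [baseline if w is None else w for w in weights]

# Hypothetical body weights (kg) at five visits; the participant drops out after visit 3.
visits = [100.0, 97.5, 95.0, None, None]
locf_filled = locf(visits)
bocf_filled = bocf(visits)
```

    For a participant who loses weight and then drops out, LOCF keeps the weight loss while BOCF assumes a return to baseline, so the two rules can lead to different treatment effect estimates.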

  1. iVAR: a program for imputing missing data in multivariate time series using vector autoregressive models.

    Science.gov (United States)

    Liu, Siwei; Molenaar, Peter C M

    2014-12-01

    This article introduces iVAR, an R program for imputing missing data in multivariate time series on the basis of vector autoregressive (VAR) models. We conducted a simulation study to compare iVAR with three methods for handling missing data: listwise deletion, imputation with sample means and variances, and multiple imputation ignoring time dependency. The results showed that iVAR produces better estimates for the cross-lagged coefficients than do the other three methods. We demonstrate the use of iVAR with an empirical example of time series electrodermal activity data and discuss the advantages and limitations of the program.

  2. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    Directory of Open Access Journals (Sweden)

    Wilson Barry Tyler

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest

  3. Concentrations and geographical variations of selected toxic elements in meat from semi-domesticated reindeer (Rangifer tarandus tarandus L.) in mid- and northern Norway: evaluation of risk assessment.

    Science.gov (United States)

    Hassan, Ammar Ali; Brustad, Magritt; Sandanger, Torkjel M

    2012-05-01

    Meat samples (n = 100) from semi-domesticated reindeer (Rangifer tarandus tarandus L.) were randomly collected from 10 grazing districts distributed over four Norwegian counties in 2008 and 2009. The main aim was to study concentrations and geographical variations in selected toxic elements; cadmium (Cd), lead (Pb), arsenic (As), copper (Cu), nickel (Ni) and vanadium (V) in order to assess the risk associated with reindeer meat consumption. Sample solutions were analysed using an inductively coupled plasma high resolution mass spectrometer (ICP-HRMS), whereas analysis of variance (ANOVA) was used for statistical analyses. Geographical variations in element concentrations were revealed, with As and Cd demonstrating the largest geographical differences. No clear geographical gradient was observed except for the east-west downward gradient for As. The As concentrations were highest in the vicinity of the Russian border, and only Cd was shown to increase with age (p < 0.05). Sex had no significant effect on the concentration of the studied elements. The concentrations of all the studied elements in reindeer meat were generally low and considerably below the maximum levels (ML) available for toxic elements set by the European Commission (EC). Thus, reindeer meat is not likely to be a significant contributor to the human body burden of toxic elements.

  4. Evaluation of Subcutaneous Proleukin (interleukin-2) in a Randomized International Trial (ESPRIT): geographical and gender differences in the baseline characteristics of participants.

    Science.gov (United States)

    Pett, S L; Wand, H; Law, M G; Arduino, R; Lopez, J C; Knysz, B; Pereira, L C; Pollack, S; Reiss, P; Tambussi, G

    2006-01-01

    ESPRIT is a phase III, open-label, randomized, international clinical trial evaluating the effects of subcutaneous recombinant interleukin-2 (rIL-2) plus antiretroviral therapy (ART) versus ART alone on HIV-disease progression and death in HIV-1-infected individuals with CD4+ T-cells ≥300 cells/µL. To describe the baseline characteristics of participants randomized to ESPRIT overall and by geographic location. Baseline characteristics of randomized participants were summarized by region. 4,150 patients were enrolled in ESPRIT from 254 sites in 25 countries. 41%, 27%, 16%, 11%, and 5% were enrolled in Europe, North America, South America, Asia, and Australia, respectively. The median age was 40 years, 81% were men, and 76%, 11%, and 9% were Caucasian, Asian, and African American or African, respectively. 44% of women enrolled (n = 769) were enrolled in Thailand and Argentina. Overall, 55% and 38% of the cohort acquired HIV through male homosexual and heterosexual contact, respectively. 25% had a prior history of AIDS-defining illness; Pneumocystis jirovecii pneumonia, M. tuberculosis, and esophageal candida were most commonly reported. Median nadir and baseline CD4+ T-cell counts were 199 and 458 cells/µL, respectively. 6% and 13% were hepatitis B or C virus coinfected, respectively. Median duration of antiretroviral therapy (ART) was 4.2 years; the longest median duration was in Australia (5.2 years) and the shortest was in Asia (2.3 years). 17%, 13%, and 69% of participants began ART before 1995, between 1996 and 1997, and from 1998 onward, respectively. 86% used ART from two or more ART classes, with 49% using a protease inhibitor-based regimen and 46% using a nonnucleoside reverse transcriptase inhibitor-based regimen.
78% had plasma HIV RNA below detection (ESPRIT has enrolled a diverse population of HIV-infected individuals including large populations of women and patients of African-American/African and Asian ethnicity often underrepresented in HIV

  5. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    Science.gov (United States)

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10⁻¹²). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations.

  6. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

    DEFF Research Database (Denmark)

    Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

    2011-01-01

    The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data...... of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test...... with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected...

  7. Semiautomatic imputation of activity travel diaries : use of global positioning system traces, prompted recall, and context-sensitive learning algorithms

    NARCIS (Netherlands)

    Moiseeva, A.; Jessurun, A.J.; Timmermans, H.J.P.; Stopher, P.

    2016-01-01

    Anastasia Moiseeva, Joran Jessurun and Harry Timmermans (2010), ‘Semiautomatic Imputation of Activity Travel Diaries: Use of Global Positioning System Traces, Prompted Recall, and Context-Sensitive Learning Algorithms’, Transportation Research Record: Journal of the Transportation Research Board,

  8. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    Directory of Open Access Journals (Sweden)

    Danai Jattawa

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
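
The accuracy measure described in this record, the ratio of correctly imputed SNP markers to all imputed SNP markers, can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `imputation_accuracy`, the 0/1/2 genotype coding, and the toy data are all assumptions made for the example.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes, masked_positions):
    """Return the share of masked markers whose imputed genotype is correct.

    Genotypes are assumed to be coded 0, 1 or 2 (copies of one allele).
    `masked_positions` lists the marker indices that were hidden and imputed.
    """
    if not masked_positions:
        raise ValueError("no imputed markers to evaluate")
    correct = sum(
        1 for i in masked_positions
        if true_genotypes[i] == imputed_genotypes[i]
    )
    return correct / len(masked_positions)


# Hypothetical animal with eight markers, five of which were masked and imputed.
true_g = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_g = [0, 1, 2, 2, 0, 2, 0, 1]
masked = [1, 3, 4, 6, 7]

accuracy = imputation_accuracy(true_g, imputed_g, masked)
print(f"imputation accuracy: {accuracy:.2%}")
```

Only the masked positions enter the denominator, so markers that were genotyped directly do not inflate the figure. Per-chromosome accuracies would follow by calling the function once per chromosome.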

  9. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Science.gov (United States)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  10. Using mi impute chained to fit ANCOVA models in randomized trials with censored dependent and independent variables

    DEFF Research Database (Denmark)

    Andersen, Andreas; Rieckmann, Andreas

    2016-01-01

    In this article, we illustrate how to use mi impute chained with intreg to fit an analysis of covariance of censored and nondetectable immunological concentrations measured in a randomized pretest–posttest design.

  11. Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data

    Science.gov (United States)

    2012-01-01

    Background Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that the underlying missing data mechanism, given the observed data, is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the method proposed by Carpenter et al. (2007). This approach aims to approximate inferences under a Missing Not At Random (MNAR) mechanism by reweighting estimates obtained after multiple imputation, where the weights depend on the assumed degree of departure from the MAR assumption. Methods The method is illustrated with epidemiological data from a surveillance system of hepatitis C virus (HCV) infection in France during the 2001–2007 period. The subpopulation studied included 4343 HCV infected patients who reported drug use. Risk factors for severe liver disease were assessed. After performing complete-case and multiple imputation analyses, we applied the sensitivity analysis to 3 risk factors of severe liver disease: past excessive alcohol consumption, HIV co-infection and infection with HCV genotype 3. Results In these data, the association between severe liver disease and HIV was underestimated if, given the observed data, the chance of observing HIV status is high when this is positive. Inferences for two other risk factors were robust to plausible local departures from the MAR assumption. Conclusions We have demonstrated the practical utility of, and advocate, a pragmatic widely applicable approach to exploring plausible departures from the MAR assumption post multiple imputation. We have developed guidelines for applying this approach to epidemiological studies. PMID:22681630

  12. Imputing historical statistics, soils information, and other land-use data to crop area

    Science.gov (United States)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  13. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Science.gov (United States)

    Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

    2014-01-01

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  14. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Directory of Open Access Journals (Sweden)

    Kwangwoo Kim

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  15. Design of a bovine low-density SNP array optimized for imputation.

    Directory of Open Access Journals (Sweden)

    Didier Boichard

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. The BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. The minor allele frequencies were similar across taurine beef and dairy breeds as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle.

  16. Imputation of microsatellite alleles from dense SNP genotypes for parental verification

    Directory of Open Access Journals (Sweden)

    Matthew McClure

    2012-08-01

    Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphism (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing 4 dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics (ISAG) recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification.

  17. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    Science.gov (United States)

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  18. Data Editing and Imputation in Business Surveys Using “R”

    Directory of Open Access Journals (Sweden)

    Elena Romascanu

    2014-06-01

    Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison of the features of two statistical software packages, R and SPSS, in order to take full advantage of the existing automated methods for data editing and imputation in business surveys (with a proper design of consistency rules) as a partial alternative to the manual editing of data. Approach – The paper compares different methods for editing survey data in R using the ‘editrules’ and ‘survey’ packages, which contain transformations commonly used in official statistics; visualization of missing-value patterns using the ‘Amelia’ and ‘VIM’ packages; imputation approaches for longitudinal data using ‘VIMGUI’; and the performance of another statistical software package, SPSS, on the same features. Findings – Data on business statistics received by the National Institute of Statistics (NIS) are not ready to be used for direct analysis because of in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages offer the ability to set the erroneous fields in edit-violating records and to verify the results after the imputation of missing values, giving users a flexible, less time-consuming approach that is easier to automate in R than in SPSS macro syntax, even in situations where macros are very handy.

  19. Evaluating outcome-correlated recruitment and geographic recruitment bias in a respondent-driven sample of people who inject drugs in Tijuana, Mexico.

    Science.gov (United States)

    Rudolph, Abby E; Gaines, Tommi L; Lozada, Remedios; Vera, Alicia; Brouwer, Kimberly C

    2014-12-01

    The widespread use of respondent-driven sampling (RDS) and its reliance on untested assumptions suggest a need for new exploratory/diagnostic tests. We assessed geographic recruitment bias and outcome-correlated recruitment among 1,048 RDS-recruited people who inject drugs (Tijuana, Mexico). Surveys gathered demographics, drug/sex behaviors, activity locations, and recruiter-recruit pairs. Simulations assessed geographic and network clustering of active syphilis (RPR titers ≥1:8). Gender-specific predicted probabilities were estimated using logistic regression with GEE and robust standard errors. Active syphilis prevalence was 7% (crude: men = 5.7% and women = 16.6%; RDS-adjusted: men = 6.7% and women = 7.6%). Syphilis clustered in the Zona Norte, a neighborhood known for drug and sex markets. Network simulations revealed geographic recruitment bias and non-random recruitment by syphilis status. Gender-specific prevalence estimates accounting for clustering were highest among those living/working/injecting/buying drugs in the Zona Norte and directly/indirectly connected to syphilis cases (men: 15.9%, women: 25.6%) and lowest among those with neither exposure (men: 3.0%, women: 6.1%). Future RDS analyses should assess/account for network and spatial dependencies.

  20. Level of academic and didactic competencies among students as a measure to evaluate geographical education and preparation of students for the demands of the modern labour market

    Directory of Open Access Journals (Sweden)

    Cichoń Małgorzata

    2018-03-01

    Young people, regardless of their social environment, place of residence or work, are looking for values and key competencies that enable them to achieve their goals in life. An appropriate education system that meets these requirements effectively under changing conditions is therefore important. The contemporary employer is interested in four groups of key competencies: intellectual, professional, personal and interpersonal. Geography is a field with great potential for the development of various competencies. In this context, questions about adjusting geographical education to the expectations of employers are justified. Therefore, the aim of the study is to assess the strengths and weaknesses of the current development of competencies and qualifications in the geography speciality of the Faculty of Geographical and Geological Sciences, Adam Mickiewicz University in Poznań, Poland. The reference points included a report on research carried out among 200 employers in 2012, as well as surveys among students graduating from master's studies on the assessment of the level of their competencies and qualifications. It was determined that the strength of the current geographical education at the faculty is that it prepares mainly specialists with broad general and professional knowledge and high self-esteem in terms of group cooperation and communication. The areas for development in geographical education are intellectual competencies, above all independent thinking and prioritising. Final-year geography students perform worst in terms of personal competencies. The authors suggest building students' awareness because, as the above results show, they are not fully aware of what expectations they may face in the labour market. It is worth modifying the study program so as to put more emphasis on soft competencies and support the development of various forms of extra activities of students. Attention was also paid to

  1. Quick, “Imputation-free” meta-analysis with proxy-SNPs

    Directory of Open Access Journals (Sweden)

    Meesters Christian

    2012-09-01

    Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which eventually are time-consuming. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy

  2. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

    Directory of Open Access Journals (Sweden)

    Katya L Masconi

    Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.
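
The C-statistic used in this record to compare model discrimination can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison. It is an illustration under stated assumptions, not the study's code: the function name `c_statistic` and the toy risks and outcomes are hypothetical.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Compares every case (outcome 1) with every non-case (outcome 0).
    A pair counts 1 if the case has the higher predicted risk and 0.5 if tied.
    """
    cases = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                concordant += 1.0
            elif case_risk == non_case_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and observed outcomes for five individuals.
risks = [0.9, 0.4, 0.35, 0.8, 0.1]
observed = [1, 0, 1, 0, 0]

c_value = c_statistic(risks, observed)
print(f"C-statistic: {c_value:.3f}")
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation. Running the same function on risks predicted from datasets completed by different imputation methods gives the kind of comparison the abstract reports.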

  3. Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

    Science.gov (United States)

    Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

    2013-01-01

    Summary Background and objectives Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict likelihood of missing BCr. Propensity scoring identified 6502 patients with highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a “missing” data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI were compared with that of eGFR 75. Results All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; Pcreatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Conclusions Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980

  4. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    Science.gov (United States)

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.

  5. Missing Value Imputation Improves Mortality Risk Prediction Following Cardiac Surgery: An Investigation of an Australian Patient Cohort.

    Science.gov (United States)

    Karim, Md Nazmul; Reid, Christopher M; Tran, Lavinia; Cochrane, Andrew; Billah, Baki

    2017-03-01

    The aim of this study was to evaluate the impact of missing values on the prediction performance of the model predicting 30-day mortality following cardiac surgery as an example. Information from 83,309 eligible patients, who underwent cardiac surgery, recorded in the Australia and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) database registry between 2001 and 2014, was used. An existing 30-day mortality risk prediction model developed from ANZSCTS database was re-estimated using the complete cases (CC) analysis and using multiple imputation (MI) analysis. Agreement between the risks generated by the CC and MI analysis approaches was assessed by the Bland-Altman method. Performances of the two models were compared. One or more missing predictor variables were present in 15.8% of the patients in the dataset. The Bland-Altman plot demonstrated significant disagreement between the risk scores (prisk of mortality. Compared to CC analysis, MI analysis resulted in an average of 8.5% decrease in standard error, a measure of uncertainty. The MI model provided better prediction of mortality risk (observed: 2.69%; MI: 2.63% versus CC: 2.37%, Pvalues improved the 30-day mortality risk prediction following cardiac surgery. Copyright © 2016 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.

  6. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    Directory of Open Access Journals (Sweden)

    Nawar Shara

    Full Text Available Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
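    Two of the simpler methods named in this abstract, mean of serial measures and adjacent value, can be sketched for a single participant's series of exam values. The abstract does not define either method precisely, so the interpretations below are assumptions: the function names are hypothetical, `None` marks a missing exam, and the adjacent-value rule is assumed to prefer the nearest earlier observation.

```python
def impute_serial_mean(series):
    """Replace each missing value with the mean of the participant's observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        return list(series)  # nothing to impute from
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in series]


def impute_adjacent(series):
    """Replace each missing value with the nearest earlier observation,
    falling back to the nearest later one for leading gaps (an assumed rule)."""
    out = list(series)
    last = None
    for i, v in enumerate(out):          # forward pass: carry earlier values forward
        if v is None:
            if last is not None:
                out[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):  # backward pass: fill leading gaps
        if out[i] is None:
            if nxt is not None:
                out[i] = nxt
        else:
            nxt = out[i]
    return out


# Made-up renal function values at three or four exams.
mean_filled = impute_serial_mean([60.0, None, 80.0])
adjacent_filled = impute_adjacent([None, 60.0, None, 80.0])
```

    Both are single-imputation rules, so unlike multiple imputation they do not carry any uncertainty about the filled-in values into later analyses.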

  7. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Science.gov (United States)

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.

  8. Missing data in clinical trials: control-based mean imputation and sensitivity analysis.

    Science.gov (United States)

    Mehrotra, Devan V; Liu, Fang; Permutt, Thomas

    2017-09-01

    In some randomized (drug versus placebo) clinical trials, the estimand of interest is the between-treatment difference in population means of a clinical endpoint that is free from the confounding effects of "rescue" medication (e.g., HbA1c change from baseline at 24 weeks that would be observed without rescue medication regardless of whether or when the assigned treatment was discontinued). In such settings, a missing data problem arises if some patients prematurely discontinue from the trial or initiate rescue medication while in the trial, the latter necessitating the discarding of post-rescue data. We caution that the commonly used mixed-effects model repeated measures analysis with the embedded missing at random assumption can deliver an exaggerated estimate of the aforementioned estimand of interest. This happens, in part, due to implicit imputation of an overly optimistic mean for "dropouts" (i.e., patients with missing endpoint data of interest) in the drug arm. We propose an alternative approach in which the missing mean for the drug arm dropouts is explicitly replaced with either the estimated mean of the entire endpoint distribution under placebo (primary analysis) or a sequence of increasingly more conservative means within a tipping point framework (sensitivity analysis); patient-level imputation is not required. A supplemental "dropout = failure" analysis is considered in which a common poor outcome is imputed for all dropouts followed by a between-treatment comparison using quantile regression. All analyses address the same estimand and can adjust for baseline covariates. Three examples and simulation results are used to support our recommendations. Copyright © 2017 John Wiley & Sons, Ltd.
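    The idea of replacing the missing mean for drug-arm dropouts with a placebo-based mean can be illustrated with simple arithmetic. This is a sketch under stated assumptions, not the authors' estimator: the function name `control_based_mean`, the `shift` parameter standing in for the increasingly conservative means of the tipping point analysis, and all numbers are hypothetical.

```python
def control_based_mean(completer_mean, dropout_rate, placebo_mean, shift=0.0):
    """Drug-arm mean when dropouts are assigned the placebo mean plus an optional shift."""
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must be between 0 and 1")
    dropout_mean = placebo_mean + shift  # mean assumed for drug-arm dropouts
    return (1.0 - dropout_rate) * completer_mean + dropout_rate * dropout_mean


# Made-up HbA1c changes from baseline (lower is better).
placebo_mean = -0.2
drug_mean = control_based_mean(completer_mean=-1.0, dropout_rate=0.2,
                               placebo_mean=placebo_mean)
effect = drug_mean - placebo_mean  # between-treatment difference

# Tipping-point style sweep: progressively worse assumed means for dropouts.
sweep = [control_based_mean(-1.0, 0.2, placebo_mean, shift=s) - placebo_mean
         for s in (0.0, 0.5, 1.0)]
```

    With these numbers the estimated effect shrinks as the assumed dropout mean worsens, which is the pattern a tipping point analysis looks for.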

  9. Imputation of Baseline LDL Cholesterol Concentration in Patients with Familial Hypercholesterolemia on Statins or Ezetimibe.

    Science.gov (United States)

    Ruel, Isabelle; Aljenedil, Sumayah; Sadri, Iman; de Varennes, Émilie; Hegele, Robert A; Couture, Patrick; Bergeron, Jean; Wanneh, Eric; Baass, Alexis; Dufour, Robert; Gaudet, Daniel; Brisson, Diane; Brunham, Liam R; Francis, Gordon A; Cermakova, Lubomira; Brophy, James M; Ryomoto, Arnold; Mancini, G B John; Genest, Jacques

    2018-02-01

    Familial hypercholesterolemia (FH) is the most frequent genetic disorder seen clinically and is characterized by increased LDL cholesterol (LDL-C) (>95th percentile), family history of increased LDL-C, premature atherosclerotic cardiovascular disease (ASCVD) in the patient or in first-degree relatives, presence of tendinous xanthomas or premature corneal arcus, or presence of a pathogenic mutation in the LDLR , PCSK9 , or APOB genes. A diagnosis of FH has important clinical implications with respect to lifelong risk of ASCVD and requirement for intensive pharmacological therapy. The concentration of baseline LDL-C (untreated) is essential for the diagnosis of FH but is often not available because the individual is already on statin therapy. To validate a new algorithm to impute baseline LDL-C, we examined 1297 patients. The baseline LDL-C was compared with the imputed baseline obtained within 18 months of the initiation of therapy. We compared the percent reduction in LDL-C on treatment from baseline with the published percent reductions. After eliminating individuals with missing data, nonstandard doses of statins, or medications other than statins or ezetimibe, we provide data on 951 patients. The mean ± SE baseline LDL-C was 243.0 (2.2) mg/dL [6.28 (0.06) mmol/L], and the mean ± SE imputed baseline LDL-C was 244.2 (2.6) mg/dL [6.31 (0.07) mmol/L] ( P = 0.48). There was no difference in response according to the patient's sex or in percent reduction between observed and expected for individual doses or types of statin or ezetimibe. We provide a validated estimation of baseline LDL-C for patients with FH that may help clinicians in making a diagnosis. © 2017 American Association for Clinical Chemistry.

  10. Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

    Directory of Open Access Journals (Sweden)

    Jiangxiu Zhou

    2014-09-01

    Full Text Available The purpose of this study is to demonstrate a way of dealing with missing data in clustered randomized trials by doing multiple imputation (MI) with the PAN package in R through SAS. The procedure for doing MI with PAN through SAS is demonstrated in detail in order for researchers to be able to use this procedure with their own data. An illustration of the technique with empirical data was also included. In this illustration the PAN results were compared with pairwise deletion and three types of MI: (1) Normal Model (NM)-MI ignoring the cluster structure; (2) NM-MI with dummy-coded cluster variables (fixed cluster structure); and (3) a hybrid NM-MI which imputes half the time ignoring the cluster structure, and the other half including the dummy-coded cluster variables. The empirical analysis showed that using PAN and the other strategies produced comparable parameter estimates. However, the dummy-coded MI overestimated the intraclass correlation, whereas MI ignoring the cluster structure and the hybrid MI underestimated the intraclass correlation. When compared with PAN, the p-value and standard error for the treatment effect were higher with dummy-coded MI, and lower with MI ignoring the cluster structure, the hybrid MI approach, and pairwise deletion. Previous studies have shown that NM-MI is not appropriate for handling missing data in clustered randomized trials. This approach, in addition to the pairwise deletion approach, leads to a biased intraclass correlation and faulty statistical conclusions. Imputation in clustered randomized trials should be performed with PAN. We have demonstrated an easy way for using PAN through SAS.

  11. Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling

    Directory of Open Access Journals (Sweden)

    Petr Novák

    2012-03-01

    Full Text Available This study is aimed at variance computation techniques for estimates of population characteristics based on survey sampling and imputation. We use the superpopulation regression model, which means that the target variable values for each statistical unit are treated as random realizations of a linear regression model with weighted variance. We focus on regression models with one auxiliary variable and no intercept, which have many applications and straightforward interpretation in business statistics. Furthermore, we deal with cases where the estimates are not independent and thus the covariance must be computed. We also consider chained regression models with auxiliary variables as random variables instead of constants.

  12. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    OpenAIRE

    Yan, Xiaobo; Xiong, Weiqing; Hu, Liang; Wang, Feng; Zhao, Kuo

    2015-01-01

    This paper addresses missing value imputation for the Internet of Things (IoT). Nowadays, the IoT has been used widely and commonly by a variety of domains, such as transportation and logistics domain and healthcare domain. However, missing values are very common in the IoT for a variety of reasons, which results in the fact that the experimental data are incomplete. As a result of this, some work, which is related to the data of the IoT, can’t be carried out normally. And it leads to the red...

  13. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

    DEFF Research Database (Denmark)

    Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

    2009-01-01

    Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging single-nucleotide polymorphism (SNP) sets. To maximize information gleaned from......, and rs3212891; CDK2 rs2069391, rs2069414, and rs17528736; and CCNE1 rs3218036. These results exemplify the utility of imputation in candidate gene studies and lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls....

  14. Non-imputability, criminal dangerousness and curative safety measures: myths and realities

    Directory of Open Access Journals (Sweden)

    Frank Harbottle Quirós

    2017-04-01

    Full Text Available Curative safety measures are imposed in criminal proceedings on non-imputable persons, provided that a prognosis affirms their criminal dangerousness. Although this statement seems elementary, several myths about these legal institutions persist in judicial practice, and their versions may vary, to a greater or lesser extent, between countries. In this context, the present article formulates ten myths based on the experience of Costa Rica and provides explanations that seek to weaken or dispel them, inviting the reader to reflect on them.

  15. A suggested approach for imputation of missing dietary data for young children in daycare

    OpenAIRE

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

  16. Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

    Directory of Open Access Journals (Sweden)

    Vinay K Gupta

    2016-06-01

    Full Text Available Objective: The aim of the study is to assess the trend in mean BMI z-score among private schools’ students from their anthropometric records when there were missing values in the outcome. Methodology: The anthropometric measurements of students from class 1 to 12 were taken from the records of two private schools in Delhi, India from 2005 to 2010. These records comprise unbalanced longitudinal data; that is, not all students had measurements recorded in each year. The trend in mean BMI z-score was estimated through a growth curve model. Prior to that, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis was also performed after excluding missing values to compare the results with those obtained from analysis of multiply imputed data. Results: The mean BMI z-score among school students decreased significantly over time in the imputed data (β= -0.2030, se=0.0889, p=0.0232) after adjusting for age, gender, class and school. Complete case analysis also showed a decrease in mean BMI z-score, though it was not statistically significant (β= -0.2861, se=0.0987, p=0.065). Conclusions: The estimates obtained from multiple imputation analysis were better than those from the complete case analysis in terms of lower standard errors. We showed that anthropometric measurements from school records can be used to monitor the weight status of children and adolescents, and that multiple imputation using a growth curve model can be useful when analyzing such data.
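    Estimates from multiply imputed datasets are conventionally combined with Rubin's rules, which is why their standard errors can be compared with those of a complete case analysis. The abstract does not say how its estimates were pooled, so the following is a generic sketch: the function name `pool_rubin` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between    # total variance
    return q_bar, sqrt(total)


# Made-up slope estimates and standard errors from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

    The between-imputation term grows when the imputed datasets disagree, so the pooled standard error reflects the uncertainty due to the missing values.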

  17. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Science.gov (United States)

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  18. Multiple imputation to account for measurement error in marginal structural models

    Science.gov (United States)

    Edwards, Jessie K.; Cole, Stephen R.; Westreich, Daniel; Crane, Heidi; Eron, Joseph J.; Mathews, W. Christopher; Moore, Richard; Boswell, Stephen L.; Lesko, Catherine R.; Mugavero, Michael J.

    2015-01-01

    Background Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and non-differential measurement error in a marginal structural model. Methods We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. Results In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality [hazard ratio (HR): 1.2 (95% CI: 0.6, 2.3)]. The HR for current smoking and therapy (0.4 (95% CI: 0.2, 0.7)) was similar to the HR for no smoking and therapy (0.4; 95% CI: 0.2, 0.6). Conclusions Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies. PMID:26214338

  19. Multiple Imputation to Account for Measurement Error in Marginal Structural Models.

    Science.gov (United States)

    Edwards, Jessie K; Cole, Stephen R; Westreich, Daniel; Crane, Heidi; Eron, Joseph J; Mathews, W Christopher; Moore, Richard; Boswell, Stephen L; Lesko, Catherine R; Mugavero, Michael J

    2015-09-01

    Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and nondifferential measurement error in a marginal structural model. We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3,686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2 [95% confidence interval [CI] = 0.6, 2.3]). The HR for current smoking and therapy [0.4 (95% CI = 0.2, 0.7)] was similar to the HR for no smoking and therapy (0.4; 95% CI = 0.2, 0.6). Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies.

  20. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Full Text Available Abstract Background Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression—on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  1. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Science.gov (United States)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to uncover hidden knowledge in clinical research is becoming highly influential in medicine. Heart disease is a major killer around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results. Nevertheless, the remaining complete features can still provide information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a classification algorithm, Decision Tree. The experiment uses a Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of Decision Tree increases after the application of FCMPSO for imputation.

  2. Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Peter K Joshi

    Full Text Available The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.

  3. Evaluation of Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR) for Water Quality Monitoring: A Case Study for the Estimation of Salinity

    Science.gov (United States)

    Nazeer, Majid; Bilal, Muhammad

    2018-04-01

    Landsat-5 Thematic Mapper (TM) data have been used to estimate salinity in the coastal area of Hong Kong. Four adjacent Landsat TM images were used in this study, which were atmospherically corrected using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) radiative transfer code. The atmospherically corrected images were further used to develop models for salinity using Ordinary Least Square (OLS) regression and Geographically Weighted Regression (GWR) based on in situ data of October 2009. Results show that the coefficient of determination (R²) of 0.42 between the OLS estimated and in situ measured salinity is much lower than that of the GWR model, which is about twice as high (R² = 0.86). This indicates that the GWR model predicts salinity better than the OLS regression model and better captures its spatial heterogeneity. It was observed that the salinity was high in Deep Bay (north-western part of Hong Kong), which might be due to industrial waste disposal, whereas the salinity was estimated to be constant (32 practical salinity units) towards the open sea.

  4. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Science.gov (United States)

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  5. Volunteered Geographic Information in Wikipedia

    Science.gov (United States)

    Hardy, Darren

    2010-01-01

    Volunteered geographic information (VGI) refers to the geographic subset of online user-generated content. Through Geobrowsers and online mapping services, which use geovisualization and Web technologies to share and produce VGI, a global digital commons of geographic information has emerged. A notable example is Wikipedia, an online collaborative…

  6. In vivo evaluation of some biophysical parameters of the facial skin of Indian women. Part I: variability with age and geographical locations.

    Science.gov (United States)

    Colomb, L; Flament, F; Wagle, A; Agrawal, D

    2018-02-01

    India is a large country (a subcontinent) of about 3.3 million km² that covers large ranges in latitude and longitude. The last Indian census counted about 1.21 billion inhabitants of many origins, creating a vast diversity of people and skin types, whose variability has been previously established. The present study aimed at deepening this knowledge through a set of biophysical measurements to describe, along the skin ageing process, the specificities of various Indian subjects living in different geographical locations. A total of 1204 women, aged 18-84 years, of all socio-economic status, were recruited in four Indian cities (Mumbai, Kolkata, Chennai and Delhi). Measurements of face skin colour properties, elastic properties, sebum production, skin pores and microrelief roughness were performed. With regard to skin colour, this study indicates, with age, a darkening of very low amplitude that leads to an increased skin colour heterogeneity. In all subjects, at all ages, the ocular region (dark circles) presents a much darker pigmentation than the cheeks, creating a contrast that appears constant at all ages. In addition to an increased skin colour heterogeneity, a progressive alteration of the skin surface relief, increased sizes of skin pores, a loss of skin elasticity and a drop in sebum production, post-menopause, are observed. This study confirms, in Indian women, some skin ageing measurements found on women from other ethnic groups (i.e. sebum, firmness, wrinkles and pores size) and also identifies some Indian specificities: a high and constant contrast between the ocular region and the cheek colour, associated with a very slow darkening effect along the lifespan. © 2017 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  7. Emerging pests and diseases of South-east Asian cassava: a comprehensive evaluation of geographic priorities, management options and research needs.

    Science.gov (United States)

    Graziosi, Ignazio; Minato, Nami; Alvarez, Elizabeth; Ngo, Dung Tien; Hoat, Trinh Xuan; Aye, Tin Maung; Pardo, Juan Manuel; Wongtiem, Prapit; Wyckhuys, Kris Ag

    2016-06-01

    Cassava is a major staple, bio-energy and industrial crop in many parts of the developing world. In Southeast Asia, cassava is grown on >4 million ha by nearly 8 million (small-scale) farming households, under (climatic, biophysical) conditions that often prove unsuitable for many other crops. While SE Asian cassava has been virtually free of phytosanitary constraints for most of its history, a complex of invasive arthropod pests and plant diseases has recently come to affect local crops. We describe results from a region-wide monitoring effort in the 2014 dry season, covering 429 fields across five countries. We present geographic distribution and field-level incidence of the most prominent pest and disease invaders, introduce readily-available management options and research needs. Monitoring work reveals that several exotic mealybug and (red) mite species have effectively colonised SE Asia's main cassava-growing areas, occurring in respectively 70% and 54% of fields, at average field-level incidence of 27 ± 2% and 16 ± 2%. Cassava witches broom (CWB), a systemic phytoplasma disease, was reported from 64% of plots, at incidence levels of 32 ± 2%. Although all main pests and diseases are non-natives, we hypothesise that accelerating intensification of cropping systems, increased climate change and variability, and deficient crop husbandry are aggravating both organism activity and crop susceptibility. Future efforts need to consolidate local capacity to tackle current (and future) pest invaders, boost detection capacity, devise locally-appropriate integrated pest management (IPM) tactics, and transfer key concepts and technologies to SE Asia's cassava growers. Urgent action is needed to mobilise regional as well as international scientific support, to effectively tackle this phytosanitary emergency and thus safeguard the sustainability and profitability of one of Asia's key agricultural commodities. © 2016 Society of Chemical Industry.

  8. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure.

    Science.gov (United States)

    Kabisch, Maria; Hamann, Ute; Lorenzo Bermejo, Justo

    2017-10-17

    Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and makes it possible to explicitly account for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation. Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not practically improve accuracy. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and it decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals. To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate

  9. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    LENUS (Irish Health Repository)

    Hardouin, Jean-Benoit

    2011-07-14

    Abstract Background Nowadays, more and more clinical scales consisting of responses given by patients to a set of items (Patient Reported Outcomes - PRO) are validated with models based on Item Response Theory, and more specifically, with a Rasch model. In the validation sample, missing data are frequent. The aim of this paper is to compare sixteen methods for handling the missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods A simulation study was performed that allowed several cases to be considered, notably whether the missing values are informative or not and the rate of missing data. Results Several imputation methods produce bias on psychometric indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions Several imputation methods should be avoided, in particular PMS imputation. From a general point of view, it is important to use an imputation method that considers both the ability of the patient (measured for example by his/her score) and the difficulty of the item (measured for example by its rate of favourable responses). Another recommendation is to always consider adding a random process to the imputation method, because such a process reduces the bias. Last, analysis performed without imputation of the missing data (available case analysis) is an interesting alternative to simple imputation in this context.
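
    The contrast between plain PMS imputation and PMS with an added random process can be sketched as follows. This is a minimal illustration, not the procedure evaluated in the paper: the function name `pms_impute`, the restriction to dichotomous (0/1) items, the rounding rule at 0.5, and the Bernoulli draw are all assumptions made for the example.

```python
import random

def pms_impute(responses, rng=None):
    """Impute missing (None) 0/1 item responses for one person.

    Hypothetical helper: uses the person's mean observed score.
    - rng is None: deterministic PMS, rounding the mean at 0.5 (assumed rule).
    - rng given:   random variant, drawing 1 with probability equal to the mean.
    """
    observed = [r for r in responses if r is not None]
    if not observed:
        # Nothing observed for this person, so nothing can be imputed.
        return list(responses)
    mean = sum(observed) / len(observed)
    imputed = []
    for r in responses:
        if r is not None:
            imputed.append(r)            # keep observed responses untouched
        elif rng is None:
            imputed.append(1 if mean >= 0.5 else 0)
        else:
            imputed.append(1 if rng.random() < mean else 0)
    return imputed

# Deterministic PMS: the person answered 2 of 3 observed items favourably.
print(pms_impute([1, 1, None, 0]))
# Random variant: the imputed value varies with the random stream.
print(pms_impute([1, None, 0, None], rng=random.Random(1)))
```

    Note that this sketch uses only the person's score; the abstract's recommendation to also account for item difficulty is not implemented here.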

  10. Geographic Ontologies, Gazetteers and Multilingualism

    Directory of Open Access Journals (Sweden)

    Robert Laurini

    2015-01-01

    Full Text Available Different languages imply different visions of space, so that terminologies are different in geographic ontologies. In addition to their geometric shapes, geographic features have names, sometimes different in diverse languages. In addition, the role of gazetteers, as dictionaries of place names (toponyms), is to maintain relations between place names and location. The scope of geographic information retrieval is to search for geographic information not against a database, but against the whole Internet: but the Internet stores information in different languages, and it is of paramount importance not to remain stuck to a unique language. In this paper, our first step is to clarify the links between geographic objects as computer representations of geographic features, ontologies and gazetteers designed in various languages. Then, we propose some inference rules for matching not only types, but also relations in geographic ontologies with the assistance of gazetteers.

  11. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    Science.gov (United States)

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  12. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 2010-04-01 false May the Office of National Drug Control Policy impute conduct of one person to another? 1404.630 Section 1404.630 Food and Drugs OFFICE OF NATIONAL DRUG CONTROL POLICY GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1404.630...

  13. The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

    NARCIS (Netherlands)

    Minica, C.C.; Dolan, C.V.; Willemsen, G.; Vink, J.M.; Boomsma, D.I.

    2013-01-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of

  14. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Directory of Open Access Journals (Sweden)

    Hardt Jochen

    2012-12-01

    Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate (r=.50) correlations with X’s and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.

  15. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    DEFF Research Database (Denmark)

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low de...

  16. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation and...

  17. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    NARCIS (Netherlands)

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I.E. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. 
Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. 
Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  18. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    NARCIS (Netherlands)

    van Leeuwen, E.M.; Karssen, L.C.; Deelen, J.; Isaacs, A.; Medina-Gomez, C.; Mbarek, H.; Kanterakis, A.; Trompet, S.; Postmus, I.; Verweij, N.; van Enckevort, D.; Huffman, J.E.; White, C.C.; Feitosa, M.F.; Bartz, T.M.; Manichaikul, A.; Joshi, P.K.; Peloso, G.M.; Deelen, P.; Dijk, F.; Willemsen, G.; de Geus, E.J.C.; Milaneschi, Y.; Penninx, B.W.J.H.; Francioli, L.C.; Menelaou, A.; Pulit, S.L.; Rivadeneira, F.; Hofman, A.; Oostra, B.A.; Franco, O.H.; Mateo Leach, I.; Beekman, M.; de Craen, A.J.; Uh, H.W.; Trochet, H.; Hocking, L.J.; Porteous, D.J.; Sattar, N.; Packard, C.J.; Buckley, B.M.; Brody, J.A.; Bis, J.C.; Rotter, J.I.; Mychaleckyj, J.C.; Campbell, H.; Duan, Q.; Lange, L.A.; Wilson, J.F.; Hayward, C.; Polasek, O.; Vitart, V.; Rudan, I.; Wright, A.F.; Rich, S.S.; Psaty, B.M.; Borecki, I.B.; Kearney, P.M.; Stott, D.J.; Cupples, L.A.; Jukema, J.W.; van der Harst, P.; Sijbrands, E.J.; Hottenga, J.J.; Uitterlinden, A.G.; Swertz, M.A.; van Ommen, G.J.B; Bakker, P.I.W.; Slagboom, P.E.; Boomsma, D.I.; Wijmenga, C.; van Duijn, C.M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (∼35,000 samples) with the population-specific reference panel created

  19. 31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 1 2010-07-01 2010-07-01 false May the Department of the Treasury impute conduct of one person to another? 19.630 Section 19.630 Money and Finance: Treasury Office of the Secretary of the Treasury GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles...

  20. Using geographic information systems

    International Nuclear Information System (INIS)

    Winsor, R.W.

    1997-01-01

    A true Geographic Information System (GIS) is a computer mapping system with spatial analysis ability and cartographic accuracy that will offer many different projections. GIS has evolved to become an everyday tool for a wide range of users including oil companies, worldwide. Other systems are designed to allow oil and gas companies to keep their upstream data in the same format. Among these are the Public Petroleum Data Model developed by Gulf Canada, Digitech and Applied Terravision Systems of Calgary, the system developed and marketed by the Petrotechnical Open Software Corporation in the United States, and the Mercury projects by IBM. These have been developed in an effort to define an industry standard. The advantages and disadvantages of open and closed systems were discussed. Factors to consider when choosing a GIS system such as overall performance, area of use and query complexity, were reviewed. 3 figs

  1. Evaluation of Subcutaneous Proleukin (interleukin-2) in a Randomized International Trial (ESPRIT): geographical and gender differences in the baseline characteristics of participants

    NARCIS (Netherlands)

    Pett, S. L.; Wand, H.; Law, M. G.; Arduino, R.; Lopez, J. C.; Knysz, B.; Pereira, L. C.; Pollack, S.; Reiss, P.; Tambussi, G.

    2006-01-01

    BACKGROUND: ESPRIT is a phase III, open-label, randomized, international clinical trial evaluating the effects of subcutaneous recombinant interleukin-2 (rIL-2) plus antiretroviral therapy (ART) versus ART alone on HIV-disease progression and death in HIV-1-infected individuals with CD4+ T-cells >

  2. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    Science.gov (United States)

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  3. Impute DC link (IDCL) cell based power converters and control thereof

    Science.gov (United States)

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with a similarly rated back-to-back inverter system.

  4. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

    Science.gov (United States)

    Sulovari, Arvis; Li, Dawei

    2014-07-19

    Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly-used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep

  5. [Geographic data for Neotropical bats (Chiroptera)].

    Science.gov (United States)

    Noguera-Urbano, Elkin A; Escalante, Tania

    2014-03-01

    The global effort to digitize biodiversity occurrence data from collections, museums and other institutions has stimulated the development of important tools to improve the knowledge and conservation of biodiversity. The Global Biodiversity Information Facility (GBIF) enables and opens access to 321 million biodiversity records from 379 host institutions. Neotropical bats are a highly diverse and specialized group, and geographic information about them has been increasing in recent years, but there are few reports on this topic. The aim of this study was to analyze the number of digital records in GBIF of Neotropical bats distributed in 21 American countries, evaluating their nomenclatural and geographical consistency at the country scale. Moreover, we evaluated the information gaps on grid cells of 1 degree latitude x 1 degree longitude. There were over half a million records, but 58% of them have no latitude and longitude data, and 52% fully pass the nomenclatural and geographic evaluation. We estimated that there are no records in 54% of the analyzed area; the principal gaps are in biodiversity hotspots like the Colombian and Brazilian Amazonia and Southern Venezuela. In conclusion, our study suggests that available data on GBIF have nomenclatural and geographic biases. GBIF data partially represent the bat species richness, and the main gaps in information are in South America.
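
    The grid-based gap analysis described above can be illustrated with a small sketch. It is not the authors' workflow: the function name `empty_cell_fraction`, the rectangular study area given in whole degrees, and the use of `None` for missing coordinates are assumptions made for the example.

```python
import math

def empty_cell_fraction(records, lat_range, lon_range):
    """Fraction of 1 x 1 degree cells in a rectangle that hold no record.

    records   : iterable of (lat, lon) pairs; None marks a missing coordinate
    lat_range : (min_lat, max_lat) in whole degrees, max exclusive
    lon_range : (min_lon, max_lon) in whole degrees, max exclusive
    """
    occupied = set()
    for lat, lon in records:
        if lat is None or lon is None:
            continue  # records without coordinates cannot be placed on the grid
        inside = (lat_range[0] <= lat < lat_range[1]
                  and lon_range[0] <= lon < lon_range[1])
        if inside:
            # Each cell is identified by the integer degree of its lower-left corner.
            occupied.add((math.floor(lat), math.floor(lon)))
    total_cells = (lat_range[1] - lat_range[0]) * (lon_range[1] - lon_range[0])
    return 1 - len(occupied) / total_cells

# Toy example: a 2 x 2 degree area with records falling in two of four cells.
toy_records = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.1), (None, 3.0)]
print(empty_cell_fraction(toy_records, (0, 2), (0, 2)))
```

    A real analysis would restrict the grid to the cells that actually intersect the study region rather than a bounding rectangle.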

  6. Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data

    Science.gov (United States)

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...

  7. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background The use of missing genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  8. GEOGRAPHIC NAMES INFORMATION SYSTEM (GNIS) ...

    Science.gov (United States)

    The Geographic Names Information System (GNIS), developed by the U.S. Geological Survey in cooperation with the U.S. Board on Geographic Names (BGN), contains information about physical and cultural geographic features in the United States and associated areas, both current and historical, but not including roads and highways. The database also contains geographic names in Antarctica. The database holds the Federally recognized name of each feature and defines the location of the feature by state, county, USGS topographic map, and geographic coordinates. Other feature attributes include names or spellings other than the official name, feature designations, feature class, historical and descriptive information, and for some categories of features the geometric boundaries. The database assigns a unique feature identifier, a random number, that is a key for accessing, integrating, or reconciling GNIS data with other data sets. The GNIS is our Nation's official repository of domestic geographic feature names information.

  9. Coloring geographical threshold graphs

    Energy Technology Data Exchange (ETDEWEB)

    Bradonjic, Milan [Los Alamos National Laboratory; Percus, Allon [Los Alamos National Laboratory; Muller, Tobias [EINDHOVEN UNIV. OF TECH

    2008-01-01

    We propose a coloring algorithm for sparse random graphs generated by the geographical threshold graph (GTG) model, a generalization of random geometric graphs (RGG). In a GTG, nodes are distributed in a Euclidean space, and edges are assigned according to a threshold function involving the distance between nodes as well as randomly chosen node weights. The motivation for analyzing this model is that many real networks (e.g., wireless networks, the Internet, etc.) need to be studied by using a 'richer' stochastic model (which in this case includes both a distance between nodes and weights on the nodes). Here, we analyze the GTG coloring algorithm together with the graph's clique number, showing formally that in spite of the differences in structure between GTG and RGG, the asymptotic behavior of the chromatic number is identical: $\chi = \frac{\ln n}{\ln \ln n}\,(1 + o(1))$. Finally, we consider the leading corrections to this expression, again using the coloring algorithm and clique number to provide bounds on the chromatic number. We show that the gap between the lower and upper bound is within $C \ln n / (\ln \ln n)^2$, and specify the constant C.
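
    To make the GTG model concrete, the sketch below generates a small graph and colors it. Several choices are assumptions made for illustration and are not taken from the abstract: nodes are placed uniformly in the unit square, weights are exponentially distributed, two nodes are joined when (w_i + w_j) / r^alpha >= theta, and the coloring is a generic largest-degree-first greedy heuristic rather than the authors' algorithm.

```python
import math
import random

def geographical_threshold_graph(n, theta, alpha=2.0, seed=0):
    """Build a GTG as an adjacency dict {node: set(neighbours)}.

    Assumed threshold rule: connect i and j when
    (w_i + w_j) / r**alpha >= theta, with r the Euclidean distance.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    weights = [rng.expovariate(1.0) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(pos[i], pos[j])
            # Coincident points (r == 0) are always connected.
            if r == 0 or (weights[i] + weights[j]) / r ** alpha >= theta:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_coloring(adj):
    """Color nodes in order of decreasing degree with the smallest free color."""
    colors = {}
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

graph = geographical_threshold_graph(n=50, theta=50.0)
coloring = greedy_coloring(graph)
print("colors used:", max(coloring.values()) + 1)
```

    The number of colors used by the greedy pass is an upper bound on the chromatic number of the generated graph, and the size of any clique is a lower bound, which mirrors the bounding strategy the abstract describes.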

  10. Grapes from the geographical areas of the Black Sea: Agroclimatic growing conditions and evaluation of stable isotopes compositions in scientific study

    Directory of Open Access Journals (Sweden)

    Kolesnov Alexander

    2016-01-01

    Full Text Available The report considers the agroclimatic conditions in the Black Sea districts of cultivation and processing of grapes - the Black Sea Lowland, the Crimean Peninsula and the South-west coastal areas of the Greater Caucasus. The IRMS/SIRA techniques - Flash combustion (FC-IRMS/SIRA) & Isotopic equilibration (EQ-IRMS/SIRA) - were first applied for the evaluation of carbon and oxygen isotopes ratios in the components of grapes from the Crimean Peninsula. The 13C/12C ratios were studied by the FC-IRMS/SIRA in carbohydrates and organic acids in authentic samples of 8 grape varieties from the 2015 harvest. The EQ-IRMS/SIRA was applied to measure the 18O/16O ratios in intracellular water of grapes. The measured δ13CVPDB value ranges from − 25.01 to − 21.01‰ (for carbohydrates), and from − 25.09 to − 21.30‰ (for organic acids). To evaluate the extent of biological isotope fractionation the 18O/16O ratios were measured in ground water and water of atmospheric precipitates from the Crimean Peninsula. Compared to ground (δ18OVSMOW from − 10.85 to − 8.14‰) and atmospheric (average δ18OVSMOW − 2.85‰) waters, the intracellular water of Crimean grape varieties is found to be enriched with 18O isotope. The δ18OVSMOW value of the grape intracellular water varies from 2.34 to 5.29‰ according to agroclimatic conditions of the season in 2015.

  11. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

    Directory of Open Access Journals (Sweden)

    Thomas J Hoffmann

    2015-01-01

    Full Text Available An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r^2 = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4x10^-12). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8x10^-4) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting

  12. Determinants of Dentists' Geographic Distribution.

    Science.gov (United States)

    Beazoglou, Tryfon J.; And Others

    1992-01-01

    A model for explaining the geographic distribution of dentists' practice locations is presented and applied to particular market areas in Connecticut. Results show geographic distribution is significantly related to a few key variables, including demography, disposable income, and housing prices. Implications for helping students make practice…

  13. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern|| Una comparación de métodos de imputación de variables categóricas con patrón univariado

    Directory of Open Access Journals (Sweden)

    Torres Munguía, Juan Armando

    2014-06-01

    Full Text Available This paper examines the sample proportions estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at rates 5% and 15% of missingness, each for MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were hot-deck and polytomous regression approaches. || El presente estudio examina la estimación de proporciones muestrales en la presencia de valores faltantes en una variable categórica. Se utiliza una encuesta de consumo de tabaco (Encuesta Nacional de Adicciones de México 2011) para crear bases de datos simuladas pero reales con 5% y 15% de valores perdidos para cada mecanismo de no respuesta MCAR, MAR y MNAR. Se evalúa el desempeño de seis métodos para tratar la falta de respuesta: listwise, imputación de moda, imputación aleatoria, hot-deck, imputación por regresión politómica y árboles de clasificación. Los resultados de las simulaciones indican que los métodos más efectivos para el tratamiento de la no respuesta en variables categóricas, bajo los escenarios simulados, son hot-deck y la regresión politómica.
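
    Two of the simpler methods named above, mode imputation and random imputation, can be sketched for a single categorical variable. The function names, the use of `None` for a missing value, and the toy smoking categories are assumptions made for the example; the paper's own implementation is not described in the abstract, and hot-deck, polytomous regression and random forests are not shown.

```python
import random
from collections import Counter

def mode_impute(values):
    """Replace every missing value (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def random_impute(values, seed=0):
    """Replace every missing value with a draw from the observed values.

    Drawing from the observed list preserves the observed category
    proportions on average, unlike mode imputation.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

# Hypothetical smoking-status responses with two missing answers.
answers = ["never", "daily", None, "never", None]
print(mode_impute(answers))
print(random_impute(answers))
```

    Mode imputation inflates the share of the most common category, which is one reason it can distort the sample proportions that the study estimates.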

  14. Geographic information systems: introduction.

    Science.gov (United States)

    Calistri, Paolo; Conte, Annamaria; Freier, Jerome E; Ward, Michael P

    2007-01-01

    The recent exponential growth of the science and technology of geographic information systems (GIS) has made a tremendous contribution to epidemiological analysis and has led to the development of new powerful tools for the surveillance of animal diseases. GIS, spatial analysis and remote sensing provide valuable methods to collect and manage information for epidemiological surveys. Spatial patterns and trends of disease can be correlated with climatic and environmental information, thus contributing to a better understanding of the links between disease processes and explanatory spatial variables. Until recently, these tools were underexploited in the field of veterinary public health, due to the prohibitive cost of hardware and the complexity of GIS software that required a high level of expertise. The revolutionary developments in computer performance of the last decade have not only reduced the costs of equipment but have made available easy-to-use Web-based software which in turn have meant that GIS are more widely accessible by veterinary services at all levels. At the same time, the increased awareness of the possibilities offered by these tools has created new opportunities for decision-makers to enhance their planning, analysis and monitoring capabilities. These technologies offer a new way of sharing and accessing spatial and non-spatial data across groups and institutions. The series of papers included in this compilation aim to: - define the state of the art in the use of GIS in veterinary activities - identify priority needs in the development of new GIS tools at the international level for the surveillance of animal diseases and zoonoses - define practical proposals for their implementation. 
    The topics addressed are presented in the following order in this book: - importance of GIS for the monitoring of animal diseases and zoonoses - GIS application in surveillance activities - spatial analysis in veterinary epidemiology - data collection and remote

  15. Geographic assistance of decontamination strategy elaboration

    International Nuclear Information System (INIS)

    Davydchuk, V.; Arapis, G.

    1996-01-01

    Those who elaborate a strategy for decontaminating vast territories must take into consideration the heterogeneity of landscape elements such as relief, lithology, humidity, and types of soils and vegetation, at both local and regional levels. Geographic assistance includes evaluating the efficacy of decontamination technologies in different natural conditions, identifying areas of their effective application, defining ecological damage, and estimating the balances of radionuclides in the landscapes to create a background for the decontamination strategy.

  16. Soil microbial communities: Influence of geographic location and hydrocarbon pollutants

    CSIR Research Space (South Africa)

    Maila, MP

    2006-02-01

    Full Text Available The importance and relevance of the geographical origin of the soil sample and the hydrocarbons in determining the functional or species diversity within different bacterial communities was evaluated using the community level physiological profiles...

  17. Geographic information system for pigweed distribution in the US Southeast

    Science.gov (United States)

    In the southeastern United States, pigweeds have become troublesome weeds in agricultural systems. To implement management strategies to control them, agriculturalists need information on areas affected by pigweeds. Geographic information systems (GIS) afford users the ability to evaluate agricult...

  18. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
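    The abstract above does not say how results from the imputed datasets were combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the authors' procedure; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error from each imputed dataset
    """
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, t

# Hypothetical log-odds ratios and squared standard errors from m = 3 imputations.
estimate, total_var = pool_rubin([0.40, 0.55, 0.46], [0.010, 0.012, 0.011])
print(estimate, total_var ** 0.5)
```

    The between-imputation term is what carries the uncertainty due to the missing values; complete case analysis has no such term and simply discards the incomplete records.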

  19. Assessment of Consequences of Replacement of System of the Uniform Tax on Imputed Income Patent System of the Taxation

    Directory of Open Access Journals (Sweden)

    Galina A. Manokhina

    2012-11-01

    Full Text Available The article highlights the main questions concerning the possible consequences of replacing the currently operating single tax on imputed income with a patent system of taxation. The main advantages and drawbacks of the new taxation system are shown, including the opinion that introducing the patent taxation system as an auxiliary system would be more effective than replacing one special taxation regime with another.

  20. Department of Geograph

    African Journals Online (AJOL)

    USER

    2016-11-18

    Nov 18, 2016 ... which were obtained from Forestry Management Evaluation and Coordination Unit and were entered and use to develop a flood risk information system. ... end of the study, maps of flood vulnerable areas in the river basin ... from storing and managing hydrological ... at one time an arm of the Atlantic Ocean.

  1. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    Energy Technology Data Exchange (ETDEWEB)

    Riggi, S., E-mail: sriggi@oact.inaf.it [INAF - Osservatorio Astrofisico di Catania (Italy); Riggi, D. [Keras Strategy - Milano (Italy); Riggi, F. [Dipartimento di Fisica e Astronomia - Università di Catania (Italy); INFN, Sezione di Catania (Italy)

    2015-04-21

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixtures’ models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layers’ Silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  2. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning.

    Directory of Open Access Journals (Sweden)

    Ji-Sung Kim

    2018-04-01

    Full Text Available Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^-9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  3. Imputation-based analysis of association studies: candidate regions and quantitative traits.

    Directory of Open Access Journals (Sweden)

    Bertrand Servin

    2007-07-01

    Full Text Available We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

  4. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    Science.gov (United States)

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  5. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Jun-He Yang

    2017-01-01

    Full Text Available Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir’s water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir’s water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  6. Multiple imputation of rainfall missing data in the Iberian Mediterranean context

    Science.gov (United States)

    Miró, Juan Javier; Caselles, Vicente; Estrela, María José

    2017-11-01

    Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years, progressively more advanced than traditional approaches to the problem. The present study validates 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., filling at the same time the missing data of multiple incomplete series in a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (east Iberian Peninsula), which is characterized by high spatial irregularity and difficulty of rainfall estimation. A classification of precipitation according to its genetic origin was applied as pre-processing, and a quantile-mapping adjustment as a post-processing technique. The results showed, in general, better performance for the non-linear and hybrid methods, with the non-linear PCA (NLPCA) method considerably outperforming the Self-Organizing Maps (SOM) method among the non-linear approaches. Among the linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far from NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.

  7. Multiple imputation for estimating the risk of developing dementia and its impact on survival.

    Science.gov (United States)

    Yu, Binbing; Saczynski, Jane S; Launer, Lenore

    2010-10-01

    Dementia, Alzheimer's disease in particular, is one of the major causes of disability and decreased quality of life among the elderly and a leading obstacle to successful aging. Given the profound impact on public health, much research has focused on the age-specific risk of developing dementia and the impact on survival. Early work has discussed various methods of estimating age-specific incidence of dementia, among which the illness-death model is popular for modeling disease progression. In this article we use multiple imputation to fit multi-state models for survival data with interval censoring and left truncation. This approach allows semi-Markov models in which survival after dementia depends on onset age. Such models can be used to estimate the cumulative risk of developing dementia in the presence of the competing risk of dementia-free death. Simulations are carried out to examine the performance of the proposed method. Data from the Honolulu Asia Aging Study are analyzed to estimate the age-specific and cumulative risks of dementia and to examine the effect of major risk factors on dementia onset and death.

  8. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan

    2009-11-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg-Equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights to existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  9. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan; Chen, Yi-Hau; Luo, Sheng; Carroll, Raymond J.

    2009-01-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg-Equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights to existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  10. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    KAUST Repository

    Kim, Ji-Sung

    2018-04-26

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^-9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  11. NEPR Geographic Zone Map 2015

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This geographic zone map was created by interpreting satellite and aerial imagery, seafloor topography (bathymetry model), and the new NEPR Benthic Habitat Map...

  12. Ecoscapes: Geographical Patternings of Relations

    Directory of Open Access Journals (Sweden)

    Aimar Ventsel

    2012-06-01

    Full Text Available Book review of the publication Ecoscapes: Geographical Patternings of Relations. Edited by Gary Backhaus and John Murungi. Lanham, Boulder, New York, Toronto, Oxford, Lexington Books, 2006, xxxiii+241 pp.

  13. Ecoscapes: Geographical Patternings of Relations

    Directory of Open Access Journals (Sweden)

    Aimar Ventsel

    2013-01-01

    Full Text Available Book review of the publication Ecoscapes: Geographical Patternings of Relations. Edited by Gary Backhaus and John Murungi. Lanham, Boulder, New York, Toronto, Oxford, Lexington Books, 2006, xxxiii+241 pp.

  14. Geographic variation in gorilla limb bones.

    Science.gov (United States)

    Jabbour, Rebecca S; Pearman, Tessa L

    2016-06-01

    Gorilla systematics has received increased attention over recent decades from primatologists, conservationists, and paleontologists. Studies of geographic variation in DNA, skulls, and teeth have led to new taxonomic proposals, such as recognition of two gorilla species, Gorilla gorilla (western gorilla) and Gorilla beringei (eastern gorilla). Postcranial differences between mountain gorillas (G. beringei beringei) and western lowland gorillas (G. g. gorilla) have a long history of study, but differences between the limb bones of the eastern and western species have not yet been examined with an emphasis on geographic variation within each species. In addition, proposals for recognition of the Cross River gorilla as Gorilla gorilla diehli and gorillas from Tshiaberimu and Kahuzi as G. b. rex-pygmaeorum have not been evaluated in the context of geographic variation in the forelimb and hindlimb skeletons. Forty-three linear measurements were collected from limb bones of 266 adult gorillas representing populations of G. b. beringei, Gorilla beringei graueri, G. g. gorilla, and G. g. diehli in order to investigate geographic diversity. Skeletal elements included the humerus, radius, third metacarpal, third proximal hand phalanx, femur, tibia, calcaneus, first metatarsal, third metatarsal, and third proximal foot phalanx. Comparisons of means and principal components analyses clearly differentiate eastern and western gorillas, indicating that eastern gorillas have absolutely and relatively smaller hands and feet, among other differences. Gorilla subspecies and populations cluster consistently by species, although G. g. diehli may be similar to the eastern gorillas in having small hands and feet. The subspecies of G. beringei are distinguished less strongly and by different variables than the two gorilla species. Populations of G. b. graueri are variable, and Kahuzi and Tshiaberimu specimens do not cluster together. Results support the possible influence of

  15. A geographical perspective

    International Nuclear Information System (INIS)

    Elhance, A.P.

    1991-01-01

    This chapter attempts to elucidate the various spatial dimensions of the problem of achieving a nuclear weapons agreement in South Asia. The contention here is that geography, more so than other factors, lies at the heart of all past conflicts and hostilities within and between the two potential nuclear powers in the region, Pakistan and India. The hypothesis that these two countries are destined to be irrevocably interlinked in social, cultural, economic, military-strategic and political arenas is addressed. The likelihood that Kashmir and Punjab are constant sources of most bilateral tensions between these countries is assessed. The primary objective of this discussion is to evaluate effects of geography on the achievement and verification of a nuclear agreement in South Asia

  16. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  17. An imputation/copula-based stochastic individual tree growth model for mixed species Acadian forests: a case study using the Nova Scotia permanent sample plot network

    Directory of Open Access Journals (Sweden)

    John A. Kershaw Jr

    2017-09-01

    Full Text Available Background A novel approach to modelling individual tree growth dynamics is proposed. The approach combines multiple imputation and copula sampling to produce a stochastic individual tree growth and yield projection system. Methods The Nova Scotia, Canada permanent sample plot network is used as a case study to develop and test the modelling approach. Predictions from this model are compared to predictions from the Acadian variant of the Forest Vegetation Simulator, a widely used statistical individual tree growth and yield model. Results Diameter and height growth rates were predicted with error rates consistent with those produced using statistical models. Mortality and ingrowth error rates were higher than those observed for diameter and height, but also were within the bounds produced by traditional approaches for predicting these rates. Ingrowth species composition was very poorly predicted. The model was capable of reproducing a wide range of stand dynamic trajectories and in some cases reproduced trajectories that the statistical model was incapable of reproducing. Conclusions The model has potential to be used as a benchmarking tool for evaluating statistical and process models and may provide a mechanism to separate signal from noise and improve our ability to analyze and learn from large regional datasets that often have underlying flaws in sample design.

  18. Multiple sclerosis: a geographical hypothesis.

    Science.gov (United States)

    Carlyle, I P

    1997-12-01

    Multiple sclerosis remains a rare neurological disease of unknown aetiology, with a unique distribution, both geographically and historically. Rare in equatorial regions, it becomes increasingly common in higher latitudes; historically, it was first clinically recognized in the early nineteenth century. A hypothesis, based on geographical reasoning, is here proposed: that the disease is the result of a specific vitamin deficiency. Different individuals suffer the deficiency in separate and often unique ways. Evidence to support the hypothesis exists in cultural considerations, in the global distribution of the disease, and in its historical prevalence.

  19. A Geographical Heuristic Routing Protocol for VANETs

    Science.gov (United States)

    Urquiza-Aguiar, Luis; Tripp-Barba, Carolina; Aguilar Igartua, Mónica

    2016-01-01

    Vehicular ad hoc networks (VANETs) leverage the communication system of Intelligent Transportation Systems (ITS). Recently, Delay-Tolerant Network (DTN) routing protocols have increased their popularity among the research community for being used in non-safety VANET applications and services like traffic reporting. Vehicular DTN protocols use geographical and local information to make forwarding decisions. However, current proposals only consider the selection of the best candidate based on a local-search. In this paper, we propose a generic Geographical Heuristic Routing (GHR) protocol that can be applied to any DTN geographical routing protocol that makes forwarding decisions hop by hop. GHR includes in its operation adaptations of simulated annealing and Tabu-search meta-heuristics, which have largely been used to improve local-search results in discrete optimization. We include a complete performance evaluation of GHR in a multi-hop VANET simulation scenario for a reporting service. Our study analyzes all of the meaningful configurations of GHR and offers a statistical analysis of our findings by means of MANOVA tests. Our results indicate that the use of a Tabu list contributes to improving the packet delivery ratio by around 5% to 10%. Moreover, if Tabu is used, then the simulated annealing routing strategy gets a better performance than the selection of the best node used with carry and forwarding (default operation). PMID:27669254

  20. Geographic Analysis of the Radiation Oncology Workforce

    International Nuclear Information System (INIS)

    Aneja, Sanjay; Smith, Benjamin D.; Gross, Cary P.; Wilson, Lynn D.; Haffty, Bruce G.; Roberts, Kenneth; Yu, James B.

    2012-01-01

    Purpose: To evaluate trends in the geographic distribution of the radiation oncology (RO) workforce. Methods and Materials: We used the 1995 and 2007 versions of the Area Resource File to map the ratio of RO to the population aged 65 years or older (ROR) within different health service areas (HSA) within the United States. We used regression analysis to find associations between population variables and 2007 ROR. We calculated Gini coefficients for ROR to assess the evenness of RO distribution and compared that with primary care physicians and total physicians. Results: There was a 24% increase in the RO workforce from 1995 to 2007. The overall growth in the RO workforce was less than that of primary care or the overall physician workforce. The mean ROR among HSAs increased by more than one radiation oncologist per 100,000 people aged 65 years or older, from 5.08 per 100,000 to 6.16 per 100,000. However, there remained consistent geographic variability concerning RO distribution, specifically affecting the non-metropolitan HSAs. Regression analysis found higher ROR in HSAs that possessed higher education (p = 0.001), higher income (p < 0.001), lower unemployment rates (p < 0.001), and higher minority population (p = 0.022). Gini coefficients showed RO distribution less even than for both primary care physicians and total physicians (0.326 compared with 0.196 and 0.292, respectively). Conclusions: Despite a modest growth in the RO workforce, there exists persistent geographic maldistribution of radiation oncologists allocated along socioeconomic and racial lines. To solve problems surrounding the RO workforce, issues concerning both gross numbers and geographic distribution must be addressed.
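
    The Gini coefficient used in this abstract as an evenness measure can be illustrated with a minimal Python sketch. The function name `gini` and the sample ratios are hypothetical, and the calculation is unweighted; the abstract does not say whether the authors weighted health service areas by population.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    0 means a perfectly even distribution; values near 1 mean that a few
    units hold almost everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted in ascending order and i runs from 1 to n.
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


# Hypothetical provider-to-population ratios for four areas.
example_ratios = [2.0, 4.0, 6.0, 8.0]
example_gini = gini(example_ratios)
```

    A larger coefficient for one workforce than for another, as reported above for radiation oncologists relative to primary care physicians, indicates a less even geographic distribution.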

  1. Geographic Analysis of the Radiation Oncology Workforce

    Energy Technology Data Exchange (ETDEWEB)

    Aneja, Sanjay [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Smith, Benjamin D. [University of Texas M. D. Anderson Cancer Center, Houston, TX (United States); Gross, Cary P. [Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Department of General Internal Medicine, Yale University School of Medicine, New Haven, CT (United States); Wilson, Lynn D. [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Haffty, Bruce G. [Cancer Institute of New Jersey, New Brunswick, NJ (United States); Roberts, Kenneth [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Yu, James B., E-mail: james.b.yu@yale.edu [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States)

    2012-04-01

    Purpose: To evaluate trends in the geographic distribution of the radiation oncology (RO) workforce. Methods and Materials: We used the 1995 and 2007 versions of the Area Resource File to map the ratio of RO to the population aged 65 years or older (ROR) within different health service areas (HSA) within the United States. We used regression analysis to find associations between population variables and 2007 ROR. We calculated Gini coefficients for ROR to assess the evenness of RO distribution and compared that with primary care physicians and total physicians. Results: There was a 24% increase in the RO workforce from 1995 to 2007. The overall growth in the RO workforce was less than that of primary care or the overall physician workforce. The mean ROR among HSAs increased by more than one radiation oncologist per 100,000 people aged 65 years or older, from 5.08 per 100,000 to 6.16 per 100,000. However, there remained consistent geographic variability concerning RO distribution, specifically affecting the non-metropolitan HSAs. Regression analysis found higher ROR in HSAs that possessed higher education (p = 0.001), higher income (p < 0.001), lower unemployment rates (p < 0.001), and higher minority population (p = 0.022). Gini coefficients showed RO distribution less even than for both primary care physicians and total physicians (0.326 compared with 0.196 and 0.292, respectively). Conclusions: Despite a modest growth in the RO workforce, there exists persistent geographic maldistribution of radiation oncologists allocated along socioeconomic and racial lines. To solve problems surrounding the RO workforce, issues concerning both gross numbers and geographic distribution must be addressed.

  2. A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014

    Directory of Open Access Journals (Sweden)

    He Yulei

    2016-03-01

    Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior still remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase of the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
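
    The within- and between-imputation variance components discussed in this abstract are the ingredients of Rubin's standard combining rules, which can be sketched in Python as follows. The function name `rubin_combine` and the input numbers are hypothetical; the sketch shows the generic rules only, not the cluster-design expressions derived in the paper, and the returned ratio is the simple large-sample approximation to the fraction of missing information.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed data set.
    variances: estimated variance of that estimate in each imputed data set.
    Returns (pooled estimate, within variance W, between variance B,
             total variance T, approximate fraction of missing information).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching inputs from at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1.0 + 1.0 / m) * b      # total variance
    lam = (1.0 + 1.0 / m) * b / t    # share of T attributable to missingness
    return q_bar, w, b, t, lam


# Hypothetical results from m = 3 imputed data sets.
pooled = rubin_combine([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```

    When the between-imputation component is small relative to the within-imputation component, the total variance is close to the single-imputation variance, which is the pattern the abstract describes for estimates with large design effects.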

  3. Using imputed genotype data in the joint score tests for genetic association and gene-environment interactions in case-control studies.

    Science.gov (United States)

    Song, Minsun; Wheeler, William; Caporaso, Neil E; Landi, Maria Teresa; Chatterjee, Nilanjan

    2018-03-01

    Genome-wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern complex tests for genetic associations incorporating gene-environment interactions. We focus on case-control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large-scale GWAS are being performed through consortium efforts where it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gains for modeling of gene-environment interactions under the assumption of gene-environment independence. Methods are made available for public use through the CGEN R software package. © 2017 WILEY PERIODICALS, INC.

  4. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Science.gov (United States)

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de-regressed EBV was slightly smaller (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and they were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  5. Changes at the National Geographic Society

    Science.gov (United States)

    Schwille, Kathleen

    2016-01-01

    For more than 125 years, National Geographic has explored the planet, unlocking its secrets and sharing them with the world. For almost thirty of those years, National Geographic has been committed to K-12 educators and geographic education through its Network of Alliances. As National Geographic begins a new chapter, they remain committed to the…

  6. Geographical differences in food allergy.

    Science.gov (United States)

    Bartra, Joan; García-Moral, Alba; Enrique, Ernesto

    2016-06-01

    Food allergy represents a health problem worldwide and leads to life-threatening reactions and even impairs quality of life. Epidemiological data during the past decades is very heterogeneous because of the use of different diagnostic procedures, and most studies have only been performed in specific geographical areas. The aim of this article is to review the available data on the geographical distribution of food allergies at the food source and molecular level and to link food allergy patterns to the aeroallergen influence in each area. Systematic reviews, meta-analysis, studies performed within the EuroPrevall Project and EAACI position papers regarding food allergy were analysed. The prevalence of food allergy sensitization differs between geographical areas, probably as a consequence of differences among populations, their habits and the influence of the cross-reactivity of aeroallergens and other sources of allergens. Geographical differences in food allergy are clearly evident at the allergenic molecular level, which seems to be directly influenced by the aeroallergens of each region and associated with specific clinical patterns.

  7. Educational Geographers and Applied Geography.

    Science.gov (United States)

    Frazier, John W.

    1979-01-01

    Describes the development of applied geography programs and restructuring of curricula with an emphasis on new technique and methodology courses, though retaining the liberal arts role. Educational geographers can help the programs to succeed through curriculum analysis, auditing, advising students, and liaison with other geography sources. (CK)

  8. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    Science.gov (United States)

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.

  9. The significant surface-water connectivity of "geographically isolated wetlands"

    Science.gov (United States)

    Calhoun, Aram J.K.; Mushet, David M.; Alexander, Laurie C.; DeKeyser, Edward S.; Fowler, Laurie; Lane, Charles R.; Lang, Megan W.; Rains, Mark C.; Richter, Stephen; Walls, Susan

    2017-01-01

    We evaluated the current literature, coupled with our collective research expertise, on surface-water connectivity of wetlands considered to be “geographically isolated” (sensu Tiner Wetlands 23:494–516, 2003a) to critically assess the scientific foundation of grouping wetlands based on the singular condition of being surrounded by uplands. The most recent research on wetlands considered to be “geographically isolated” shows the difficulties in grouping an ecological resource that does not reliably indicate lack of surface water connectivity in order to meet legal, regulatory, or scientific needs. Additionally, the practice of identifying “geographically isolated wetlands” based on distance from a stream can result in gross overestimates of the number of wetlands lacking ecologically important surface-water connections. Our findings do not support use of the overly simplistic label of “geographically isolated wetlands”. Wetlands surrounded by uplands vary in function and surface-water connections based on wetland landscape setting, context, climate, and geographic region and should be evaluated as such. We found that the “geographically isolated” grouping does not reflect our understanding of the hydrologic variability of these wetlands and hence does not benefit conservation of the Nation’s diverse wetland resources. Therefore, we strongly discourage use of categorizations that provide overly simplistic views of surface-water connectivity of wetlands fully embedded in upland landscapes.

  10. Development of new historical global Nitrogen fertilizer map and the evaluation of their impacts on terrestrial N cycling

    Science.gov (United States)

    Nishina, K.; Ito, A.; Hayashi, S.

    2015-12-01

    The use of synthetic nitrogen fertilizer has grown rapidly since the development of the Haber-Bosch process in the early 20th century. The recent N loading derived from these sources on terrestrial ecosystems was estimated to be 2 times higher than biogenic N fixation in terrestrial ecosystems (Gruber et al., 2009). However, there are still large uncertainties in the cumulative N impacts on terrestrial ecosystems at the global scale. In this study, to assess historical N impacts at the global scale, we made a new global N fertilizer input map, which is spatially and temporally explicit (1960-2010) and considers the fractions of NH4+ and NO3- in the N fertilizer inputs. With the developed N fertilizer map, we evaluated historical changes in N2O cycling driven by land-use changes and N deposition, using the ecosystem model 'VISIT'. Prior to the downscaling process for the global N fertilizer map, we applied statistical data imputation to the FAOSTAT data because many values are missing, especially for developing countries. For the data imputation, we used the multiple imputation method proposed by Honaker & King (2010). Statistics on the consumption of various types of synthetic fertilizer are available in FAOSTAT and can be sorted by NH4+ and NO3- content, respectively. To downscale the country-level N fertilizer consumption data to a 0.5° x 0.5° grid, we used the historical land-use map in Earthstat (Ramankutty et al., 1999). Before assigning N fertilizer to each grid cell, we weighted double-cropping regions so that they received more N fertilizer input. Using M3-Crops Data (Monfreda et al., 2008), we identified the dominant crop species in each grid cell. After that, we used the Crop Calendar in the SAGE dataset (Sacks et al., 2010) to determine the schedule of N fertilizer input in each grid cell from the calendar of the dominant crop. Base fertilizer was set to be applied 7 days before transplanting and the second fertilizer application 30 days after the base fertilizer application.
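
    The downscaling and scheduling steps described in this abstract can be sketched in Python as a proportional allocation. This is an illustration under stated assumptions, not the authors' code: the function names, the example areas, and in particular the double-cropping weight of 2.0 are hypothetical, since the abstract does not give the weight that was used. Only the 7-day and 30-day offsets come from the abstract.

```python
from datetime import date, timedelta


def downscale(country_total, cropland_area, double_cropped, double_weight=2.0):
    """Split a country-level fertilizer total across grid cells.

    Each cell receives a share proportional to its cropland area, and
    double-cropped cells are up-weighted by double_weight (assumed value).
    """
    weights = [
        area * (double_weight if is_double else 1.0)
        for area, is_double in zip(cropland_area, double_cropped)
    ]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("no cropland to receive fertilizer")
    return [country_total * w / total_weight for w in weights]


def fertilizer_dates(transplanting):
    """Base application 7 days before transplanting; second 30 days after base."""
    base = transplanting - timedelta(days=7)
    second = base + timedelta(days=30)
    return base, second


# Hypothetical country with three grid cells, the second one double-cropped.
cell_inputs = downscale(100.0, [1.0, 1.0, 2.0], [False, True, False])
schedule = fertilizer_dates(date(2000, 5, 15))
```

    Proportional allocation keeps the grid-cell inputs summing to the national total, so the gridded map stays consistent with the country-level statistics it was derived from.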

  11. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

    Science.gov (United States)

    Guo, Wei-Li; Huang, De-Shuang

    2017-08-22

    Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset to a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches, by assessing the performance of imputation methods against observed ChIP-seq TF binding profiles. Besides, motif analysis shows that TFBSImpute performs better in capturing binding motifs enriched in observed data compared with baselines, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.

  12. IL FENOMENO VOLUNTEERED GEOGRAPHIC INFORMATION

    Directory of Open Access Journals (Sweden)

    Flavio Lupia

    2014-12-01

    The contribution addresses the phenomenon of Volunteered Geographic Information, explaining how these new and burgeoning sources of information offer multidisciplinary scientists an unprecedented opportunity to conduct research on a variety of topics at multiple spatial and temporal scales. In particular, the contribution refers to two recently launched COST Actions on the subject, which are particularly relevant for the growth of the European scientific community.

  13. Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

    DEFF Research Database (Denmark)

    Pryce, J E; Johnston, J; Hayes, B J

    2014-01-01

    detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from...... reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were...... information exploited. The UK animals were also included in the North American data set (n = 1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele...

  14. Geographic Names Information System (GNIS) Structures

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  15. Geographic Names Information System (GNIS) Historical Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  16. Geographic Names Information System (GNIS) Admin Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  17. Geographic Names Information System (GNIS) Hydrography Lines

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  18. Geographic Names Information System (GNIS) Cultural Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  19. Geographic Names Information System (GNIS) Landform Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  20. Geographic Names Information System (GNIS) Hydrography Points

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  1. Geographic Names Information System (GNIS) Community Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  2. Geographic Names Information System (GNIS) Transportation Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  3. Geographic Names Information System (GNIS) Antarctica Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  4. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    Science.gov (United States)

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  5. GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

    NARCIS (Netherlands)

    K. Estrada Gil (Karol); A. Abuseiris (Anis); F.G. Grosveld (Frank); A.G. Uitterlinden (André); T.A. Knoch (Tobias); F. Rivadeneira Ramirez (Fernando)

    2009-01-01

    The current fast growth of genome-wide association studies (GWAS) combined with now common computationally expensive imputation requires the online access of large user groups to high-performance computing resources capable of analyzing rapidly and efficiently millions of genetic

  6. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    NARCIS (Netherlands)

    Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); Y.J. Kim (Young Jin); B.-J. Kim (Bong-Jo); J. Lee (Juyoung); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. Palmer (Nicholette); T. Park (Taesung); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)

    2015-01-01

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the

  7. 5 CFR 536.303 - Geographic conversion.

    Science.gov (United States)

    2010-01-01

    ... after geographic conversion is the employee's existing payable rate of basic pay in effect immediately before the action. (b) Geographic conversion when a retained rate employee's official worksite is changed... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Geographic conversion. 536.303 Section...

  8. Experimental effects of climate messages vary geographically

    Science.gov (United States)

    Zhang, Baobao; van der Linden, Sander; Mildenberger, Matto; Marlon, Jennifer R.; Howe, Peter D.; Leiserowitz, Anthony

    2018-05-01

    Social science scholars routinely evaluate the efficacy of diverse climate frames using local convenience or nationally representative samples [1-5]. For example, previous research has focused on communicating the scientific consensus on climate change, which has been identified as a 'gateway' cognition to other key beliefs about the issue [6-9]. Importantly, although these efforts reveal average public responsiveness to particular climate frames, they do not describe variation in message effectiveness at the spatial and political scales relevant for climate policymaking. Here we use a small-area estimation method to map geographical variation in public responsiveness to information about the scientific consensus as part of a large-scale randomized national experiment (n = 6,301). Our survey experiment finds that, on average, public perception of the consensus increases by 16 percentage points after message exposure. However, substantial spatial variation exists across the United States at state and local scales. Crucially, responsiveness is highest in more conservative parts of the country, leading to national convergence in perceptions of the climate science consensus across diverse political geographies. These findings not only advance a geographical understanding of how the public engages with information about scientific agreement, but will also prove useful for policymakers, practitioners and scientists engaged in climate change mitigation and adaptation.

  9. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    Science.gov (United States)

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than that observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had a clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because the etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to defining its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc.

  10. Natural Scales in Geographical Patterns

    Science.gov (United States)

    Menezes, Telmo; Roth, Camille

    2017-04-01

    Human mobility is known to be distributed across several orders of magnitude of physical distances, which makes it generally difficult to endogenously find or define typical and meaningful scales. Relevant analyses, from movements to geographical partitions, seem to be relative to some ad-hoc scale, or no scale at all. Relying on geotagged data collected from photo-sharing social media, we apply community detection to movement networks constrained by increasing percentiles of the distance distribution. Using a simple parameter-free discontinuity detection algorithm, we discover clear phase transitions in the community partition space. The detection of these phases constitutes the first objective method of characterising endogenous, natural scales of human movement. Our study covers nine regions, ranging from cities to countries of various sizes and a transnational area. For all regions, the number of natural scales is remarkably low (2 or 3). Further, our results hint at scale-related behaviours rather than scale-related users. The partitions of the natural scales allow us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion where the introduction of spatial boundaries is pivotal.

  11. OUTDOOR EDUCATION AND GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    ANDREA GUARAN

    2016-01-01

    This paper reflects on the relationship between the values and methodological principles of Outdoor Education and the perspectives of spatial and geographical education, especially in pre-school and primary school, that is, for children between 3 and 10 years of age. Outdoor Education is an educational practice rooted in the philosophical thought of the 16th and 17th centuries, from John Locke to Jean-Jacques Rousseau, and in pedagogical thought, in particular that of Friedrich Fröbel, and it now has a fairly stable tradition in Northern European countries. In Italy, however, experiences are still few, and they usually lack a systematic and structural approach, relying instead on temporary and experimental outdoor organization. In its first part, the paper focuses on the reasons that justify particular attention to educational paths that favour outdoor activities, and it also provides a definition of outdoor education and highlights its values. It is also essential to understand that educational programs in open spaces, such as a forest or simply the schoolyard, offer the possibility of learning about geographical situations. The question that arises, therefore, is how best to harness the stimuli that the spatial location provides for the acquisition of knowledge, skills and abilities about space and geography.

  12. Geographic profiling and animal foraging.

    Science.gov (United States)

    Le Comber, Steven C; Nicholls, Barry; Rossmo, D Kim; Racey, Paul A

    2006-05-21

    Geographic profiling was originally developed as a statistical tool for use in criminal cases, particularly those involving serial killers and rapists. It is designed to help police forces prioritize lists of suspects by using the location of crime scenes to identify the areas in which the criminal is most likely to live. Two important concepts are the buffer zone (criminals are less likely to commit crimes in the immediate vicinity of their home) and distance decay (criminals commit fewer crimes as the distance from their home increases). In this study, we show how the techniques of geographic profiling may be applied to animal data, using as an example foraging patterns in two sympatric colonies of pipistrelle bats, Pipistrellus pipistrellus and P. pygmaeus, in the northeast of Scotland. We show that if model variables are fitted to known roost locations, these variables may be used as numerical descriptors of foraging patterns. We go on to show that these variables can be used to differentiate patterns of foraging in these two species.

  13. Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

    Directory of Open Access Journals (Sweden)

    Boeschoten Laura

    2017-12-01

    Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables. Furthermore, the latent class model, by multiply imputing a new variable, enhances the quality of statistics based on the composite data set. The performance of this method is investigated by a simulation study, which shows that whether or not the method can be applied depends on the entropy R² of the latent class model and the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.

  14. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    Science.gov (United States)

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  15. ROMANIA: GEOGRAPHICAL AND GEOPOLITICAL POSITION

    Directory of Open Access Journals (Sweden)

    Ciprian Beniamin Benea

    2016-12-01

    The paper aims to draw the reader's attention to the role education plays in creating a good geopolitical position for a state that has a good geographical position and is well endowed with natural resources. The case of Romania is the main focus of the paper. It presents the peculiar situation of a country (Romania) that is very well located geographically but is unable to exploit its natural endowments and special location. One reason for this situation is that most people living in present-day Romania belong to a category called 'individuals' in this paper. Individuals are not aware of their country's geography and history, let alone its possible future development. They do not know the role their country could play and, living in an atomized society, they choose emigration as the easiest way to escape a harsh social and economic environment. The opposite attitude is that of a citizen, a person conscious of the country's potential and dedicated to working hard together with fellow citizens to promote national interests in a peaceful manner. Although remnants of an ancient city have been found close to present-day Romanian territory, which is evidence of a well-endowed environment, moral and psychological factors have contributed crucially since 1990 to pushing Romania from its path of civilization back to an archaic spirit, from an active urban spirit to a rural mentality. In such a situation it is not uncommon for a nation to lose its means of projecting power, namely transportation, which could promote the value and importance of a geographical position; a rural mentality has nothing to do with modern transportation, whose means are technical tools with a geopolitical essence for controlling space. It is well known that transportation and geopolitics are closely interrelated. Furthermore, social dissolution in post communist

  16. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Directory of Open Access Journals (Sweden)

    Momoko Horikoshi

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  17. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Science.gov (United States)

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  18. Geographical Gradients in Argentinean Terrestrial Mammal Species Richness and Their Environmental Correlates

    Science.gov (United States)

    Márquez, Ana L.; Real, Raimundo; Kin, Marta S.; Guerrero, José Carlos; Galván, Betina; Barbosa, A. Márcia; Olivero, Jesús; Palomo, L. Javier; Vargas, J. Mario; Justo, Enrique

    2012-01-01

    We analysed the main geographical trends of terrestrial mammal species richness (SR) in Argentina, assessing how broad-scale environmental variation (defined by climatic and topographic variables) and the spatial form of the country (defined by spatial filters based on spatial eigenvector mapping (SEVM)) influence the kinds and the numbers of mammal species along these geographical trends. We also evaluated if there are pure geographical trends not accounted for by the environmental or spatial factors. The environmental variables and spatial filters that simultaneously correlated with the geographical variables and SR were considered potential causes of the geographic trends. We performed partial correlations between SR and the geographical variables, maintaining the selected explanatory variables statistically constant, to determine if SR was fully explained by them or if a significant residual geographic pattern remained. All groups and subgroups presented a latitudinal gradient not attributable to the spatial form of the country. Most of these trends were not explained by climate. We used a variation partitioning procedure to quantify the pure geographic trend (PGT) that remained unaccounted for. The PGT was larger for latitudinal than for longitudinal gradients. This suggests that historical or purely geographical causes may also be relevant drivers of these geographical gradients in mammal diversity. PMID:23028254
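    The partial correlations mentioned in this abstract can be illustrated, for the simplest case of a single controlled variable, by the standard first-order formula. The symbols are illustrative labels rather than the authors' notation: x stands for species richness, y for a geographical variable such as latitude, and z for one explanatory variable held constant. The study controlled for several selected variables at once, which generalizes this expression.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

    A value of r_{xy.z} that remains significantly different from zero corresponds to what the abstract calls a residual geographic pattern.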

  19. Geographical Gradients in Argentinean Terrestrial Mammal Species Richness and Their Environmental Correlates

    Directory of Open Access Journals (Sweden)

    Ana L. Márquez

    2012-01-01

    We analysed the main geographical trends of terrestrial mammal species richness (SR) in Argentina, assessing how broad-scale environmental variation (defined by climatic and topographic variables) and the spatial form of the country (defined by spatial filters based on spatial eigenvector mapping (SEVM)) influence the kinds and the numbers of mammal species along these geographical trends. We also evaluated if there are pure geographical trends not accounted for by the environmental or spatial factors. The environmental variables and spatial filters that simultaneously correlated with the geographical variables and SR were considered potential causes of the geographic trends. We performed partial correlations between SR and the geographical variables, maintaining the selected explanatory variables statistically constant, to determine if SR was fully explained by them or if a significant residual geographic pattern remained. All groups and subgroups presented a latitudinal gradient not attributable to the spatial form of the country. Most of these trends were not explained by climate. We used a variation partitioning procedure to quantify the pure geographic trend (PGT) that remained unaccounted for. The PGT was larger for latitudinal than for longitudinal gradients. This suggests that historical or purely geographical causes may also be relevant drivers of these geographical gradients in mammal diversity.

  20. Representations built from a true geographic database

    DEFF Research Database (Denmark)

    Bodum, Lars

    2005-01-01

    the whole world in 3d and with a spatial reference given by geographic coordinates. Built on top of this is a customised viewer, based on the Xith(Java) scenegraph. The viewer reads the objects directly from the database and solves the question about Level-Of-Detail on buildings, orientation in relation...... a representation based on geographic and geospatial principles. The system GRIFINOR, developed at 3DGI, Aalborg University, DK, is capable of creating this object-orientation and furthermore does this on top of a true Geographic database. A true Geographic database can be characterized as a database that can cover...

  1. The Oklahoma Geographic Information Retrieval System

    Science.gov (United States)

    Blanchard, W. A.

    1982-01-01

    The Oklahoma Geographic Information Retrieval System (OGIRS) is a highly interactive data entry, storage, manipulation, and display software system for use with geographically referenced data. Although originally developed for a project concerned with coal strip mine reclamation, OGIRS is capable of handling any geographically referenced data for a variety of natural resource management applications. A special effort has been made to integrate remotely sensed data into the information system. The timeliness and synoptic coverage of satellite data are particularly useful attributes for inclusion into the geographic information system.

  2. CONTEMPORARY TRENDS IN GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    M. Wasileva

    2017-01-01

    Geography encompasses rich, diverse and comprehensive themes that give us an understanding of our changing environment and interconnected world. It includes the study of the physical environment and resources; cultures, economies and societies; people and places; and global development and civic participation. As a subject, geography is particularly valuable because it provides information for exploring contemporary issues from a different perspective. This geographical information affects us all at work and in our daily lives and helps us make informed decisions that shape our future. All of this has led to wide discussion of many topical issues in contemporary geography didactics. The subjects of research are the new geography and economics curriculum and the construction of a modern learning process. The paper briefly presents some of the current trends and key issues of geodidactics. As central notions we consider and analyze the training and educational goals, the geography curriculum, the target groups and environment of geography training, the training methods, and the information sources used in geography education. We hold that all of the above is reflected in the planning, analysis and assessment of education, and thus in its quality and effectiveness.

  3. Geographical variation in cardiovascular incidence: results from the British Women's Heart and Health Study

    Directory of Open Access Journals (Sweden)

    Ebrahim Shah

    2010-11-01

    Abstract Background Prevalence of cardiovascular disease (CVD) in women shows regional variations not explained by common risk factors. Analysis of CVD incidence will provide insight into whether there is further divergence between regions with increasing age. Methods Seven-year follow-up data on 2685 women aged 59-80 (mean 69) at baseline from 23 towns in the UK were available from the British Women's Heart and Health Study. Time to fatal or non-fatal CVD was analyzed using Cox regression with adjustment for risk factors, using multiple imputation for missing values. Results Compared to South England, CVD incidence is similar in North England (HR 1.05 (95% CI 0.84, 1.31)) and Scotland (0.93 (0.68, 1.27)), but lower in Midlands/Wales (0.85 (0.64, 1.12)). Event severity influenced regional variation, with South England showing lower fatal incident CVD than other regions, but higher non-fatal incident CVD. Kaplan-Meier plots suggested that regional divergence in CVD occurred before baseline (before mean baseline age of 69). Conclusions In women, regional differences in CVD early in adult life do not further diverge in later life. This may be due to regional differences in early detection, survivorship of women entering the study, or event severity. Targeting health care resources for CVD by geographic variation may not be appropriate for older age-groups.

  4. Geographical National Condition and Complex System

    Directory of Open Access Journals (Sweden)

    WANG Jiayao

    2016-01-01

    The significance of studying the complex system of geographical national conditions lies in rationally expressing the complex relationships within the “resources-environment-ecology-economy-society” system. Addressing the problems faced by the statistical analysis of geographical national conditions, including disunity of research content, inconsistency of scope and uncertainty of goals, the present paper discusses concept, theory and method, and designs solutions based on complex system theory and coordination degree analysis methods. By analyzing the concepts of geographical national conditions, the geographical national conditions survey and geographical national conditions statistical analysis, and by investigating the relationships among them, the paper clarifies and defines the statistical content and analytical scope of geographical national conditions. It also clarifies the goals of the statistical analysis by analyzing the basic characteristics of geographical national conditions and of the complex system, and the consistency between coordination degree analysis and statistical analysis. It outlines these goals and proposes and describes a concept for the complex system of geographical national conditions. Complex system theory provides new theoretical guidance for the statistical analysis of geographical national conditions. The degree of coordination offers new approaches to the analysis, based on the measurement method and decision-making analysis scheme on which the complex system of geographical national conditions is built. It analyzes the overall trend at the macro level via the degree of coordination of the complex system, and it determines the direction of remediation at the micro level based on the degree of coordination among the various subsystems and within single systems. These results establish

  5. Different methods for analysing and imputation missing values in wind speed series; La problematica de la calidad de la informacion en series de velocidad del viento-metodologias de analisis y imputacion de datos faltantes

    Energy Technology Data Exchange (ETDEWEB)

    Ferreira, A. M.

    2004-07-01

    This study concerns different methods for analysing and imputing missing values in wind speed series. The EM algorithm and a methodology derived from the sequential hot deck were used. Series with imputed missing values are compared with the original, complete series using several criteria, such as wind potential, and there appears to be a significantly good fit between the estimates and the real values. (Author)
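    To make the sequential hot deck idea concrete, here is a minimal Python sketch of the textbook form, in which each missing value takes the most recent observed value in file order as its donor. The abstract does not say which variant the author derived, so the function name, the use of None for gaps and the sample wind speeds are all assumptions made for illustration.

```python
from typing import List, Optional


def sequential_hot_deck(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the most recent observed value.

    Textbook sequential hot deck: the donor is the last observed record
    in file order. Leading gaps have no donor and are left as None.
    """
    filled: List[Optional[float]] = []
    donor: Optional[float] = None
    for value in series:
        if value is None:
            filled.append(donor)  # remains None if nothing observed yet
        else:
            donor = value  # this observation becomes the current donor
            filled.append(value)
    return filled


# Made-up hourly wind speeds in m/s, with None marking missing readings.
wind_speed = [5.2, None, 6.1, None, None, 4.8]
print(sequential_hot_deck(wind_speed))
```

    Comparing the filled series against a complete reference series, as the study does, would then show how much a chosen criterion changes under imputation.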

  6. Socioeconomic and geographic inequalities in adolescent smoking: a multilevel cross-sectional study of 15 year olds in Scotland.

    Science.gov (United States)

    Levin, K A; Dundas, R; Miller, M; McCartney, G

    2014-04-01

    The objective of the study was to present socioeconomic and geographic inequalities in adolescent smoking in Scotland. The international literature suggests there is no obvious pattern in the geography of adolescent smoking, with rural areas having a higher prevalence than urban areas in some countries, and a lower prevalence in others. These differences are most likely due to substantive differences in rurality between countries in terms of their social, built and cultural geography. Previous studies in the UK have shown an association between lower socioeconomic status and smoking. The Scottish Health Behaviour in School-aged Children study surveyed 15 year olds in schools across Scotland between March and June of 2010. We ran multilevel logistic regressions using Markov chain Monte Carlo method and adjusting for age, school type, family affluence, area level deprivation and rurality. We imputed missing rurality and deprivation data using multivariate imputation by chained equations, and re-analysed the data (N = 3577), comparing findings. Among boys, smoking was associated only with area-level deprivation. This relationship appeared to have a quadratic S-shape, with those living in the second most deprived quintile having highest odds of smoking. Among girls, however, odds of smoking increased with deprivation at individual and area-level, with an approximate dose-response relationship for both. Odds of smoking were higher for girls living in remote and rural parts of Scotland than for those living in urban areas. Schools in rural areas were no more or less homogenous than schools in urban areas in terms of smoking prevalence. We discuss possible social and cultural explanations for the high prevalence of boys' and girls' smoking in low SES neighbourhoods and of girls' smoking in rural areas. We consider possible differences in the impact of recent tobacco policy changes, primary socialization, access and availability, retail outlet density and the home

  7. Conceptual Model of Dynamic Geographic Environment

    Directory of Open Access Journals (Sweden)

    Martínez-Rosales Miguel Alejandro

    2014-04-01

    In geographic environments, there are many different types of geographic entities, such as automobiles, trees, persons, buildings, storms and hurricanes. These entities can be classified into two groups: geographic objects and geographic phenomena. By its nature, a geographic environment is dynamic; thus, static modeling of it is not sufficient. To account for the dynamics of the geographic environment, a new type of geographic entity, called an event, is introduced. The primary aim is to model the geographic environment as a sequence of events, because the semantic relations are then much richer than in static modeling. In this work, a conceptualization of this model is proposed. It is based on the idea of processing each entity separately instead of processing the environment as a whole. The so-called history of each entity and its spatial relations to other entities are then defined to describe the whole environment. The main goal is to model, at a conceptual level, systems that make use of spatial and temporal information, so that the model can later serve as the semantic engine for such systems.

  8. 25 CFR 571.10 - Geographical location.

    Science.gov (United States)

    2010-04-01

    ... 25 Indians 2 2010-04-01 2010-04-01 false Geographical location. 571.10 Section 571.10 Indians NATIONAL INDIAN GAMING COMMISSION, DEPARTMENT OF THE INTERIOR COMPLIANCE AND ENFORCEMENT PROVISIONS MONITORING AND INVESTIGATIONS Subpoenas and Depositions § 571.10 Geographical location. The attendance of...

  9. The evolution of cooperation on geographical networks

    Science.gov (United States)

    Li, Yixiao; Wang, Yi; Sheng, Jichuan

    2017-11-01

    We study the evolutionary public goods game on geographical networks, i.e., complex networks located on a geographical plane. The geographical feature has an effect in two ways: first, the geographically induced network structure influences the overall evolutionary dynamics; second, the geographical length of an edge influences the cost incurred when the two players at its ends interact. For the latter effect, we design a new cost function for cooperators, which simply assumes that the longer the distance between two players, the higher the cost that the cooperator(s) among them must pay. In this study, network substrates are generated by a previous spatial network model with a cost-benefit parameter controlling the network topology. Our simulations show that the greatest promotion of cooperation is achieved in the intermediate regime of the parameter, in which empirical estimates of various railway networks fall. Further, we investigate how the distribution of edges' geographical costs influences the evolutionary dynamics and consider three patterns of the distribution: an approximately equal distribution, a diverse distribution, and a polarized distribution. For normal geographical networks, which are generated using intermediate values of the cost-benefit parameter, a diverse distribution hinders the evolution of cooperation, whereas a polarized distribution lowers the threshold value of the amplification factor for cooperation in the public goods game. These results are helpful for understanding the evolution of cooperation on real-world geographical networks.
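    The distance-dependent cost described in this abstract can be pictured with a short Python sketch. The abstract only states that cost rises with the distance between two players; the linear form, the parameter names base_cost and alpha, and the coordinates below are hypothetical choices for illustration, not the authors' cost function.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def edge_length(a: Point, b: Point) -> float:
    """Euclidean length of the edge joining two players on the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cooperation_cost(distance: float, base_cost: float = 1.0, alpha: float = 0.5) -> float:
    """Hypothetical cost paid by a cooperator for one interaction.

    Assumed linear form: the cost grows with distance at rate alpha,
    so longer edges are more expensive for cooperators.
    """
    return base_cost * (1.0 + alpha * distance)


# Two players at made-up coordinates; a cooperator pays more on a longer edge.
near = edge_length((0.0, 0.0), (0.0, 1.0))
far = edge_length((0.0, 0.0), (3.0, 4.0))
print(cooperation_cost(near), cooperation_cost(far))
```

    Any monotonically increasing function of distance would fit the verbal description equally well; the linear form is used here only because it is the simplest.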

  10. Hierarchical spatial organization of geographical networks

    International Nuclear Information System (INIS)

    Travencolo, Bruno A N; Costa, Luciano da F

    2008-01-01

    In this work, we propose a hierarchical extension of the polygonality index as the means to characterize geographical planar networks. By considering successive neighborhoods around each node, it is possible to obtain more complete information about the spatial order of the network at progressive spatial scales. The potential of the methodology is illustrated with respect to synthetic and real geographical networks.

  11. Future Prospects for Geographical Education in Slovenia

    Science.gov (United States)

    Resnic Planinc, Tatjana

    2011-01-01

    This paper deals with future prospects for geographical education in Slovenia, with special emphasis on the development and aims of the didactics of geography. The author discusses the past development of geographical curricula and of competencies of geography teachers, and the education of future teachers of the subject in Slovenia. Her ideas are…

  12. Socioeconomic Development Inequalities among Geographic Units ...

    African Journals Online (AJOL)

    Socio-economic development inequality among geographic units is a phenomenon common in both the developed and developing countries. Regional inequality may result in dissension among geographic units of the same state due to the imbalance in socio-economic development. This study examines the inequality ...

  13. Composing Models of Geographic Physical Processes

    Science.gov (United States)

    Hofer, Barbara; Frank, Andrew U.

    Processes are central for geographic information science; yet geographic information systems (GIS) lack capabilities to represent process related information. A prerequisite to including processes in GIS software is a general method to describe geographic processes independently of application disciplines. This paper presents such a method, namely a process description language. The vocabulary of the process description language is derived formally from mathematical models. Physical processes in geography can be described in two equivalent languages: partial differential equations or partial difference equations, where the latter can be shown graphically and used as a method for application specialists to enter their process models. The vocabulary of the process description language comprises components for describing the general behavior of prototypical geographic physical processes. These process components can be composed by basic models of geographic physical processes, which is shown by means of an example.

  14. Isolation of Microsporum gypseum in soil samples from different geographical regions of Brazil, evaluation of the extracellular proteolytic enzymes activities (keratinase and elastase) and molecular sequencing of selected strains

    Directory of Open Access Journals (Sweden)

    Mauro Cintra Giudice

    2012-09-01

    Full Text Available A survey of Microsporum gypseum was conducted in soil samples from different geographical regions of Brazil. Dermatophytes were isolated from the soil samples by the hair-baiting technique, and the species were identified by morphological studies. We analyzed 692 soil samples, and the recovery rate was 19.2%. Keratinase and elastase activities were measured quantitatively in 138 samples. The ITS region of rDNA was sequenced in representative samples. M. gypseum isolates showed significant quantitative differences in the expression of both keratinase and elastase, but no significant correlation was observed between these enzymes. Sequencing of the representative samples revealed the presence of two teleomorphic species of M. gypseum (Arthroderma gypseum and A. incurvatum). The enzymatic activities may play an important role in the pathogenicity of this fungus and in its probable adaptation to animal parasitism. Combining phenotypic and molecular analysis to identify Microsporum and its teleomorphic states will provide a useful and reliable identification system.

  15. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    Science.gov (United States)

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.
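    Predictive mean matching, one of the four approaches named in this abstract, can be sketched in a few lines. This is a simplified, single-predictor illustration and not the authors' procedure: it omits the parameter draw used in proper Bayesian MI, assumes the predictor `x` is fully observed and not constant, and marks missing responses with `None`. The names `pmm_impute` and `multiple_impute` are hypothetical.

```python
import random

def pmm_impute(x, y, k=3, seed=None):
    """Impute missing y values (None) by predictive mean matching.

    A straight line y ~ a + b*x is fitted to the observed rows. Each
    missing row borrows the real y value of a donor drawn at random from
    the k observed rows whose predicted means are closest to its own.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    missing = [i for i, v in enumerate(y) if v is None]
    n = len(observed)
    mean_x = sum(x[i] for i in observed) / n
    mean_y = sum(y[i] for i in observed) / n
    sxx = sum((x[i] - mean_x) ** 2 for i in observed)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i in missing:
        donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
        completed[i] = y[rng.choice(donors)]
    return completed

def multiple_impute(x, y, m=5, k=3, seed=0):
    """Return m completed copies of y, each from a different random draw."""
    return [pmm_impute(x, y, k=k, seed=seed + j) for j in range(m)]

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.0, None, 8.0, 10.0, None]
for completed in multiple_impute(x, y, m=3, k=2):
    print(completed)
```

    Because every imputed value is copied from an observed record, the completed data never contain values outside the observed range, which is the main practical appeal of this method.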

  16. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

    Directory of Open Access Journals (Sweden)

    Rui Martiniano

    2017-07-01

    Full Text Available We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

  17. Fissured and geographic tongue in Williams-Beuren syndrome

    Directory of Open Access Journals (Sweden)

    Neeta Sharma

    2014-01-01

    Full Text Available Williams-Beuren Syndrome (WBS) is a rare, most often sporadic, genetic disease caused by a chromosomal microdeletion at locus 7q11.23 involving 28 genes. It is characterized by congenital heart defects, neonatal hypercalcemia, skeletal and renal abnormalities, cognitive disorder, social personality disorder, and dysmorphic facies. A number of clinical findings have been reported, but none of the studies evaluated this syndrome with respect to the oral cavity. We here report a fissured and geographic tongue in association with WBS.

  18. Geographic Education--Where Have We Failed?

    Science.gov (United States)

    Gritzner, Charles F.

    1981-01-01

    Discusses geography's rather low status and relatively poor public image in the United States and some of the consequences. Among the world's educated industrial nations, the United States ranks among the least literate in a geographical sense. (RM)

  19. Medicare Geographic Variation - Public Use File

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Geographic Variation Public Use File provides the ability to view demographic, utilization and quality indicators at the state level (including...

  20. Geographic information system planning and monitoring best ...

    African Journals Online (AJOL)

    Poor urbanization policies, inefficient planning and monitoring technologies are evident. The consequences include some of the worst types of environmental hazards. Best urbanization practices require integrated planning approaches that result in environmental conservation. Geographic Information systems (GIS) provide ...

  1. GNIS: Geographic Names Information Systems - All features

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) actively seeks data from and partnerships with Government agencies at all levels and other interested organizations....

  2. Geographic Variation in Medicare Spending Dashboard

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Geographic Variation Dashboards present Medicare fee-for-service per-capita spending at the state and county level in an interactive format. We calculated the...

  3. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    Science.gov (United States)

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P < 5 × 10^-8), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P = 7.68 × 10^-12), oestradiol (P = 1.63 × 10^-8) and FAI (P = 1.50 × 10^-8). A genetic variant near the FSHB gene was identified which influenced both FSH (P = 1.74 × 10^-8) and LH (P = 3.94 × 10^-9) levels. A separate locus on chromosome 7 was associated with both DHEAS (P = 1.82 × 10^-14) and progesterone (P = 6.09 × 10^-14). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.

  4. Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study.

    Science.gov (United States)

    Cornish, R P; Macleod, J; Carpenter, J R; Tilling, K

    2017-01-01

    When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI). Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1-0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete. Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest. In longitudinal studies with loss to follow-up, incorporating proxies for this study outcome obtained via linkage to external sources of data as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.

  5. Geographical Clusters of Rape in the United States: 2000-2012

    Science.gov (United States)

    Amin, Raid; Nabors, Nicole S.; Nelson, Arlene M.; Saqlain, Murshid; Kulldorff, Martin

    2016-01-01

    Background While rape is a very serious crime and public health problem, no spatial mapping has been attempted for rape on the national scale. This paper addresses three research questions: (1) Are reported rape cases randomly distributed across the USA, after being adjusted for population density and age, or are there geographical clusters of reported rape cases? (2) Are the geographical clusters of reported rapes still present after adjusting for differences in poverty levels? (3) Are there geographical clusters where the proportion of reported rape cases that lead to an arrest is exceptionally low or exceptionally high? Methods We studied the geographical variation of reported rape events (2003-2012) and rape arrests (2000-2012) in the 48 contiguous states of the USA. The disease surveillance software SaTScan™ with its spatial scan statistic is used to evaluate the spatial variation in rapes. The spatial scan statistic has been widely used as a geographical surveillance tool for diseases, and we used it to identify geographical areas with clusters of reported rape and clusters of arrest rates for rape. Results The spatial scan statistic was used to identify geographical areas with exceptionally high rates of reported rape. The analyses were adjusted for age, and in secondary analyses, for both age and poverty level. We also identified geographical areas with either a low or a high proportion of reported rapes leading to an arrest. Conclusions We have identified geographical areas with exceptionally high (low) rates of reported rape. The geographical problem areas identified are prime candidates for more intensive preventive counseling and criminal prosecution efforts by public health, social service, and law enforcement agencies. Geographical clusters of high rates of reported rape are prime areas in need of expanded implementation of preventive measures, such as changing attitudes in our society toward rape crimes, in addition to having the criminal
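    The spatial scan statistic mentioned in this abstract can be illustrated with a toy search for the single most likely high-rate cluster. This sketch does not use SaTScan or reproduce the study's analysis: it uses the standard Poisson log-likelihood ratio with circular windows grown around each region centroid, skips the covariate adjustment and the Monte Carlo significance test, and the input layout `(x, y, cases, population)` and function names are assumptions.

```python
import math

def poisson_llr(cases, expected, total_cases):
    """Poisson log-likelihood ratio for a zone with an elevated rate."""
    if cases <= expected:
        return 0.0
    inside = cases * math.log(cases / expected)
    outside = 0.0
    if total_cases > cases:
        rest = total_cases - cases
        outside = rest * math.log(rest / (total_cases - expected))
    return inside + outside

def most_likely_cluster(regions, max_pop_frac=0.5):
    """Scan circular windows and return (best LLR, region indices).

    regions: list of (x, y, cases, population) tuples, one per area.
    Windows grow around each centroid by adding the nearest regions until
    they would hold more than max_pop_frac of the total population.
    """
    total_cases = sum(r[2] for r in regions)
    total_pop = sum(r[3] for r in regions)
    best_llr, best_zone = 0.0, []
    for cx, cy, _, _ in regions:
        order = sorted(range(len(regions)),
                       key=lambda i: math.dist((cx, cy), regions[i][:2]))
        zone, cases, pop = [], 0, 0
        for i in order:
            zone.append(i)
            cases += regions[i][2]
            pop += regions[i][3]
            if pop > max_pop_frac * total_pop:
                break
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, sorted(zone)
    return best_llr, best_zone

# Usage with made-up counts: the first two areas have elevated rates.
regions = [(0, 0, 30, 100), (1, 0, 28, 100), (10, 0, 5, 100),
           (11, 0, 6, 100), (20, 0, 4, 100), (21, 0, 5, 100)]
print(most_likely_cluster(regions))
```

    In practice the maximum ratio is compared against the maxima obtained from many data sets simulated under the null hypothesis, which is what yields a p-value for the detected cluster.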

  6. Analyzing the Impacts of Alternated Number of Iterations in Multiple Imputation Method on Explanatory Factor Analysis

    Directory of Open Access Journals (Sweden)

    Duygu KOÇAK

    2017-11-01

    Full Text Available The study aims to identify the effects of the number of iterations used in the multiple imputation method, one of the methods used to cope with missing values, on the results of factor analysis. To this end, artificial datasets of different sample sizes were created. Values missing at random and missing completely at random were created in various ratios by deleting data. For the data missing at random, a second variable was generated at the ordinal scale level, and datasets with different ratios of missing values were obtained based on the levels of this variable. The data were generated using the “psych” package in R, while the “dplyr” package was used to write code that deletes values according to the predetermined conditions of each missing-value mechanism. Different datasets were generated by applying different numbers of iterations. Explanatory factor analysis was conducted on the completed datasets, and the factors and total explained variances are presented. These values were first evaluated against the number of factors and total variance explained for the complete datasets. The results indicate that the multiple imputation method performs better when values are missing at random than when they are missing completely at random. Also, it was found that increasing the number of iterations in both missing-value datasets decreases the difference from the results obtained from complete datasets.

  7. A coregionalization model can assist specification of Geographically Weighted Poisson Regression: Application to an ecological study.

    Science.gov (United States)

    Ribeiro, Manuel Castro; Sousa, António Jorge; Pereira, Maria João

    2016-05-01

    The geographical distribution of health outcomes is influenced by socio-economic and environmental factors operating on different spatial scales. Geographical variations in relationships can be revealed with semi-parametric Geographically Weighted Poisson Regression (sGWPR), a model that can combine both geographically varying and geographically constant parameters. To decide whether a parameter should vary geographically, two models are compared: one in which all parameters are allowed to vary geographically and one in which all except the parameter being evaluated are allowed to vary geographically. The model with the lower corrected Akaike Information Criterion (AICc) is selected. Delivering model selection exclusively according to the AICc might hide important details in spatial variations of associations. We propose assisting the decision by using a Linear Model of Coregionalization (LMC). Here we show how LMC can refine sGWPR on ecological associations between socio-economic and environmental variables and low birth weight outcomes in the west-north-central region of Portugal. Copyright © 2016 Elsevier Ltd. All rights reserved.
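    The AICc comparison described in this abstract can be sketched as follows. The general small-sample formula is used here; the function names, the model tuples and the numeric values are illustrative assumptions, and in geographically weighted models the parameter count `k` is usually an effective (possibly non-integer) number of parameters rather than a simple count.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion.

    AICc = 2k - 2 ln L + 2k(k + 1) / (n - k - 1)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2.0 * k - 2.0 * log_likelihood
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

def prefer(model_a, model_b, n):
    """Return the name of the model with the lower AICc.

    Each model is a (name, log_likelihood, k) tuple.
    """
    best = min((model_a, model_b), key=lambda m: aicc(m[1], m[2], n))
    return best[0]

# Made-up values: all parameters varying versus one parameter held constant.
all_varying = ("all-varying", -95.0, 12.5)
one_fixed = ("one-fixed", -96.0, 9.0)
print(prefer(all_varying, one_fixed, n=50))
```

    A small AICc difference between the two candidates is exactly the situation in which the abstract suggests a second source of evidence, such as the coregionalization model, may help.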

  8. Design and Establishment of Quality Model of Fundamental Geographic Information Database

    Science.gov (United States)

    Ma, W.; Zhang, J.; Zhao, Y.; Zhang, P.; Dang, Y.; Zhao, T.

    2018-04-01

    To make quality evaluation of the Fundamental Geographic Information Database (FGIDB) more comprehensive, objective and accurate, this paper studies and establishes a quality model of the FGIDB, which is formed by the standardization of database construction and quality control, the conformity of data set quality and the functionality of the database management system. It also designs the overall principles, contents and methods of quality evaluation for the FGIDB, providing the basis and reference for carrying out quality control and quality evaluation for the FGIDB. The paper designs the quality elements, evaluation items and properties of the FGIDB step by step, based on the quality model framework. Connected organically, these quality elements and evaluation items constitute the quality model of the FGIDB. This model is the foundation for stipulating quality requirements and for quality evaluation of the FGIDB, and is of great significance for quality assurance in the design and development stage, for requirement formulation in the testing and evaluation stage, and for constructing a standard system for FGIDB quality evaluation technology.

  9. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

    Directory of Open Access Journals (Sweden)

    Busch Michael P

    2007-12-01

    Full Text Available Abstract Background Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates. Methods We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use. Results Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before. Conclusion In cross-sectional studies of fibrosis progression where date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.

  10. Geographic Names Information System (GNIS) for Louisiana, Geographic NAD83, USGS (2007) [GNIS_LA_USGS_2007]

    Data.gov (United States)

    Louisiana Geographic Information Center — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  11. Application of Spatial Data Modeling Systems, Geographical Information Systems (GIS), and Transportation Routing Optimization Methods for Evaluating Integrated Deployment of Interim Spent Fuel Storage Installations and Advanced Nuclear Plants

    International Nuclear Information System (INIS)

    Mays, Gary T.; Belles, Randy; Cetiner, Mustafa Sacit; Howard, Rob L.; Liu, Cheng; Mueller, Don; Omitaomu, Olufemi A.; Peterson, Steven K.; Scaglione, John M.

    2012-01-01

    The objective of this siting study work is to support DOE in evaluating integrated advanced nuclear plant and ISFSI deployment options in the future. This study looks at several nuclear power plant growth scenarios that consider the locations of existing and planned commercial nuclear power plants integrated with the establishment of consolidated interim spent fuel storage installations (ISFSIs). This research project is aimed at providing methodologies, information, and insights that inform the process for determining and optimizing candidate areas for new advanced nuclear power generation plants and consolidated ISFSIs to meet projected US electric power demands for the future.

  12. Application of Spatial Data Modeling Systems, Geographical Information Systems (GIS), and Transportation Routing Optimization Methods for Evaluating Integrated Deployment of Interim Spent Fuel Storage Installations and Advanced Nuclear Plants

    Energy Technology Data Exchange (ETDEWEB)

    Mays, Gary T [ORNL; Belles, Randy [ORNL; Cetiner, Sacit M [ORNL; Howard, Rob L [ORNL; Liu, Cheng [ORNL; Mueller, Don [ORNL; Omitaomu, Olufemi A [ORNL; Peterson, Steven K [ORNL; Scaglione, John M [ORNL

    2012-06-01

    The objective of this siting study work is to support DOE in evaluating integrated advanced nuclear plant and ISFSI deployment options in the future. This study looks at several nuclear power plant growth scenarios that consider the locations of existing and planned commercial nuclear power plants integrated with the establishment of consolidated interim spent fuel storage installations (ISFSIs). This research project is aimed at providing methodologies, information, and insights that inform the process for determining and optimizing candidate areas for new advanced nuclear power generation plants and consolidated ISFSIs to meet projected US electric power demands for the future.

  13. Comparison of immunization strategies in geographical networks

    Energy Technology Data Exchange (ETDEWEB)

    Wang Bing; Aihara, Kazuyuki [Institute of Industrial Science, The University of Tokyo, Tokyo (Japan)] [ERATO Aihara Complexity Modelling Project, JST, Institute of Industrial Science, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505 (Japan); Kim, Beom Jun, E-mail: beomjun@skku.ed [BK21 Physics Research Division and Department of Energy Science, Sungkyunkwan University, Suwon 440-746 (Korea, Republic of)] [Department of Computational Biology, School of Computer Science and Communication, Royal Institute of Technology, 100 44 Stockholm (Sweden)

    2009-10-12

    The epidemic spread and immunizations in geographically embedded scale-free (SF) and Watts-Strogatz (WS) networks are numerically investigated. We make the realistic assumption that it takes time, which we call the detection time, for a vertex to be identified as infected, and implement two different immunization strategies: one is based on connection neighbors (CN) of the infected vertex, with exact information about the network structure utilized, and the other is based on spatial neighbors (SN), with only geographical distances taken into account. We find that decreasing the detection time is crucial for successful immunization in general. Simulation results show that for both SF networks and WS networks, the SN strategy always performs better than the CN strategy, especially for more heterogeneous SF networks at long detection times. The observation is verified by checking the number of infected nodes being immunized. We found that in geographical space, the distance preferences in the network construction process and the geographically decaying infection rate are key factors that make the SN immunization strategy outperform the CN strategy. This indicates that even without full knowledge of network connectivity we can still stop the epidemic spread efficiently using only geographical information, as in the SN strategy, which may have potential applications for preventing real epidemic spread.
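    The difference between the two target-selection rules in this abstract can be shown with a minimal sketch. It covers only how targets are chosen once a vertex is detected as infected, not the epidemic simulation itself; the fixed `radius` for spatial neighbors, the state labels and the function names are assumptions, since the abstract does not specify how the SN set is bounded.

```python
import math

def connection_neighbors(adjacency, v):
    """CN targets: vertices linked to v (needs the network structure)."""
    return set(adjacency[v])

def spatial_neighbors(positions, v, radius):
    """SN targets: vertices within `radius` of v (needs geography only)."""
    return {u for u, p in positions.items()
            if u != v and math.dist(p, positions[v]) <= radius}

def immunize(state, targets):
    """Mark still-susceptible targets as immunized; return how many changed."""
    changed = 0
    for u in targets:
        if state[u] == "S":
            state[u] = "V"
            changed += 1
    return changed

# Vertex 0 is detected as infected. Vertex 3 is a distant, already
# infected contact; vertex 2 is close by but not linked to vertex 0.
adjacency = {0: [1, 3], 1: [0], 2: [], 3: [0]}
positions = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (5, 5)}
state = {0: "I", 1: "S", 2: "S", 3: "I"}

print(immunize(dict(state), connection_neighbors(adjacency, 0)))
print(immunize(dict(state), spatial_neighbors(positions, 0, radius=1.5)))
```

    In this toy case the spatial rule protects more susceptible vertices because one network contact is already infected, which loosely mirrors the abstract's check on how many immunized nodes were already infected.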

  14. Personality Homophily and Geographic Distance in Facebook.

    Science.gov (United States)

    Noë, Nyala; Whitaker, Roger M; Allen, Stuart M

    2018-05-24

    Personality homophily remains an understudied aspect of social networks, with the traditional focus concerning sociodemographic variables as the basis for assortativity, rather than psychological dispositions. We consider the effect of personality homophily on one of the biggest constraints to human social networks: geographic distance. We use the Big five model of personality to make predictions for each of the five facets: Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Using a network of 313,669 Facebook users, we investigate the difference in geographic distance between homophilous pairs, in which both users scored similarly on a particular facet, and mixed pairs. In accordance with our hypotheses, we find that pairs of open and conscientious users are geographically further apart than mixed pairs. Pairs of extraverts, on the other hand, tend to be geographically closer together. We find mixed results for the Neuroticism facet, and no significant effects for the Agreeableness facet. The results are discussed in the context of personality homophily and the impact of geographic distance on social connections.

  15. Comparison of immunization strategies in geographical networks

    International Nuclear Information System (INIS)

    Wang Bing; Aihara, Kazuyuki; Kim, Beom Jun

    2009-01-01

    The epidemic spread and immunizations in geographically embedded scale-free (SF) and Watts-Strogatz (WS) networks are numerically investigated. We make the realistic assumption that it takes time, which we call the detection time, for a vertex to be identified as infected, and implement two different immunization strategies: one is based on connection neighbors (CN) of the infected vertex, with exact information about the network structure utilized, and the other is based on spatial neighbors (SN), with only geographical distances taken into account. We find that decreasing the detection time is crucial for successful immunization in general. Simulation results show that for both SF networks and WS networks, the SN strategy always performs better than the CN strategy, especially for more heterogeneous SF networks at long detection times. The observation is verified by checking the number of infected nodes being immunized. We found that in geographical space, the distance preferences in the network construction process and the geographically decaying infection rate are key factors that make the SN immunization strategy outperform the CN strategy. This indicates that even without full knowledge of network connectivity we can still stop the epidemic spread efficiently using only geographical information, as in the SN strategy, which may have potential applications for preventing real epidemic spread.

  16. Louisiana State Soil Geographic, General Soil Map, Geographic NAD83, NWRC (1998) [statsgo_soils_NWRC_1998]

    Data.gov (United States)

    Louisiana Geographic Information Center — This data set contains vector line map information. The vector data contain selected base categories of geographic features, and characteristics of these features,...

  17. Thematic cartography as a geographical application

    Directory of Open Access Journals (Sweden)

    Drago Perko

    2002-12-01

    Full Text Available A thematic map may be a geographical application (tool) in itself or the basis for some other geographical work. The development of Slovene thematic cartography accelerated considerably following the independence of the country in 1991. From the viewpoint of content and technology, its greatest achievements are the Geographical Atlas of Slovenia and the National Atlas of Slovenia, which are outstanding achievements at the international level and of great significance for the promotion of Slovenia and Slovene geography and cartography. However, this rapid development has been accompanied by numerous problems, for example, the ignoring of various Slovene and international conventions for the preparation of maps, including United Nations resolutions, Slovene and international (SIST ISO) standards, and copyright laws.

  18. Training for Internationalization through Domestic Geographical Dispersion

    DEFF Research Database (Denmark)

    Santangelo, Grazia D.; Stucchi, Tamara

    Traditionally created to deal with the unfriendly domestic environment, business groups (BGs) are increasingly internationalizing. However, how BGs can reconcile their strictly domestic orientation with an international dimension still remains an open question. Drawing on arguments from...... organizational learning, we seek to solve this puzzle in relation to the internationalization of Indian BGs. In particular, we argue that in heterogeneous domestic emerging markets BG’s geographical dispersion across sub-national states provides training for internationalization. To internationalize successfully......, BGs need to develop the capability of managing geographically dispersed units in institutional heterogeneous contexts. Domestic geographical dispersion would indeed help the BG dealing with different regulations, customers and infrastructures. However, there is less scope for such training as BGs...

  19. Geographical data structures supporting regional analysis

    International Nuclear Information System (INIS)

    Edwards, R.G.; Durfee, R.C.

    1978-01-01

    In recent years the computer has become a valuable aid in solving regional environmental problems. Over a hundred different geographic information systems have been developed to digitize, store, analyze, and display spatially distributed data. One important aspect of these systems is the data structure (e.g. grids, polygons, segments) used to model the environment being studied. This paper presents eight common geographic data structures and their use in studies of coal resources, power plant siting, population distributions, LANDSAT imagery analysis, and land-use analysis.

  20. Tanzanian food origins and protected geographical indications

    DEFF Research Database (Denmark)

    John, Innocensia Festo; Egelyng, Henrik; Lokina, Azack

    2016-01-01

    As the world's population is constantly growing, food security will remain on the policy Agenda, particularly in Africa. At the same time, global food systems experience a new wave focusing on local foods and food sovereignty featuring high quality food products of verifiable geographical origin...... of food origin products in Tanzania that have potential for GI certification. The hypothesis was that there are origin products in Tanzania whose unique characteristics are linked to the area of production. Geographical indications can be useful policy instruments contributing to food security...... the diversity of supply of natural and unique quality products and so contribute to enhanced food security....

  1. Energy gradients and the geographic distribution of local ant diversity.

    Science.gov (United States)

    Kaspari, Michael; Ward, Philip S; Yuan, May

    2004-08-01

    Geographical diversity gradients, even among local communities, can ultimately arise from geographical differences in speciation and extinction rates. We evaluated three models--energy-speciation, energy-abundance, and area--that predict how geographic trends in net diversification rates generate trends in diversity. We sampled 96 litter ant communities from four provinces: Australia, Madagascar, North America, and South America. The energy-speciation hypothesis best predicted ant species richness by accurately predicting the slope of the temperature diversity curve, and accounting for most of the variation in diversity. The communities showed a strong latitudinal gradient in species richness as well as inter-province differences in diversity. The former vanished in the temperature-diversity residuals, suggesting that the latitudinal gradient arises primarily from higher diversification rates in the tropics. However, inter-province differences in diversity persisted in those residuals--South American communities remained more diverse than those in North America and Australia even after the effects of temperature were removed.

  2. Data Matching Imputation System

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The DMIS dataset is a flat file record of the matching of several data set collections. Primarily it consists of VTRs, dealer records, Observer data in conjunction...

  3. Who cares and how much? The imputed economic contribution to the Canadian healthcare system of middle-aged and older unpaid caregivers providing care to the elderly.

    Science.gov (United States)

    Hollander, Marcus J; Liu, Guiping; Chappell, Neena L

    2009-01-01

    Canadians provide significant amounts of unpaid care to elderly family members and friends with long-term health problems. While some information is available on the nature of the tasks unpaid caregivers perform, and the amounts of time they spend on these tasks, the contribution of unpaid caregivers is often hidden. (It is recognized that some caregiving may be for short periods of time or may entail matters better described as "help" or "assistance," such as providing transportation. However, we use caregiving to cover the full range of unpaid care provided from some basic help to personal care.) Aggregate estimates of the market costs to replace the unpaid care provided are important to governments for policy development as they provide a means to situate the contributions of unpaid caregivers within Canada's healthcare system. The purpose of this study was to obtain an assessment of the imputed costs of replacing the unpaid care provided by Canadians to the elderly. (Imputed costs is used to refer to costs that would be incurred if the care provided by an unpaid caregiver was, instead, provided by a paid caregiver, on a direct hour-for-hour substitution basis.) The economic value of unpaid care as understood in this study is defined as the cost to replace the services provided by unpaid caregivers at rates for paid care providers.
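
    The hour-for-hour substitution described above can be sketched as a simple calculation. This is a minimal illustration, not the study's method: the function name, the task list, the hours, and the hourly rates are all hypothetical and do not come from the abstract.

```python
def imputed_replacement_cost(care_records):
    """Sum hours * paid hourly rate over unpaid-care records.

    Each record is (hours_of_unpaid_care, paid_hourly_rate); one paid hour
    is assumed to replace one unpaid hour (direct substitution).
    """
    total = 0.0
    for hours, hourly_rate in care_records:
        if hours < 0 or hourly_rate < 0:
            raise ValueError("hours and rates must be non-negative")
        total += hours * hourly_rate
    return total


# Hypothetical weekly figures for one caregiver (illustrative only).
example_records = [
    (6.0, 20.0),   # e.g. housekeeping help
    (4.0, 25.0),   # e.g. personal care
]
example_cost = imputed_replacement_cost(example_records)
```

    Aggregate estimates would then follow by summing such costs over all caregivers, which is why the choice of paid rate matters so much to the result.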

  4. Imputing Variants in HLA-DR Beta Genes Reveals That HLA-DRB1 Is Solely Associated with Rheumatoid Arthritis and Systemic Lupus Erythematosus.

    Directory of Open Access Journals (Sweden)

    Kwangwoo Kim

    Full Text Available The genetic association of HLA-DRB1 with rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) is well documented, but association with other HLA-DR beta genes (HLA-DRB3, HLA-DRB4 and HLA-DRB5) has not been thoroughly studied, despite their similar functions and chromosomal positions. We examined variants in all functional HLA-DR beta genes in RA and SLE patients and controls, down to the amino-acid level, to better understand disease association with the HLA-DR locus. To this end, we improved an existing HLA reference panel to impute variants in all protein-coding HLA-DR beta genes. Using the reference panel, HLA variants were inferred from high-density SNP data of 9,271 RA-control subjects and 5,342 SLE-control subjects. Disease association tests were performed by logistic regression and log-likelihood ratio tests. After imputation using the newly constructed HLA reference panel and statistical analysis, we observed that HLA-DRB1 variants better accounted for the association between MHC and susceptibility to RA and SLE than did the other three HLA-DRB variants. Moreover, there were no secondary effects in HLA-DRB3, HLA-DRB4, or HLA-DRB5 in RA or SLE. Of all the HLA-DR beta chain paralogs, those encoded by HLA-DRB1 solely or dominantly influence susceptibility to RA and SLE.

  5. Geometric algorithms for delineating geographic regions

    NARCIS (Netherlands)

    Reinbacher, I.

    2006-01-01

    Every one of us is used to geographical regions like the south of Utrecht, the Dutch Randstad, or the mountainous areas of Austria. Some of these regions have crisp, fixed boundaries like Utrecht or Austria. Others, like the Dutch Randstad and the Austrian mountains, have no such boundaries and are

  6. Europeans among themselves: Geographical and linguistic stereotypes

    NARCIS (Netherlands)

    Mamadouh, V.D.; Dąbrowska, A.; Pisarek, W.; Stickel, G.

    2017-01-01

    Stereotypes can be studied from the perspective of political geography and critical geopolitics as part of geographical imaginations, in other words those geopolitical representations that help us make sense of the world around us. They necessarily frame our perception of ongoing events, and inform

  7. Using Educational Tourism in Geographical Education

    Science.gov (United States)

    Prakapiene, Dalia; Olberkyte, Loreta

    2013-01-01

    The article analyses and defines the concept of educational tourism, presents the structure of the concept and looks into the opportunities for using educational tourism in geographical education. In order to reveal such opportunities, research was carried out in the Lithuanian national and regional parks using the qualitative method of content…

  8. Geographic distribution of wild potato species

    NARCIS (Netherlands)

    Hijmans, R.J.; Spooner, D.M.

    2001-01-01

    The geographic distribution of wild potatoes (Solanaceae sect. Petota) was analyzed using a database of 6073 georeferenced observations. Wild potatoes occur in 16 countries, but 88% of the observations are from Argentina, Bolivia, Mexico, and Peru. Most species are rare and narrowly endemic: for 77

  9. Geography and Geographical Information Science: Interdisciplinary Integrators

    Science.gov (United States)

    Ellul, Claire

    2015-01-01

    To understand how Geography and Geographical Information Science (GIS) can contribute to Interdisciplinary Research (IDR), it is relevant to articulate the differences between the different types of such research. "Multidisciplinary" researchers work in a "parallel play" mode, completing work in their disciplinary work streams…

  10. Geographic pathology of Helicobacter pylori gastritis

    NARCIS (Netherlands)

    Liu, Yi; Ponsioen, Cyriel I. J.; Xiao, Shu-Dong; Tytgat, Guido N. J.; ten Kate, Fiebo J. W.

    2005-01-01

    Background and aim. Helicobacter pylori is etiologically associated with gastritis and gastric cancer. There are significant geographical differences between the clinical manifestation of H. pylori infections. The aim of this study was to compare gastric mucosal histology in relation to age among H.

  11. Execution Management Solutions for Geographically Distributed Simulations

    NARCIS (Netherlands)

    Berg, T.W. van den; Jansen, H.G.M.; Jansen, R.E.J.; Prins, L.M.

    2009-01-01

    Managing the initialization, execution control and monitoring of HLA federates is not always straightforward, especially for a geographically distributed time managed federation. Issues include pre and post run-time data distribution and run-time data collection; starting, stopping and monitoring

  12. Geographic Analysis of Neurosurgery Workforce in Korea.

    Science.gov (United States)

    Park, Hye Ran; Park, Sukh Que; Kim, Jae Hyun; Hwang, Jae Chan; Lee, Gwang Soo; Chang, Jae-Chil

    2018-01-01

    With respect to the health and safety of the public, universal access to health care is an issue of the greatest importance. The geographic distribution of doctors is one of the important factors contributing to access to health care. The aim of this study is to assess the imbalances in the geographic distribution of neurosurgeons across Korea. Population data was obtained from the National Statistical Office. We classified geographic groups into 7 metropolitan cities, 78 non-metropolitan cities, and 77 rural areas. The number of doctors and neurosurgeons per 100,000 population in each county unit was calculated using the total number of doctors and neurosurgeons at the country level from 2009 to 2015. The density levels of neurosurgeons and doctors were calculated and depicted in maps. Between 2009 and 2015, the number of neurosurgeons increased from 2002 to 2557, and the ratio of neurosurgeons per 100,000 population increased from 4.02 to 4.96. The number of neurosurgeons per 100,000 population was highest in metropolitan cities and lowest in rural areas from 2009 to 2015. A comparison of the geographic distribution of neurosurgeons in 2009 and 2015 showed an increase in the regional gap. The neurosurgeon density was affected by county unit characteristics (p = 0.000). Distribution of neurosurgeons throughout Korea is uneven. Neurosurgeons are being increasingly concentrated in a limited number of metropolitan cities. This phenomenon will need to be accounted for when planning for a supply of neurosurgeons, allocation of resources and manpower, and the provision of regional neurosurgical services.
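
    The density measure used in this abstract is a rate per 100,000 population. The sketch below shows that arithmetic and, as a consistency check, back-calculates the population implied by the reported counts and ratios. The function names are illustrative assumptions; only the four figures (2002, 2557, 4.02, 4.96) come from the abstract.

```python
def rate_per_100k(count, population):
    """Number of practitioners per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


def implied_population(count, reported_rate_per_100k):
    """Population that makes `count` consistent with the reported rate."""
    if reported_rate_per_100k <= 0:
        raise ValueError("rate must be positive")
    return count / reported_rate_per_100k * 100_000


# Figures reported in the abstract for 2009 and 2015.
population_2009 = implied_population(2002, 4.02)
population_2015 = implied_population(2557, 4.96)
```

    Under these figures the implied national population is roughly 49.8 million in 2009 and 51.6 million in 2015, so the reported counts and ratios are mutually consistent.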

  13. Geographic disparity in kidney transplantation under KAS.

    Science.gov (United States)

    Zhou, Sheng; Massie, Allan B; Luo, Xun; Ruck, Jessica M; Chow, Eric K H; Bowring, Mary G; Bae, Sunjae; Segev, Dorry L; Gentry, Sommer E

    2017-12-12

    The Kidney Allocation System fundamentally altered kidney allocation, causing a substantial increase in regional and national sharing that we hypothesized might impact geographic disparities. We measured geographic disparity in deceased donor kidney transplant (DDKT) rate under KAS (6/1/2015-12/1/2016), and compared that with pre-KAS (6/1/2013-12/3/2014). We modeled DSA-level DDKT rates with multilevel Poisson regression, adjusting for allocation factors under KAS. Using the model we calculated a novel, improved metric of geographic disparity: the median incidence rate ratio (MIRR) of transplant rate, a measure of DSA-level variation that accounts for patient casemix and is robust to outlier values. Under KAS, MIRR was 1.81 (interval 1.75-1.86) for adults, meaning that similar candidates across different DSAs have a median 1.81-fold difference in DDKT rate. The impact of geography was greater than the impact of factors emphasized by KAS: having an EPTS score ≤20% was associated with a 1.40-fold increase (IRR = 1.40, interval 1.35-1.45, P ... geographic disparities with KAS (P = .3). Despite extensive changes to kidney allocation under KAS, geography remains a primary determinant of access to DDKT. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.
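
    The abstract does not state how the MIRR is computed. A minimal sketch follows, assuming the commonly used median rate ratio definition for a multilevel Poisson model with a normally distributed random intercept, MRR = exp(sqrt(2 * variance) * z), where z is the 75th percentile of the standard normal distribution. The function names are illustrative, and the variance derived here is an inference from the reported 1.81, not a figure from the study.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z_75 = NormalDist().inv_cdf(0.75)


def median_rate_ratio(random_intercept_variance):
    """Median rate ratio between two randomly chosen clusters (e.g. DSAs)."""
    if random_intercept_variance < 0:
        raise ValueError("variance must be non-negative")
    return math.exp(math.sqrt(2.0 * random_intercept_variance) * Z_75)


def variance_from_median_rate_ratio(mrr):
    """Invert the formula: cluster-level variance implied by a given ratio."""
    if mrr < 1:
        raise ValueError("a median rate ratio is at least 1")
    return (math.log(mrr) / Z_75) ** 2 / 2.0


# Variance on the log-rate scale implied by the reported MIRR of 1.81,
# under the assumption stated above.
implied_variance = variance_from_median_rate_ratio(1.81)
```

    A ratio of 1 corresponds to zero between-cluster variance, which is why the measure reads naturally as "how much geography alone changes the rate for otherwise similar candidates".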

  14. The National Geographic Society's Teaching Geography Project.

    Science.gov (United States)

    Bockenhauer, Mark H.

    1993-01-01

    Contends that the National Geographic Society's Teaching Geography Project is an inservice teacher education success story. Describes the origins, objectives, and development of the project. Summarizes the impact of the project and contends that its success is the result of the workshop format and guided practice in instructional strategies. (CFR)

  15. GEOGRAPHERS AND ECOSYSTEMS: A POINT OF VIEW

    African Journals Online (AJOL)

    are fearful of tackling it, mainly because they have never studied ecology or any of the pure sciences. Most of these geographers are trained in the arts disciplines and thus feel at a disadvantage even when confronted only by a 'jargon' which is unfamiliar. They perceive themselves as being inadequate and are unhappy ...

  16. The Geographic Extent of Global Supply Chains

    DEFF Research Database (Denmark)

    Machikita, Tomohiro; Ueki, Yasushi

    2012-01-01

    We study the extent to which inter-firm relationships are locally concentrated and what determines firm differences in geographic proximity to domestic or foreign suppliers and customers. From micro-data on self-reported customer and supplier data of firms in Indonesia, the Philippines, Thailand, ...

  17. Geographical information modelling for land resource survey

    NARCIS (Netherlands)

    Bruin, de S.

    2000-01-01

    The increasing popularity of geographical information systems (GIS) has at least three major implications for land resources survey. Firstly, GIS allows alternative and richer representation of spatial phenomena than is possible with the traditional paper map. Secondly, digital technology has

  18. Teaching Geographic Field Methods Using Paleoecology

    Science.gov (United States)

    Walsh, Megan K.

    2014-01-01

    Field-based undergraduate geography courses provide numerous pedagogical benefits including an opportunity for students to acquire employable skills in an applied context. This article presents one unique approach to teaching geographic field methods using paleoecological research. The goals of this course are to teach students key geographic…

  19. Groundwater quality mapping using geographic information system ...

    African Journals Online (AJOL)

    Spatial variations in ground water quality in the corporation area of Gulbarga City, located in the northern part of Karnataka State, India, have been studied using the geographic information system (GIS) technique. GIS, a tool which is used for storing, analyzing and displaying spatial data, is also used for investigating ground ...

  20. Formal Ontologies and Uncertainty. In Geographical Knowledge

    Directory of Open Access Journals (Sweden)

    Matteo Caglioni

    2014-05-01

    Full Text Available Formal ontologies have proved to be a very useful tool to manage interoperability among data, systems and knowledge. In this paper we will show how formal ontologies can evolve from a crisp, deterministic framework (ontologies of hard knowledge) to new probabilistic, fuzzy or possibilistic frameworks (ontologies of soft knowledge). This can considerably enlarge the application potential of formal ontologies in geographic analysis and planning, where soft knowledge is intrinsically linked to the complexity of the phenomena under study. The paper briefly presents these new uncertainty-based formal ontologies. It then highlights how ontologies are formal tools to define both concepts and relations among concepts. An example from the domain of urban geography finally shows how the cause-to-effect relation between household preferences and urban sprawl can be encoded within a crisp, a probabilistic and a possibilistic ontology, respectively. The ontology formalism will also determine the kind of reasoning that can be developed from available knowledge. Uncertain ontologies can be seen as the preliminary phase of more complex uncertainty-based models. The advantages of moving to uncertainty-based models are evident: whether it is in the analysis of geographic space or in decision support for planning, reasoning on geographic space is almost always reasoning with uncertain knowledge of geographic phenomena.
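
    To make the contrast between the three formalisms concrete, here is a rough sketch of how one cause-to-effect relation (household preference leading to urban sprawl) might be encoded in each. It is not the paper's ontology: the class names and every numeric degree are hypothetical, and the possibilistic part only uses the standard duality between necessity and possibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrispRelation:
    """Hard knowledge: the preference either causes sprawl or it does not."""
    causes_sprawl: bool

    def sprawl_expected(self, preference_present: bool) -> bool:
        return preference_present and self.causes_sprawl


@dataclass(frozen=True)
class ProbabilisticRelation:
    """Soft knowledge as conditional probabilities (hypothetical values)."""
    p_sprawl_given_preference: float
    p_sprawl_otherwise: float

    def sprawl_probability(self, p_preference: float) -> float:
        # Law of total probability over "preference present / absent".
        return (p_preference * self.p_sprawl_given_preference
                + (1.0 - p_preference) * self.p_sprawl_otherwise)


@dataclass(frozen=True)
class PossibilisticRelation:
    """Soft knowledge as possibility degrees, given the preference holds."""
    possibility_sprawl: float
    possibility_no_sprawl: float

    def necessity_sprawl(self) -> float:
        # Necessity of an event is 1 minus the possibility of its complement.
        return 1.0 - self.possibility_no_sprawl


crisp = CrispRelation(causes_sprawl=True)
probabilistic = ProbabilisticRelation(0.7, 0.2)
possibilistic = PossibilisticRelation(1.0, 0.4)
```

    The crisp form supports only yes/no inference, the probabilistic form propagates a number through the law of total probability, and the possibilistic form yields a possibility and necessity pair, which reflects the abstract's point that the formalism determines the kind of reasoning available.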

  1. Ontology-based geographic data set integration

    NARCIS (Netherlands)

    Uitermark, H.T.J.A.; Uitermark, Harry T.; Oosterom, Peter J.M.; Mars, Nicolaas; Molenaar, Martien; Molenaar, M.

    1999-01-01

    In order to develop a system to propagate updates we investigate the semantic and spatial relationships between independently produced geographic data sets of the same region (data set integration). The goal of this system is to reduce operator intervention in update operations between corresponding

  2. Spa and Wellness Tourism in Slovakia (A Geographical Analysis)

    Directory of Open Access Journals (Sweden)

    Kasagranda Anton

    2017-06-01

    Full Text Available The spa industry is presently an inherent part of Slovak tourism. For this reason, it has become a major interest of scientific and professional literature (economics, management, sociology, or geography). The main topic of this paper is the evaluation of tourism in Slovakia through a geographic analysis. This paper briefly evaluates the development and the importance of spa, spa tourism and wellness and their main research areas and issues. Furthermore, the primary sources of tourism development, the overview of spa tourism and the wellness resorts, the accommodation establishments and the visitation rate are evaluated as well. In conclusion, functional and spatial typification of spa tourism and wellness in Slovakia is presented. The structure of the paper is designed to be appropriate for a comparison with V4 countries.

  3. Quality evaluation of cortex berberidis from different geographical ...

    African Journals Online (AJOL)

    Tropical Journal of Pharmaceutical Research ... Methods: A simple, precise and accurate high performance liquid chromatography (HPLC) method was first developed for simultaneous quantification of four active ... Method validation was performed in terms of precision, repeatability, stability, accuracy, and linearity. Besides ...

  4. Geographical conceptualization of quality of life

    Directory of Open Access Journals (Sweden)

    Murgaš František

    2016-12-01

    Full Text Available The conceptualization of quality of life in terms of geography is based on two assumptions. The first assumption is that the quality of life consists of two dimensions: subjective and objective. The subjective is known as ‘well-being’, while for the objective the term ‘quality of place’ is proposed. The second assumption is based on the recognition that quality of life always has a spatial dimension. The concept of quality of life is closely linked with the concept of a good life; geographers enriched this concept by using the term ‘good place’ as a place in which the conditions are created for a good life. The quality of life for individuals in terms of a good place overlaps with the quality of life in society, namely the societal quality of life. The geographical conceptualisation of quality of life is applied to settlements within the city of Liberec.

  5. A Systems Perspective on Volunteered Geographic Information

    Directory of Open Access Journals (Sweden)

    Victoria Fast

    2014-12-01

    Full Text Available Volunteered geographic information (VGI) is geographic information collected by way of crowdsourcing. However, the distinction between VGI as an information product and the processes that create VGI is blurred. Clearly, the environment that influences the creation of VGI is different than the information product itself, yet most literature treats them as one and the same. Thus, this research is motivated by the need to formalize and standardize the systems that support the creation of VGI. To this end, we propose a conceptual framework for VGI systems, the main components of which—project, participants, and technical infrastructure—form an environment conducive to the creation of VGI. Drawing on examples from OpenStreetMap, Ushahidi, and RinkWatch, we illustrate the pragmatic relevance of these components. Applying a system perspective to VGI allows us to better understand the components and functionality needed to effectively create VGI.

  6. Geographical information systems and computer cartography

    CERN Document Server

    Jones, Chris B

    2014-01-01

    A concise text presenting the fundamental concepts in Geographical Information Systems (GIS), emphasising an understanding of techniques in management, analysis and graphic display of spatial information. Divided into five parts - the first part reviews the development and application of GIS, followed by a summary of the characteristics and representation of geographical information. It concludes with an overview of the functions provided by typical GIS systems. Part Two introduces co-ordinate systems and map projections, describes methods for digitising map data and gives an overview of remote sensing. Part Three deals with data storage and database management, as well as specialised techniques for accessing spatial data. Spatial modelling and analytical techniques for decision making form the subject of Part Four, while the final part is concerned with graphical representation, emphasising issues of graphics technology, cartographic design and map generalisation.

  7. SOLID WASTE: PRESENCE AND THREAT IN GEOGRAPHICAL SPACE

    Directory of Open Access Journals (Sweden)

    Clesley Maria Tavares do Nascimento

    2017-12-01

    Full Text Available This article deals with the trajectory of solid waste in different historical periods, framing it as a constructive element of geographical space. The intention to approach the theme from a timeline perspective rests on the conviction that the categories of space and time are inseparable and that both are important in understanding a geographical phenomenon. The methodological support of this research relied on documentary research involving literature and the consultation of secondary sources such as books, academic journals, dissertations and theses on the subject. The results presented and discussed in this paper indicate that the production of waste is tied to historical time, reflects the societies and techniques that generated it, and is a permanent part of the dialectical process of spatial formation.

  8. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium.

    Science.gov (United States)

    Ng, Maggie C Y; Graff, Mariaelisa; Lu, Yingchang; Justice, Anne E; Mudgal, Poorva; Liu, Ching-Ti; Young, Kristin; Yanek, Lisa R; Feitosa, Mary F; Wojczynski, Mary K; Rand, Kristin; Brody, Jennifer A; Cade, Brian E; Dimitrov, Latchezar; Duan, Qing; Guo, Xiuqing; Lange, Leslie A; Nalls, Michael A; Okut, Hayrettin; Tajuddin, Salman M; Tayo, Bamidele O; Vedantam, Sailaja; Bradfield, Jonathan P; Chen, Guanjie; Chen, Wei-Min; Chesi, Alessandra; Irvin, Marguerite R; Padhukasahasram, Badri; Smith, Jennifer A; Zheng, Wei; Allison, Matthew A; Ambrosone, Christine B; Bandera, Elisa V; Bartz, Traci M; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Bottinger, Erwin P; Carpten, John; Chanock, Stephen J; Chen, Yii-Der Ida; Conti, David V; Cooper, Richard S; Fornage, Myriam; Freedman, Barry I; Garcia, Melissa; Goodman, Phyllis J; Hsu, Yu-Han H; Hu, Jennifer; Huff, Chad D; Ingles, Sue A; John, Esther M; Kittles, Rick; Klein, Eric; Li, Jin; McKnight, Barbara; Nayak, Uma; Nemesure, Barbara; Ogunniyi, Adesola; Olshan, Andrew; Press, Michael F; Rohde, Rebecca; Rybicki, Benjamin A; Salako, Babatunde; Sanderson, Maureen; Shao, Yaming; Siscovick, David S; Stanford, Janet L; Stevens, Victoria L; Stram, Alex; Strom, Sara S; Vaidya, Dhananjay; Witte, John S; Yao, Jie; Zhu, Xiaofeng; Ziegler, Regina G; Zonderman, Alan B; Adeyemo, Adebowale; Ambs, Stefan; Cushman, Mary; Faul, Jessica D; Hakonarson, Hakon; Levin, Albert M; Nathanson, Katherine L; Ware, Erin B; Weir, David R; Zhao, Wei; Zhi, Degui; Arnett, Donna K; Grant, Struan F A; Kardia, Sharon L R; Oloapde, Olufunmilayo I; Rao, D C; Rotimi, Charles N; Sale, Michele M; Williams, L Keoki; Zemel, Babette S; Becker, Diane M; Borecki, Ingrid B; Evans, Michele K; Harris, Tamara B; Hirschhorn, Joel N; Li, Yun; Patel, Sanjay R; Psaty, Bruce M; Rotter, Jerome I; Wilson, James G; Bowden, Donald W; Cupples, L Adrienne; Haiman, Christopher A; Loos, Ruth J F; North, Kari E

    2017-04-01

    Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P ... African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (... African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. As compared to our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement in identifying GWAS loci including low frequency variants. Trans-ethnic meta-analyses further improved fine mapping of putative causal variants in loci shared between the African and European ancestry populations.
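
    The "credible sets that jointly account for 99% posterior probability" can be illustrated with the usual construction: rank the variants at a locus by posterior probability and keep adding them until the cumulative total reaches the target. This is a generic sketch, not the consortium's pipeline; the function name, variant labels, and posterior values are hypothetical.

```python
def credible_set(posteriors, coverage=0.99):
    """Smallest top-ranked set of variants whose posteriors reach `coverage`.

    `posteriors` maps a variant identifier to its posterior probability of
    driving the association; values are renormalised to sum to 1.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posteriors for four variants at one locus.
example_posteriors = {
    "variant_a": 0.90,
    "variant_b": 0.08,
    "variant_c": 0.015,
    "variant_d": 0.005,
}
example_set = credible_set(example_posteriors)
```

    A small credible set (the abstract's threshold is 20 variants or fewer) means the association signal is concentrated on few candidates, which is what makes a locus well fine-mapped.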

  9. Do geographically isolated wetlands influence landscape functions?

    OpenAIRE

    Cohen, Matthew J.; Creed, Irena F.; Alexander, Laurie; Basu, Nandita B.; Calhoun, Aram J. K.; Craft, Christopher; D’Amico, Ellen; DeKeyser, Edward; Fowler, Laurie; Golden, Heather E.; Jawitz, James W.; Kalla, Peter; Kirkman, L. Katherine; Lane, Charles R.; Lang, Megan

    2016-01-01

    Geographically isolated wetlands (GIWs), those surrounded by uplands, exchange materials, energy, and organisms with other elements in hydrological and habitat networks, contributing to landscape functions, such as flow generation, nutrient and sediment retention, and biodiversity support. GIWs constitute most of the wetlands in many North American landscapes, provide a disproportionately large fraction of wetland edges where many functions are enhanced, and form complexes with other water bo...

  10. Geographic Analysis of Neurosurgery Workforce in Korea

    Science.gov (United States)

    Park, Hye Ran; Park, Sukh Que; Kim, Jae Hyun; Hwang, Jae Chan; Lee, Gwang Soo; Chang, Jae-Chil

    2018-01-01

    Objective: With respect to the health and safety of the public, universal access to health care is an issue of the greatest importance. The geographic distribution of doctors is one of the important factors contributing to access to health care. The aim of this study is to assess the imbalances in the geographic distribution of neurosurgeons across Korea. Methods: Population data was obtained from the National Statistical Office. We classified geographic groups into 7 metropolitan cities, 78 non-metropolitan cities, and 77 rural areas. The number of doctors and neurosurgeons per 100,000 population in each county unit was calculated using the total number of doctors and neurosurgeons at the country level from 2009 to 2015. The density levels of neurosurgeons and doctors were calculated and depicted in maps. Results: Between 2009 and 2015, the number of neurosurgeons increased from 2002 to 2557, and the ratio of neurosurgeons per 100,000 population increased from 4.02 to 4.96. The number of neurosurgeons per 100,000 population was highest in metropolitan cities and lowest in rural areas from 2009 to 2015. A comparison of the geographic distribution of neurosurgeons in 2009 and 2015 showed an increase in the regional gap. The neurosurgeon density was affected by county unit characteristics (p = 0.000). Conclusion: Distribution of neurosurgeons throughout Korea is uneven. Neurosurgeons are being increasingly concentrated in a limited number of metropolitan cities. This phenomenon will need to be accounted for when planning for a supply of neurosurgeons, allocation of resources and manpower, and the provision of regional neurosurgical services. PMID: 29354242

  11. U Plant Geographic Zone Cleanup Prototype

    International Nuclear Information System (INIS)

    Romine, L.D.; Leary, K.D.; Lackey, M.B.; Robertson, J.R.

    2006-01-01

    The U Plant geographic zone (UPZ) occupies 0.83 square kilometers on the Hanford Site Central Plateau (200 Area). It encompasses the U Plant canyon (221-U Facility), ancillary facilities that supported the canyon, soil waste sites, and underground pipelines. The UPZ cleanup initiative coordinates the cleanup of the major facilities, ancillary facilities, waste sites, and contaminated pipelines (collectively identified as 'cleanup items') within the geographic zone. The UPZ was selected as a geographic cleanup zone prototype for resolving regulatory, technical, and stakeholder issues and demonstrating cleanup methods for several reasons: most of the area is inactive, sufficient characterization information is available to support decisions, cleanup of the high-risk waste sites will help protect the groundwater, and the zone contains a representative cross-section of the types of cleanup actions that will be required in other geographic zones. The UPZ cleanup demonstrates the first of 22 integrated zone cleanup actions on the Hanford Site Central Plateau to address threats to groundwater, the environment, and human health. The UPZ contains more than 100 individual cleanup items. Cleanup actions in the zone will be undertaken using multiple regulatory processes and decision documents. Cleanup actions will include building demolition, waste site and pipeline excavation, and the construction of multiple, large engineered barriers. In some cases, different cleanup actions may be taken at item locations that are immediately adjacent to each other. The cleanup planning and field activities for each cleanup item must be undertaken in a coordinated and cohesive manner to ensure effective execution of the UPZ cleanup initiative. The UPZ zone cleanup implementation plan (ZCIP) [1] was developed to address the need for a fundamental integration tool for UPZ cleanup. As UPZ cleanup planning and implementation moves forward, the ZCIP is intended to be a living document that will

  12. Using Educational Tourism in Geographical Education

    OpenAIRE

    PRAKAPIENĖ, Dalia; OLBERKYTĖ, Loreta

    2014-01-01

    The article analyses and defines the concept of educational tourism, presents the structure of the concept and looks into the opportunities for using educational tourism in geographical education. In order to reveal such opportunities a research was carried out in the Lithuanian national and regional parks using the qualitative method of content analysis and the quantitative method of questionnaire survey. The authors of the research identified the educational excursion activities conducted i...

  13. Globalization in history : a geographical perspective

    OpenAIRE

    Crafts, N. F. R.; Venables, Anthony

    2001-01-01

    This paper argues that a geographical perspective is fundamental to understanding comparative economic development in the context of globalization. Central to this view is the role of agglomeration in productivity performance; size and location matter. The tools of the new economic geography are used to illuminate important episodes when the relative position of major economies radically changed; the rise of the United States at the beginning and of East Asia at the end of the twentieth centu...

  14. PEDIATRIC FITNESS: SECULAR TRENDS AND GEOGRAPHIC VARIABILITY

    Directory of Open Access Journals (Sweden)

    Grant R. Tomkinson

    2007-06-01

    Full Text Available DESCRIPTION This book describes and discusses children's physical capacity in terms of aerobic and anaerobic power generation according to secular trends and geographic variability. PURPOSE To discuss the controversial issue of whether present-day children and adolescents are fitter than their peers of the past and whether they are fitter if they live in the more prosperous countries. AUDIENCE Pediatricians, medical practitioners, physical educators, exercise and/or sport scientists, exercise physiologists, personal trainers and graduate students in relevant fields will find this book helpful when dealing with contemporary trends and geographic variability in pediatric fitness. FEATURES The volume starts with the editors' overview of children's fitness. The authors of the individual chapters discuss the data gathered since the late 1950s on secular trends and geographic variability in the aerobic and anaerobic fitness performance of children and adolescents from 23 countries in Africa, Asia, Australasia, Europe, the Middle East and North America. Several chapters present evidence that there has been a worldwide decline in pediatric aerobic performance in recent decades, relative stability in anaerobic performance, and that the best-performing children come from northern and central Europe. The final chapters consider possible causes, including whether the weakening in aerobic performance is the result of distributional or widespread declines, and whether increases in obesity alone can explain the decline in aerobic performance. ASSESSMENT The editors have assembled a volume of Medicine and Sports Science that is essential reading for all who are interested in understanding and improving the fitness of children. Readers will find useful information in this book on secular trends and geographic variability in pediatric fitness. I believe the book will serve as a first

  15. Deterrence and Geographical Externalities in Auto Theft

    OpenAIRE

    Marco Gonzalez-Navarro

    2013-01-01

    Understanding the degree of geographical crime displacement is crucial for the design of crime prevention policies. This paper documents changes in automobile theft risk that were generated by the plausibly exogenous introduction of Lojack, a highly effective stolen vehicle recovery device, into a number of new Ford car models in some Mexican states, but not others. Lojack-equipped vehicles in Lojack-coverage states experienced a 48 percent reduction in theft risk due to deterrence effects. H...

  16. Geographical and temporal changes of anthropometric traits in historical Yemen.

    Science.gov (United States)

    Danubio, Maria Enrica; Milia, Nicola; Coppa, Alfredo; Rufo, Fabrizio; Sanna, Emanuele

    2016-02-01

    This study investigates secular changes of anthropometric variables among four geographic groups in historical Yemen, to evaluate possible regional differences in the evolution of living standards. Nineteen somatic and cephalic measures collected by Coon in 1939, and 8 anthropometric indices in 1244 Yemenite adult males were analyzed. The individuals were divided into 10-year age groups. Within-group variations were tested by One-way ANCOVA (age as covariate). ANCOVA (controlling for age), and Forward stepwise discriminant analysis were used to evaluate and represent regional differences. ANCOVA and discriminant analysis confirmed and enhanced previous findings. At the time, the Yemenite population presented high intergroup heterogeneity. The highest mean values of height at all ages were found in the "mountain" region, which is characterized by very fertile soils and where, nowadays, most of the cereals and pulses are grown and where most livestock is raised. Within-group variations were limited and generally inconsistent in all geographic regions and concern vertical dimensions, but mean values of height never differed. The prolonged internal isolation of these groups resulted in significant regional morphometric differentiation. The main evidence comes from height which suggests that socioeconomic factors have played a role. Nevertheless, the possible better living conditions experienced by the "mountain" group, with the highest mean values of stature in all periods, did not allow the secular trend to take place in that region, too. Copyright © 2015. Published by Elsevier GmbH.

  17. Geographically isolated wetlands: Rethinking a misnomer

    Science.gov (United States)

    Mushet, David M.; Calhoun, Aram J.K.; Alexander, Laurie C.; Cohen, Matthew J.; DeKeyser, Edward S.; Fowler, Laurie G.; Lane, Charles R.; Lang, Megan W.; Rains, Mark C.; Walls, Susan

    2015-01-01

    We explore the category “geographically isolated wetlands” (GIWs; i.e., wetlands completely surrounded by uplands at the local scale) as used in the wetland sciences. As currently used, the GIW category (1) hampers scientific efforts by obscuring important hydrological and ecological differences among multiple wetland functional types, (2) aggregates wetlands in a manner not reflective of regulatory and management information needs, (3) implies wetlands so described are in some way “isolated,” an often incorrect implication, (4) is inconsistent with more broadly used and accepted concepts of “geographic isolation,” and (5) has injected unnecessary confusion into scientific investigations and discussions. Instead, we suggest other wetland classification systems offer more informative alternatives. For example, hydrogeomorphic (HGM) classes based on well-established scientific definitions account for wetland functional diversity thereby facilitating explorations into questions of connectivity without an a priori designation of “isolation.” Additionally, an HGM-type approach could be used in combination with terms reflective of current regulatory or policymaking needs. For those rare cases in which the condition of being surrounded by uplands is the relevant distinguishing characteristic, use of terminology that does not unnecessarily imply isolation (e.g., “upland embedded wetlands”) would help alleviate much confusion caused by the “geographically isolated wetlands” misnomer.

  18. GEOGRAPHICAL EDUCATION MEDIATIZATION AND MEDIASECURITY ISSUES

    Directory of Open Access Journals (Sweden)

    M. R. Arpentieva

    2017-01-01

    Full Text Available The article is devoted to the interaction of the legal and moral development of media technologies in the context of geographical education. It summarizes the experience of theoretical analysis of mediatization in geographic education, the legal and moral aspects of its disorders, and ways of preventing and correcting them in the process of educational interaction between teacher and student mediated by media technologies. It is noted that geographical education in the modern world is closely associated with the use of media technologies. In other types of education the role of media technologies in improving the quality of education is less obvious; in the teaching and learning of geography it is very clear. Therefore, the problems associated with its mediatization are very important, and their solution is particularly pressing. These issues are primarily associated with the ongoing social, economic, political and ideological crises in many communities and countries of the Earth. Many of them are reflected, as in a mirror, in the sphere of high technologies, including media technologies. The article provides guidance and direction for the correction of violations at the individual and social levels.

  19. Geographical assemblages of European raptors and owls

    Science.gov (United States)

    López-López, Pascual; Benavent-Corai, José; García-Ripollés, Clara

    2008-09-01

    In this work we look for geographical structure patterns in European raptors (Order: Falconiformes) and owls (Order: Strigiformes). For this purpose we have conducted our research using freely available tools such as statistical software and databases. To perform the study, presence-absence data for the European raptors and owl species (Class Aves) were downloaded from the BirdLife International website. Using the freely available "pvclust" R-package, we applied similarity Jaccard index and cluster analysis in order to delineate biogeographical relationships for European countries. According to the cluster of similarity, we found that Europe is structured into two main geographical assemblages. The larger length branch separated two main groups: one containing Iceland, Greenland and the countries of central, northern and northwestern Europe, and the other group including the countries of eastern, southern and southwestern Europe. Both groups are divided into two main subgroups. According to our results, the European raptors and owls could be considered structured into four meta-communities well delimited by suture zones defined by Remington (1968) [Remington, C.L., 1968. Suture-zones of hybrid interaction between recently joined biotas. Evol. Biol. 2, 321-428]. Climatic oscillations during the Quaternary Ice Ages could explain at least in part the modern geographical distribution of the group.
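
    The Jaccard similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and it does not reproduce the bootstrap clustering of the "pvclust" R package; the country vectors below are hypothetical presence-absence data invented for illustration, and the function names are assumptions.

```python
# Hypothetical presence-absence data: 1 = species recorded in the country.
# Values are illustrative only and are NOT taken from the BirdLife dataset.
presence = {
    "Iceland": [1, 0, 0, 1, 0],
    "Norway":  [1, 1, 0, 1, 0],
    "Spain":   [0, 1, 1, 1, 1],
    "Greece":  [0, 1, 1, 0, 1],
}


def jaccard(a, b):
    """Species shared by both countries divided by species present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty species lists have no defined overlap; return 0.0 by convention.
    return both / either if either else 0.0


def similarity_matrix(data):
    """Pairwise Jaccard similarity for every ordered pair of countries."""
    names = sorted(data)
    return {(m, n): jaccard(data[m], data[n]) for m in names for n in names}


sim = similarity_matrix(presence)
for pair in [("Iceland", "Norway"), ("Spain", "Greece"), ("Iceland", "Greece")]:
    print(pair, round(sim[pair], 3))
```

    A matrix of this kind (usually converted to the distance 1 minus similarity) is the typical input to a hierarchical cluster analysis.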

  20. Ontology for cell-based geographic information

    Science.gov (United States)

    Zheng, Bin; Huang, Lina; Lu, Xinhai

    2009-10-01

    Interoperability is a key notion in geographic information science (GIS) for the sharing of geographic information (GI). It requires seamless translation among different information sources. Ontology is employed in GI discovery to settle semantic conflicts because of its natural-language appearance and logical hierarchical structure, which are considered able to provide better context for both human understanding and machine cognition in describing locations and relationships in the geographic world. However, most current studies on field ontology are derived from philosophical themes and are not applicable to the raster expression in GIS, which is a kind of field-like phenomenon but does not physically coincide with the general concept of a philosophical field (which mostly comes from physics concepts). That is why we specifically discuss the cell-based GI ontology in this paper. The discussion starts with an investigation of the physical characteristics of cell-based raster GI. Then, a unified cell-based GI ontology framework for the recognition of raster objects is introduced, from which a conceptual interface connecting human epistemology and the computer world, the so-called "endurant-occurrant window", is developed for better raster GI discovery and sharing.

  1. Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

    Science.gov (United States)

    Peyre, Hugo; Leplège, Alain; Coste, Joël

    2011-03-01

    Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
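
    As a rough illustration of the simplest method compared in this abstract, the sketch below applies personal mean score imputation to one respondent's items on a single scale. The function name and the rule that at least half of the items must be answered are assumptions made for this example; the abstract does not state the exact rule the authors used.

```python
def personal_mean_impute(items, min_answered_fraction=0.5):
    """Fill missing items (None) with the mean of the respondent's answered items.

    Returns None when too few items are answered to impute. The 0.5
    threshold is an assumed convention, not a value from the abstract.
    """
    answered = [v for v in items if v is not None]
    if not items or len(answered) / len(items) < min_answered_fraction:
        return None
    mean = sum(answered) / len(answered)
    return [mean if v is None else v for v in items]


# One respondent, four items on a hypothetical scale, one item missing.
print(personal_mean_impute([1, None, 3, 2]))        # mean of 1, 3, 2 fills the gap
print(personal_mean_impute([None, None, None, 4]))  # too few answers: not imputed
```

    Unlike multiple imputation, this approach produces a single filled-in value per missing item and therefore does not carry the uncertainty of the imputation into later analyses.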

  2. Imaging geographic atrophy in age-related macular degeneration.

    Science.gov (United States)

    Göbel, Arno P; Fleckenstein, Monika; Schmitz-Valckenberg, Steffen; Brinkmann, Christian K; Holz, Frank G

    2011-01-01

    Advances in retinal imaging technology have largely contributed to the understanding of the natural history, prognostic markers and disease mechanisms of geographic atrophy (GA) due to age-related macular degeneration. There is still no therapy available to halt or slow the disease process. In order to evaluate potential therapeutic effects in interventional trials, there is a need for precise quantification of the GA progression rate. Fundus autofluorescence imaging allows for accurate identification and segmentation of atrophic areas and currently represents the gold standard for evaluating progressive GA enlargement. By means of high-resolution spectral-domain optical coherence tomography, distinct microstructural alterations related to GA can be visualized. Copyright © 2011 S. Karger AG, Basel.

  3. Geographic Information Systems for the Regional Integration of Renewable Energies

    International Nuclear Information System (INIS)

    Amador Guerra, J.; Dominguez Bravo, J.

    2000-01-01

    This report is based on the project "The GIS in the regional integration of Renewable Energies for decentralised electricity production", developed by CIEMAT (Spanish Energy Research Centre) and UPM (Polytechnic University of Madrid, Spain) since 1997. The objective of this project is to analyse, evaluate and improve GIS methodologies for application in renewable energies (RE) and to determine how GIS can aid in the evaluation and simulation of the influence of technical, socio-economic and geographical parameters. The project begins with a review of the SOLARGIS methodology. SOLARGIS was developed by a European research team (including CIEMAT) in the frame of the JOULE II Programme. First, this report describes the state of the art in the application of GIS to renewable energies. Second, it describes the SOLARGIS review tasks and the application of this new product to Lorca (Murcia Region, Spain). Finally, the report describes the methodology for the spatial sensitivity analysis. (Author) 24 refs

  4. The Rebirth of the Theory of Imputation in the Science of Criminal Law: to an Overcoming Stage or an Involution to Pre-Scientific Conceptions?

    Directory of Open Access Journals (Sweden)

    Nicolás Santiago Cordini

    2015-06-01

    Full Text Available The Science of Criminal Law is going through a moment that can be characterized as a “crisis”. Faced with this situation, theories that define themselves as “theories of imputation” have proliferated; they abandon, in whole or in part, the theory of crime that has dominated up to now. The aim of this article is to analyze three theories grouped under the concept of imputation and determine to what extent they preserve or depart from the categories proposed by the theory of crime. We then establish to what extent these theories constitute an advance for the Science of Criminal Law or, on the contrary, are manifestations of a retreat to a pre-scientific stage.

  5. New insights into the pharmacogenomics of antidepressant response from the GENDEP and STAR*D studies: rare variant analysis and high-density imputation.

    Science.gov (United States)

    Fabbri, C; Tansey, K E; Perlis, R H; Hauser, J; Henigsberg, N; Maier, W; Mors, O; Placentino, A; Rietschel, M; Souery, D; Breen, G; Curtis, C; Sang-Hyuk, L; Newhouse, S; Patel, H; Guipponi, M; Perroud, N; Bondolfi, G; O'Donovan, M; Lewis, G; Biernacka, J M; Weinshilboum, R M; Farmer, A; Aitchison, K J; Craig, I; McGuffin, P; Uher, R; Lewis, C M

    2017-11-21

    Genome-wide association studies have generally failed to identify polymorphisms associated with antidepressant response. Possible reasons include limited coverage of genetic variants that this study tried to address by exome genotyping and dense imputation. A meta-analysis of Genome-Based Therapeutic Drugs for Depression (GENDEP) and Sequenced Treatment Alternatives to Relieve Depression (STAR*D) studies was performed at the single-nucleotide polymorphism (SNP), gene and pathway levels. Coverage of genetic variants was increased compared with previous studies by adding exome genotypes to previously available genome-wide data and using the Haplotype Reference Consortium panel for imputation. Standard quality control was applied. Phenotypes were symptom improvement and remission after 12 weeks of antidepressant treatment. Significant findings were investigated in NEWMEDS consortium samples and Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) for replication. A total of 7,062,950 SNPs were analyzed in GENDEP (n=738) and STAR*D (n=1409). rs116692768 (P=1.80e-08, ITGA9 (integrin α9)) and rs76191705 (P=2.59e-08, NRXN3 (neurexin 3)) were significantly associated with symptom improvement during citalopram/escitalopram treatment. At the gene level, no consistent effect was found. At the pathway level, the Gene Ontology (GO) terms GO:0005694 (chromosome) and GO:0044427 (chromosomal part) were associated with improvement (corrected P=0.007 and 0.045, respectively). The association between rs116692768 and symptom improvement was replicated in PGRN-AMPS (P=0.047), whereas rs76191705 was not. The two SNPs did not replicate in NEWMEDS. ITGA9 codes for a membrane receptor for neurotrophins and NRXN3 is a transmembrane neuronal adhesion receptor involved in synaptic differentiation. Despite their meaningful biological rationale for being involved in antidepressant effect, replication was partial. Further studies may help in clarifying

  6. Treating pre-instrumental data as "missing" data: using a tree-ring-based paleoclimate record and imputations to reconstruct streamflow in the Missouri River Basin

    Science.gov (United States)

    Ho, M. W.; Lall, U.; Cook, E. R.

    2015-12-01

    Advances in paleoclimatology in the past few decades have provided opportunities to expand the temporal perspective of the hydrological and climatological variability across the world. The North American region is particularly fortunate in this respect, as a relatively dense network of high-resolution paleoclimate proxy records has been assembled. One such network is the annually-resolved Living Blended Drought Atlas (LBDA): a paleoclimate reconstruction of the Palmer Drought Severity Index (PDSI) that covers North America on a 0.5° × 0.5° grid based on tree-ring chronologies. However, the use of the LBDA to assess North American streamflow variability requires a model by which streamflow may be reconstructed. Paleoclimate reconstructions have typically used models that first seek to quantify the relationship between the paleoclimate variable and the environmental variable of interest before extrapolating the relationship back in time. In contrast, the pre-instrumental streamflow is here considered as "missing" data. A method of imputing the "missing" streamflow data, prior to the instrumental record, is applied through multiple imputation using chained equations for streamflow in the Missouri River Basin. In this method, the distribution of the instrumental streamflow and LBDA is used to estimate sets of plausible values for the "missing" streamflow data, resulting in a ~600 year-long streamflow reconstruction. Past research into external climate forcings, oceanic-atmospheric variability and its teleconnections, and assessments of rare multi-centennial instrumental records demonstrate that large temporal oscillations in hydrological conditions are unlikely to be captured in most instrumental records. The reconstruction of multi-centennial records of streamflow will enable comprehensive assessments of current and future water resource infrastructure and operations under the existing scope of natural climate variability.

  7. Exploring the Interplay between Rescue Drugs, Data Imputation, and Study Outcomes: Conceptual Review and Qualitative Analysis of an Acute Pain Data Set.

    Science.gov (United States)

    Singla, Neil K; Meske, Diana S; Desjardins, Paul J

    2017-12-01

    In placebo-controlled acute surgical pain studies, provisions must be made for study subjects to receive adequate analgesic therapy. As such, most protocols allow study subjects to receive a pre-specified regimen of open-label analgesic drugs (rescue drugs) as needed. The selection of an appropriate rescue regimen is a critical experimental design choice. We hypothesized that a rescue regimen that is too liberal could lead to all study arms receiving similar levels of pain relief (thereby confounding experimental results), while a regimen that is too stringent could lead to a high subject dropout rate (giving rise to a preponderance of missing data). Despite the importance of rescue regimen as a study design feature, there exist no published review articles or meta-analysis focusing on the impact of rescue therapy on experimental outcomes. Therefore, when selecting a rescue regimen, researchers must rely on clinical factors (what analgesics do patients usually receive in similar surgical scenarios) and/or anecdotal evidence. In the following article, we attempt to bridge this gap by reviewing and discussing the experimental impacts of rescue therapy on a common acute surgical pain population: first metatarsal bunionectomy. The function of this analysis is to (1) create a framework for discussion and future exploration of rescue as a methodological study design feature, (2) discuss the interplay between data imputation techniques and rescue drugs, and (3) inform the readership regarding the impact of data imputation techniques on the validity of study conclusions. Our findings indicate that liberal rescue may degrade assay sensitivity, while stringent rescue may lead to unacceptably high dropout rates.

  8. GNIS: Geographic Names Information Systems - All features (2013)

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  9. Urethroplasty: a geographic disparity in care.

    Science.gov (United States)

    Burks, Frank N; Salmon, Scott A; Smith, Aaron C; Santucci, Richard A

    2012-06-01

    Urethroplasty is the gold standard for urethral strictures but its geographic prevalence throughout the United States is unknown. We analyzed where and how often urethroplasty was being performed in the United States compared to other treatment modalities for urethral stricture. De-identified case logs from the American Board of Urology were collected from certifying/recertifying urologists from 2004 to 2009. Results were categorized by ZIP codes to determine the geographic distribution. Case logs from 3,877 urologists (2,533 recertifying and 1,344 certifying) were reviewed including 1,836 urethroplasties, 13,080 urethrotomies and 19,564 urethral dilations. The proportion of urethroplasty varied widely among states (range 0% to 17%). The ratio of urethroplasty-to-urethrotomy/dilation also varied widely from state to state, but overall 1 urethroplasty was performed for every 17 urethrotomies or dilations performed. Certifying urologists were 3 times as likely to perform urethroplasty as recertifying urologists (12% vs 4%, respectively). Urethroplasties were performed more commonly in states with residency programs (mean 5% vs 3%). Some states reported no urethroplasties during the observation period (Vermont, North Dakota, South Dakota, Maine and West Virginia). To our knowledge this is the first report on the geographic distribution of urethroplasty for urethral stricture disease. There are large variations in the rates of urethroplasty performed throughout the United States, indicating a disparity of care, especially for those regions in which few or no urethroplasties were reported. This disparity may decrease with time as younger certifying urologists are performing 3 times as many urethroplasties as older recertifying urologists. Copyright © 2012 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
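
    The pooled counts quoted in this abstract can be checked with a few lines of arithmetic. This is only a sketch using the three totals given above; the variable names are illustrative, and the per-state means reported in the abstract are computed differently, so they need not match these pooled figures exactly.

```python
# Procedure totals as quoted in the abstract (2004 to 2009 case logs).
urethroplasties = 1836
urethrotomies = 13080
dilations = 19564

# Urethrotomies and dilations pooled as the comparison group.
other_procedures = urethrotomies + dilations

# How many urethrotomies or dilations were logged per urethroplasty.
ratio = other_procedures / urethroplasties

# Urethroplasty as a share of all logged stricture procedures.
share = urethroplasties / (urethroplasties + other_procedures)

print(f"other procedures: {other_procedures}")
print(f"ratio: {ratio:.2f} per urethroplasty")
print(f"urethroplasty share: {share:.1%}")
```

    The pooled ratio falls between 17 and 18, in line with the "1 for every 17" figure in the abstract.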

  10. The geosystems of complex geographical atlases

    Directory of Open Access Journals (Sweden)

    Jovanović Jasmina

    2012-01-01

    Full Text Available Complex geographical atlases represent geosystems of different hierarchical rank, complexity and diversity, scale and connection. They represent a set of large number of different pieces of information about geospace. Also, they contain systematized, correlative and in the apparent form represented pieces of information about space. The degree of information revealed in the atlas is precisely explained by its content structure and the form of presentation. The quality of atlas depends on the method of visualization of data and the quality of geodata. Cartographic visualization represents cognitive process. The analysis converts geospatial data into knowledge. A complex geographical atlas represents information complex of spatial - temporal coordinated database on geosystems of different complexity and territorial scope. Each geographical atlas defines a concrete geosystem. Systemic organization (structural and contextual determines its complexity and concreteness. In complex atlases, the attributes of geosystems are modeled and pieces of information are given in systematized, graphically unique form. The atlas can be considered as a database. In composing a database, semantic analysis of data is important. The result of semantic modeling is expressed in structuring of data information, in emphasizing logic connections between phenomena and processes and in defining their classes according to the degree of similarity. Accordingly, the efficiency of research of needed pieces of information in the process of the database use is enabled. An atlas map has a special power to integrate sets of geodata and present information contents in user - friendly and understandable visual and tactile way using its visual ability. 
Composing an atlas by systemic cartography requires the pieces of information on concrete - defined geosystems of different hierarchical level, the application of scientific methods and making of adequate number of analytical, synthetic

  11. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems (GIS). The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre" which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. GIS can be broken down into two main categories: urban GIS and natural resource GIS. Further still, natural resource GIS can be broken down into six sub-categories: agriculture, forestry, wildlife, catchment management, archaeology, and geology/mining. Agriculture GIS has several applications, such as agricultural capability analysis, land conservation, market analysis, or whole farming planning. Forestry GIS can be used for timber assessment and management, harvest scheduling and planning, environmental impact assessment, and pest management. GIS, when used in wildlife applications, enables the user to assess and manage habitats, identify and track endangered and rare species, and monitor impact assessment.

  12. Epidemiology of hip fracture: Worldwide geographic variation

    Directory of Open Access Journals (Sweden)

    Dinesh K Dhanwal

    2011-01-01

    Full Text Available Osteoporosis is a major health problem, especially in elderly populations, and is associated with fragility fractures at the hip, spine, and wrist. Hip fracture contributes to both morbidity and mortality in the elderly. The demographics of world populations are set to change, with more elderly living in developing countries, and it has been estimated that by 2050 half of hip fractures will occur in Asia. This review, conducted using the PubMed database, describes the incidence of hip fracture in different regions of the world and discusses the possible causes of this wide geographic variation. The analysis of data from different studies shows a wide geographic variation across the world, with higher hip fracture incidence reported from industrialized countries as compared to developing countries. The highest hip fracture rates are seen in North Europe and the US and the lowest in Latin America and Africa. Asian countries such as Kuwait, Iran, China, and Hong Kong show intermediate hip fracture rates. There is also a north-south gradient seen in European studies, and more fractures are seen in the north of the US than in the south. The factors responsible for this variation are population demographics (with more elderly living in countries with higher incidence rates) and the influence of ethnicity, latitude, and environmental factors. The understanding of this changing geographic variation will help policy makers to develop strategies to reduce the burden of hip fractures in developing countries such as India, which will face the brunt of this problem over the coming decades.

  13. Virtual Globe Games for Geographic Learning

    Directory of Open Access Journals (Sweden)

    Ola Ahlqvist

    2010-02-01

    Full Text Available Virtual, online maps and globes allow for volunteered geographic information to capitalize on users as sensors and generate unprecedented access to information resources and services. These new "Web 2.0" applications will probably dominate development and use of virtual globes and maps in the near future. We present an experimental platform that integrates an existing virtual globe interface with added functionality as follows: an interactive layer on top of the existing map that supports real-time creation and manipulation of spatial interaction objects. These objects, together with the existing information delivered through the virtual globe, form a game board that can be used for educational purposes.

  14. House Prices, Geographical Mobility, and Unemployment

    DEFF Research Database (Denmark)

    Ingholt, Marcus Mølbak

    2017-01-01

    Geographical mobility correlates positively with house prices and negatively with unemployment over the U.S. business cycle. I present a DSGE model in which declining house prices and tight credit conditions impede the mobility of indebted workers. This reduces the workers’ cross-area competition...... for jobs, causing wages and unemployment to rise. A Bayesian estimation shows that this channel more than quadruples the response of unemployment to adverse housing market shocks. The estimation also shows that adverse housing market shocks caused the decline in mobility during the Great Recession. Absent...

  15. Studying the making of geographical knowledge

    DEFF Research Database (Denmark)

    Adriansen, Hanne Kirstine; Madsen, Lene Møller

    2009-01-01

    The article addresses the issue of being a ‘double' insider when conducting interviews. Double insider means being an insider both in relation to one's research matter - in the authors' case the making of geographical knowledge - and in relation to one's interviewees - our colleagues. The article...... is a reflection paper in the sense that we reflect upon experiences drawn from a previous research project carried out in Danish academia. It is important that the project was situated in a Scandinavian workplace culture because this has bearings for the social, cultural, and economic situation in which knowledge...

  16. Racial and geographic variation in coronary heart disease mortality trends

    Directory of Open Access Journals (Sweden)

    Gillum Richard F

    2012-06-01

    Full Text Available Abstract Background Magnitudes, geographic and racial variation in trends in coronary heart disease (CHD) mortality within the US require updating for health services and health disparities research. Therefore the aim of this study is to present data on these trends through 2007. Methods Data for CHD were analyzed using the US mortality files for 1999–2007 obtained from the US Centers for Disease Control and Prevention. Age-adjusted annual death rates were computed for non-Hispanic African Americans (AA) and European Americans (EA) aged 35–84 years. The direct method was used to standardize rates by age, using the 2000 US standard population. Joinpoint regression models were used to evaluate trends, expressed as annual percent change (APC). Results For both AA men and women the magnitude in CHD mortality is higher compared to EA men and women, respectively. Between 1999 and 2007 the rate declined both in AA and in EA of both sexes in every geographic division; however, relative declines varied. For example, among men, relative average annual declines ranged from 3.2% to 4.7% in AA and from 4.4% to 5.5% in EA among geographic divisions. In women, rates declined more in later years of the decade and in women over 54 years. In 2007, age-adjusted death rate per 100,000 for CHD ranged from 93 in EA women in New England to 345 in AA men in the East North Central division. In EA, areas near the Ohio and lower Mississippi Rivers had above average rates. Disparities in trends by urbanization level were also found. For AA in the East North Central division, the APC was similar in large central metro (−4.2), large fringe metro (−4.3), medium metro urbanization strata (−4.4), and small metro (−3.9). APC was somewhat higher in the micropolitan/non-metro (−5.3), and especially the non-core/non-metro (−6.5). For EA in the East South Central division, the APC was higher in large central metro (−5.3), large fringe metro (−4.3) and medium metro

  17. Human-geographical concept of the regional geodemographic system

    Directory of Open Access Journals (Sweden)

    Kateryna Sehida

    2017-10-01

    Full Text Available A synergetic analysis of geodemographic research problems indicates that they can be solved using modern management technologies. According to the theory of socioactogenesis, this requires accurately defining and formulating the goal of the future phase transition, constructing a consistent system of goals that takes into account one's own and provided resources, creating an executive system that makes optimal use of the available methods (technologies) and means of activity, and monitoring and analyzing the result obtained. Analyzing the results of social management requires a quantitative description and a comparison of the actual result with its expected model (the goal). The proposed concept of the regional geodemographic system, based on dissipative structures and treating people, groups of people, and society, is aimed at the development and functioning of the studied system, in which the implementation of management decisions plays a special role. The article covers the generalized structure of the concept and sets out its purpose, object, and subject area. The social and spatial localization of the research is defined, in particular within regional, region, and local communities. The geodemographic process is identified as a composite human-geographical process: as socioactogenesis (with the stages of motivation, system of goals, executive system, and result defined from the standpoint of society and the family), as self-development and self-organization (with the internal and external factors, supporting and evolutionary resources, and mechanisms defined), and as a process (information exchange, external and internal adaptation). Methodological approaches (geographical, systems, synergetic, informational, historical) and research techniques (analysis of system indices, simulation of a development path, component analysis, and evaluative and prognostic simulation) are described. 
Technological procedures

  18. Cartography and Geographic Information Science in Current Contents

    Directory of Open Access Journals (Sweden)

    Nedjeljko Frančula

    2009-12-01

    Full Text Available The Cartography and Geographic Information Science (CaGIS) journal was published as The American Cartographer from 1974 to 1989, after that as Cartography and Geographic Information Systems, and since then has been published under its current name. It is published by the Cartography and Geographic Information Society, a member of the American Congress on Surveying and Mapping.

  19. Geographic Literacy and Moral Formation among University Students

    Science.gov (United States)

    Bascom, Jonathan

    2011-01-01

    This study extends analysis of geographic literacy further by examining the relationship of geographic knowledge with the primary goal of geographic educators--cultivation of cultural understanding and moral sensitivity for global citizenry. The main aim is to examine contributors to moral formation during the university years based on a survey…

  20. Surveying and Mapping Geographical Information from the Perspective of Geography

    Directory of Open Access Journals (Sweden)

    LÜ Guonian

    2017-10-01

    Full Text Available This paper briefly reviews the development of geographic information content since the emergence of geographic information systems. It points out that the current definition of geographic information is still an extension of the basic "spatial + attributes" mapping framework, which is increasingly difficult to adapt to the analysis and application of spatio-temporal big data. From the perspective of the research subject and content of geography, it systematically summarizes the content and extension of the "geographic information" that geography needs. It puts forward a six-element expression model of geographic information, comprising spatial location, semantic description, attribute characteristics, geometric form, evolution process, and object relationships. Guided by the laws of geography, and aiming at the integrated expression, systematic analysis, and efficient management of the spatial distribution, temporal pattern, evolution process, and interaction mechanisms of geographical phenomena, it designs a unified GIS data model expressed by the six basic elements, a new GIS data structure driven by geographical rules and interactions, and key technologies for organizing and storing unstructured spatio-temporal data. It provides a theoretical basis and technical support for the shift from surveying-and-mapping geographic information to scientific geographic information, and it can help improve the ability of GIS to organize, manage, analyze, and express geographical laws such as geographical pattern, evolution process, and interaction between elements.

  1. Dynamic management of geographic data in a virtual environment

    NARCIS (Netherlands)

    Jense, G.J.; Donkers, K.

    1996-01-01

    In order to achieve true 3D user interaction with geographic information, an interface between a virtual environment system and a geographic information system has been designed and implemented. This VE/GIS interface is based on a loose coupling of the underlying geographic database and the virtual

  2. Location of radiotherapy centers: An exploratory geographic analysis for Belgium

    International Nuclear Information System (INIS)

    Cotteels, C.; Peeters, D.; Coucke, P.A.; Thomas, I.

    2012-01-01

    Purpose. - The distance between the patient's home and a radiotherapy department may represent a hurdle for the patient and influence treatment choice. Therefore, it is necessary to check whether the geographical distribution of radiotherapy centers is in accordance with cancer incidence, taking also into account the cost of travelling to the radiotherapy department. The objective of this study is twofold: first, to map the current locations of radiotherapy centers across the country, and second, to evaluate the observed spatial disparities with appropriate tools. Materials and methods. - An operational research model (P-median) is used to suggest the optimal locations and allocations and to compare them with the current situation. This is an exploratory study with simple inputs. It helps to better understand the current geographical distribution of radiotherapy centers in Belgium as well as its possible limitations. Results-conclusion. - It appears that the current situation is on average acceptable in terms of accessibility to the service and that the method has considerable potential for decision making so as to yield a spatial system that is both efficient and equitable. (authors)
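
    The P-median idea mentioned in this abstract can be illustrated with a minimal sketch: choose p sites from a candidate list so that the total demand-weighted distance from each demand point to its nearest chosen site is as small as possible. The function name `p_median`, the toy coordinates, and the exhaustive search are assumptions made for illustration only; the abstract does not describe the authors' data or solver, and real studies typically rely on heuristics or integer programming.

```python
from itertools import combinations
import math


def p_median(demand_points, weights, candidates, p):
    """Exhaustively pick p candidate sites that minimise the total
    weighted distance from each demand point to its nearest site.

    Brute force is only practical for tiny, illustrative inputs.
    """
    best_cost, best_sites = math.inf, None
    for sites in combinations(candidates, p):
        cost = sum(
            w * min(math.dist(d, s) for s in sites)
            for d, w in zip(demand_points, weights)
        )
        if cost < best_cost:
            best_cost, best_sites = cost, sites
    return best_sites, best_cost


# Hypothetical toy data: four towns on a line, equal demand,
# every town is also a candidate site, and two centres are allowed.
towns = [(0, 0), (1, 0), (10, 0), (11, 0)]
demand = [1, 1, 1, 1]
sites, cost = p_median(towns, demand, towns, p=2)
print(sites, cost)
```

    In this toy case one centre ends up in each cluster of towns, which is the allocation one would expect by inspection.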

  3. Regional Geographic Information Systems of Health and Environmental Monitoring

    Directory of Open Access Journals (Sweden)

    Kurolap Semen A.

    2016-12-01

    Full Text Available The article describes a new scientific and methodological approach to designing geographic information systems of health and environmental monitoring for urban areas. Geographic information systems (GIS are analytical tools of the regional health and environmental monitoring; they are used for an integrated assessment of the environmental status of a large industrial centre or a part of it. The authors analyse the environmental situation in Voronezh, a major industrial city, located in the Central Black Earth Region with a population of more than 1 million people. The proposed research methodology is based on modern approaches to the assessment of health risks caused by adverse environmental conditions. The research work was implemented using a GIS and multicriteria probabilistic and statistical evaluation to identify cause-and-effect links, a combination of action and reaction, in the dichotomy ‘environmental factors — public health’. The analysis of the obtained statistical data confirmed an increase in childhood diseases in some areas of the city. Environmentally induced diseases include congenital malformations, tumors, endocrine and urogenital pathologies. The main factors having an adverse impact on health are emissions of carcinogens into the atmosphere and the negative impact of transport on the environment. The authors identify and characterize environmentally vulnerable parts of the city and developed principles of creating an automated system of health monitoring and control of environmental risks. The article offers a number of measures aimed at the reduction of environmental risks, better protection of public health and a more efficient environmental monitoring.

  4. Interactive segmentation for geographic atrophy in retinal fundus images.

    Science.gov (United States)

    Lee, Noah; Smith, R Theodore; Laine, Andrew F

    2008-10-01

    Fundus auto-fluorescence (FAF) imaging is a non-invasive technique for in vivo ophthalmoscopic inspection of age-related macular degeneration (AMD), the most common cause of blindness in developed countries. Geographic atrophy (GA) is an advanced form of AMD and accounts for 12-21% of severe visual loss in this disorder [3]. Automatic quantification of GA is important for determining disease progression and facilitating clinical diagnosis of AMD. Automatic segmentation of pathological images remains an unsolved problem. In this paper we leverage the watershed transform and generalized non-linear gradient operators for interactive segmentation and present an intuitive and simple approach for geographic atrophy segmentation. We compare our approach with the state-of-the-art random walker [5] algorithm for interactive segmentation using ROC statistics. Quantitative evaluation experiments on 100 FAF images show a mean sensitivity/specificity of 98.3/97.7% for our approach and a mean sensitivity/specificity of 88.2/96.6% for the random walker algorithm.
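
    The sensitivity/specificity figures quoted above compare a segmentation against a reference mask pixel by pixel. A minimal sketch of that computation follows; the function name and the tiny flattened masks are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for two equal-length binary masks.

    Each mask is a flat sequence of 0/1 values, where 1 marks a pixel
    labelled as lesion (here: geographic atrophy).
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    sensitivity = tp / (tp + fn)  # share of true lesion pixels found
    specificity = tn / (tn + fp)  # share of background pixels kept clean
    return sensitivity, specificity


# Hypothetical 8-pixel masks (flattened images).
truth_mask = [1, 1, 1, 0, 0, 0, 0, 0]
pred_mask = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(pred_mask, truth_mask)
print(sens, spec)
```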

  5. Geographic wormhole detection in wireless sensor networks.

    Directory of Open Access Journals (Sweden)

    Mehdi Sookhak

    Full Text Available Wireless sensor networks (WSNs) are ubiquitous and pervasive, and therefore highly susceptible to a number of security attacks. The Denial of Service (DoS) attack is considered the most dominant and a major threat to WSNs. Moreover, the wormhole attack represents one of the potential forms of the DoS attack. Crafting a wormhole attack is comparatively simple, but detecting it is nontrivial. Furthermore, the extant wormhole defense methods need both specialized hardware and strong assumptions to defend against static and dynamic wormhole attacks. This paper introduces a novel scheme to detect wormhole attacks in a geographic routing protocol (DWGRP). The main contribution of this paper is to detect malicious nodes and select the best and the most reliable neighbors based on a pairwise key pre-distribution technique and the beacon packet. Moreover, this novel technique is not subject to any specific assumption, requirement, or specialized hardware, such as a precise synchronized clock. The proposed detection method is validated by comparisons with several related techniques in the literature, such as Received Signal Strength (RSS), Authentication of Nodes Scheme (ANS), Wormhole Detection using Hound Packet (WHOP), and Wormhole Detection with Neighborhood Information (WDI), using the NS-2 simulator. The analysis of the simulations shows promising results with a low False Detection Rate (FDR) in the geographic routing protocols.

  6. Geographic Gossip: Efficient Averaging for Sensor Networks

    Science.gov (United States)

    Dimakis, Alexandros D. G.; Sarwate, Anand D.; Wainwright, Martin J.

    Gossip algorithms for distributed computation are attractive due to their simplicity, distributed nature, and robustness in noisy and uncertain environments. However, using standard gossip algorithms can lead to a significant waste in energy by repeatedly recirculating redundant information. For realistic sensor network model topologies like grids and random geometric graphs, the inefficiency of gossip schemes is related to the slow mixing times of random walks on the communication graph. We propose and analyze an alternative gossiping scheme that exploits geographic information. By utilizing geographic routing combined with a simple resampling method, we demonstrate substantial gains over previously proposed gossip protocols. For regular graphs such as the ring or grid, our algorithm improves standard gossip by factors of $n$ and $\sqrt{n}$ respectively. For the more challenging case of random geometric graphs, our algorithm computes the true average to accuracy $\epsilon$ using $O(\frac{n^{1.5}}{\sqrt{\log n}} \log \epsilon^{-1})$ radio transmissions, which yields a $\sqrt{\frac{n}{\log n}}$ factor improvement over standard gossip algorithms. We illustrate these theoretical results with experimental comparisons between our algorithm and standard methods as applied to various classes of random fields.
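
    To make the contrast in this abstract concrete, the sketch below runs pairwise-averaging gossip on a ring in two modes. In the default mode each node averages with an adjacent neighbor, as in standard gossip. The `long_range` mode instead averages with a uniformly random node, which is only a rough stand-in (an assumption of this sketch) for routing to a random geographic location; it ignores the multi-hop routing cost and the resampling step that the authors analyze, so it should not be read as their algorithm. All names and parameters are hypothetical.

```python
import random


def gossip_average(values, rounds, long_range=False, seed=0):
    """Pairwise-averaging gossip on a ring of len(values) nodes.

    Each round one random node averages its value with a partner:
    an adjacent ring neighbour by default, or any other node chosen
    uniformly at random when long_range is True.
    """
    rng = random.Random(seed)
    x = [float(v) for v in values]
    n = len(x)
    for _ in range(rounds):
        i = rng.randrange(n)
        if long_range:
            j = rng.randrange(n - 1)
            if j >= i:  # skip over i so the partner differs from i
                j += 1
        else:
            j = (i + rng.choice((-1, 1))) % n
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x


def spread(x):
    """Gap between the largest and smallest current estimate."""
    return max(x) - min(x)


# Hypothetical sensor readings 0..15; the true average is 7.5.
readings = list(range(16))
local = gossip_average(readings, rounds=500)
far = gossip_average(readings, rounds=500, long_range=True)
print(spread(local), spread(far))
```

    Every exchange preserves the network-wide mean, so all estimates drift toward 7.5; comparing the two printed spreads gives a rough feel for how partner selection affects mixing speed.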

  7. Plants and geographical names in Croatia.

    Science.gov (United States)

    Cargonja, Hrvoje; Daković, Branko; Alegro, Antun

    2008-09-01

    The main purpose of this paper is to present some general observations, regularities and insights into a complex relationship between plants and people through symbolic systems like geographical names on the territory of Croatia. The basic sources of data for this research were maps from an atlas of Croatia at a scale of 1:100000. Five groups of maps or areas were selected in order to represent the main Croatian phytogeographic regions. From each of the maps, a selection was made of toponyms in which the Croatian name of a plant was recognized (phytotoponyms). Results showed that, of all plant names recognized in geographical names, trees are the most represented, and among them birch and oak most of all. Furthermore, an attempt was made to explain the presence of the most represented plant species in the phytotoponyms in the light of general phytogeographical and sociocultural differences and similarities of the compared areas. The findings confirm an expectation that the genera of the climazonal vegetation of a particular area are the most represented among the phytotoponyms. Nevertheless, there are ample examples where representation of a plant name in the names of the human environment can only be ascribed to ethno-linguistic and socio-cultural motives. Despite the reductionist character of the applied methodology, this research also points out some advantages of this approach for ethnobotanic and ethnolinguistic studies of greater areas of human environment.

  8. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) for analyzing poverty in Central Java. We consider the Gaussian kernel as the weighting function. The GWR uses the diagonal matrix resulting from the Gaussian kernel function as the weights in the regression model. The kernel weights are used to handle spatial effects in the data so that a model can be obtained for each location. The purpose of this paper is to model poverty percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained a geographically weighted regression model with a Gaussian kernel weighting function for poverty percentage data in Central Java province. We found that the percentage of the population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found that the coefficient of determination $R^2$ is 68.64%. There are two categories of district, which are influenced by different significant factors.
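
    As a rough illustration of the approach described above, the sketch below fits a separate weighted least-squares line at every location, with weights from a Gaussian kernel of the distance to that location. It is deliberately reduced to a single predictor so that the weighted fit has a closed form; the study itself uses several predictors, and the coordinates, bandwidth, and function names here are hypothetical.

```python
import math


def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weight of every location relative to location i."""
    return [
        math.exp(-0.5 * (math.dist(coords[i], c) / bandwidth) ** 2)
        for c in coords
    ]


def gwr_fit(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location.

    Each pair is a weighted least-squares fit of y on x, where the
    weights decay with distance from the location being modelled.
    """
    results = []
    for i in range(len(coords)):
        w = gaussian_weights(coords, i, bandwidth)
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        results.append((my - slope * mx, slope))
    return results


# Hypothetical data: four districts on a line, with y = 2x + 1 everywhere,
# so every local model should recover intercept 1 and slope 2.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
x = [1.0, 2.0, 4.0, 3.0]
y = [3.0, 5.0, 9.0, 7.0]
local_models = gwr_fit(coords, x, y, bandwidth=1.5)
print(local_models)
```

    With real data the local coefficients differ from place to place, which is what allows the influencing factors to be assessed per regency or city.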

  9. Community structure informs species geographic distributions

    KAUST Repository

    Montesinos-Navarro, Alicia

    2018-05-23

    Understanding what determines species' geographic distributions is crucial for assessing global change threats to biodiversity. Measuring limits on distributions is usually, and necessarily, done with data at large geographic extents and coarse spatial resolution. However, survival of individuals is determined by processes that happen at small spatial scales. The relative abundance of coexisting species (i.e. 'community structure') reflects assembly processes occurring at small scales, and are often available for relatively extensive areas, so could be useful for explaining species distributions. We demonstrate that Bayesian Network Inference (BNI) can overcome several challenges to including community structure into studies of species distributions, despite having been little used to date. We hypothesized that the relative abundance of coexisting species can improve predictions of species distributions. In 1570 assemblages of 68 Mediterranean woody plant species we used BNI to incorporate community structure into Species Distribution Models (SDMs), alongside environmental information. Information on species associations improved SDM predictions of community structure and species distributions moderately, though for some habitat specialists the deviance explained increased by up to 15%. We demonstrate that most species associations (95%) were positive and occurred between species with ecologically similar traits. This suggests that SDM improvement could be because species co-occurrences are a proxy for local ecological processes. Our study shows that Bayesian Networks, when interpreted carefully, can be used to include local conditions into measurements of species' large-scale distributions, and this information can improve the predictions of species distributions.

  10. Using Metadata to Build Geographic Information Sharing Environment on Internet

    Directory of Open Access Journals (Sweden)

    Chih-hong Sun

    1999-12-01

    Full Text Available The Internet provides a convenient environment to share geographic information. Web GIS (Geographic Information System) even provides users a direct access environment to geographic databases through the Internet. However, the complexity of geographic data makes it difficult for users to understand the real content and the limitations of geographic information. In some cases, users may misuse the geographic data and make wrong decisions. Meanwhile, geographic data are distributed across various government agencies, academic institutes, and private organizations, which makes it even more difficult for users to fully understand the content of these complex data. To overcome these difficulties, this research uses metadata as a guiding mechanism for users to fully understand the content and the limitations of geographic data. We introduce three metadata standards commonly used for geographic data and metadata authoring tools available in the US. We also review the current development of geographic metadata standards in Taiwan. Two metadata authoring tools are developed in this research, which will enable users to build their own geographic metadata easily. [Article content in Chinese]

  11. Remote sensing research in geographic education: An alternative view

    Science.gov (United States)

    Wilson, H.; Cary, T. K.; Goward, S. N.

    1981-01-01

    It is noted that within many geography departments remote sensing is viewed as a mere technique a student should learn in order to carry out true geographic research. This view inhibits both students and faculty from investigation of remotely sensed data as a new source of geographic knowledge that may alter our understanding of the Earth. The tendency is for geographers to accept these new data and analysis techniques from engineers and mathematicians without questioning the accompanying premises. This black-box approach hinders geographic applications of the new remotely sensed data and limits the geographer's contribution to further development of remote sensing observation systems. It is suggested that geographers contribute to the development of remote sensing through pursuit of basic research. This research can be encouraged, particularly among students, by demonstrating the links between geographic theory and remotely sensed observations, encouraging a healthy skepticism concerning the current understanding of these data.

  12. Geographic profiling survey : a preliminary examination of geographic profilers' views and experiences

    NARCIS (Netherlands)

    Emeno, Karla; Bennell, Craig; Snook, Brent; Taylor, Paul Jonathon

    Geographic profiling (GP) is an investigative technique that involves predicting a serial offender's home location (or some other anchor point) based on where he or she committed a crime. Although the use of GP in police investigations appears to be on the rise, little is known about the procedure

  13. The geographic applications program of the U. S. Geological Survey

    Science.gov (United States)

    Gerlach, Arch C.

    1969-01-01

    The fundamental objective of modern Geography is to improve man's level of living through a better understanding of man-environment interactions. Related goals of the USGS program for applications of remote sensor data to Geographical research are: (1) the analysis and improvement of land use, with special emphasis on urban problems; and (2) more effective use of the total available energy budget, including insolation, mineral fuels, atomic energy, human resources, and mental energy, all of which are integrated into man-environment interactions. The collection of data through remote sensors in aircraft and spacecraft is financed largely by funds from NASA, and is part of the much broader EROS Program of the Department of the Interior. Results to date have achieved much toward the identification of remote sensor signatures for Earth features and human activities, and toward evaluation of instruments for collecting essential information.

  14. The Significant Surface-Water Connectivity of “Geographically Isolated Wetlands”

    Science.gov (United States)

    We evaluated the current literature, coupled with our collective research expertise, on surface-water connectivity of wetlands considered to be “geographically isolated” (sensu Tiner Wetlands 23:494–516, 2003a) to critically assess the scientific foundation of g...

  15. Geographic variation in forest composition and precipitation predict the synchrony of forest insect outbreaks

    Science.gov (United States)

    Kyle J. Haynes; Andrew M. Liebhold; Ottar N. Bjørnstad; Andrew J. Allstadt; Randall S. Morin

    2018-01-01

    Evaluating the causes of spatial synchrony in population dynamics in nature is notoriously difficult due to a lack of data and appropriate statistical methods. Here, we use a recently developed method, a multivariate extension of the local indicators of spatial autocorrelation statistic, to map geographic variation in the synchrony of gypsy moth outbreaks. Regression...

  16. What Influences Geography Teachers' Usage of Geographic Information Systems? A Structural Equation Analysis

    Science.gov (United States)

    Lay, Jinn-Guey; Chi, Yu-Lin; Hsieh, Yeu-Sheng; Chen, Yu-Wen

    2013-01-01

    Understanding the usage of the geographic information system (GIS) among geography teachers is a crucial step in evaluating the current dissemination of GIS knowledge and skills in Taiwan's educational system. The primary contribution of this research is to further our understanding of the factors that affect teachers' GIS usage. The structural…

  17. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems, GIS. The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre" which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. At the outset, I was given goals and expectations from my branch and from my mentor with regard to the further implementation of GIS. Those goals are as follows: (1) Continue the development of GIS for the underground structures. (2) Extract and export annotated data from AutoCAD drawing files and construct a database (to serve as a prototype for future work). (3) Examine existing underground record drawings to determine existing and non-existing underground tanks. Once this data was collected and analyzed, I set out on the task of creating a user-friendly database that could be accessed by all members of the branch. It was important that the database be built using programs that most employees already possess, ruling out most AutoCAD-based viewers. 
Therefore, I set out to create an Access database that translated onto the web using Internet

  18. Geographic analysis of shigellosis in Vietnam.

    Science.gov (United States)

    Kim, Deok Ryun; Ali, Mohammad; Thiem, Vu Dinh; Park, Jin-Kyung; von Seidlein, Lorenz; Clemens, John

    2008-12-01

    Geographic and ecological analysis may provide investigators useful ecological information for the control of shigellosis. This paper provides distribution of individual Shigella species in space, and ecological covariates for shigellosis in Nha Trang, Vietnam. Data on shigellosis in neighborhoods were used to identify ecological covariates. A Bayesian hierarchical model was used to obtain joint posterior distribution of model parameters and to construct smoothed risk maps for shigellosis. Neighborhoods with a high proportion of worshippers of traditional religion, close proximity to hospital, or close proximity to the river had increased risk for shigellosis. The ecological covariates associated with Shigella flexneri differed from the covariates for Shigella sonnei. In contrast the spatial distribution of the two species was similar. The disease maps can help identify high-risk areas of shigellosis that can be targeted for interventions. This approach may be useful for the selection of populations and the analysis of vaccine trials.

  19. Geographic delivery models for radiotherapy services

    International Nuclear Information System (INIS)

    Roberts, G.H.; Dunscombe, P.B.; Samant, R.S.

    2002-01-01

    The study described here was undertaken to quantify the societal cost of radiotherapy in idealized urban and rural populations and, hence, to generate a measure of impediment to access. The costs of centralized, distributed comprehensive and satellite radiotherapy delivery formats were examined by decomposing them into institutional, productivity and geographical components. Our results indicate that centralized radiotherapy imposes the greatest financial burden on the patient population in both urban and rural scenarios. The financial burden faced by patients who must travel for radiotherapy can be interpreted as one component of the overall impediment to access. With advances in remote-monitoring systems, it is possible to maintain technical quality while enhancing patient access. However, the maintenance of professional competence will remain a challenge with a distributed service-delivery format. Copyright (2002) Blackwell Science Pty Ltd

  20. Comprehensive Monitoring for Heterogeneous Geographically Distributed Storage

    Energy Technology Data Exchange (ETDEWEB)

    Ratnikova, N. [Fermilab; Karavakis, E. [CERN; Lammel, S. [Fermilab; Wildish, T. [Princeton U.

    2015-12-23

    Storage capacity at CMS Tier-1 and Tier-2 sites reached over 100 Petabytes in 2014, and will be substantially increased during Run 2 data taking. The allocation of storage for individual users' analysis data, which is not accounted for as centrally managed storage space, will be increased to up to 40%. For comprehensive tracking and monitoring of storage utilization across all participating sites, CMS developed a space monitoring system, which provides a central view of the geographically dispersed heterogeneous storage systems. The first prototype was deployed at pilot sites in summer 2014, and has been substantially reworked since then. In this paper we discuss the functionality and our experience of system deployment and operation on the full CMS scale.

  1. Genomic evaluations with many more genotypes

    Directory of Open Access Journals (Sweden)

    Wiggans George R

    2011-03-01

    Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly. Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared. Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50

  2. Evaluación multicriterio de la exposición al riesgo ambiental mediante un sistema de información geográfica en Argentina Multicriteria evaluation of environmental risk exposure using a geographic information system in Argentina

    Directory of Open Access Journals (Sweden)

    Diana De Pietri

    2011-10-01

    OBJECTIVE: Develop a spatial model that includes environmental factors posing a health hazard, for application in the Matanza-Riachuelo River Basin (MRB) in Argentina. METHODS: Multicriteria evaluation procedures were used with geographic information systems to obtain territorial zoning based on the degree of suitability for residence. Variables that characterize the habitability of housing and potential sources of contamination in the basin were georeferenced. Health information was drawn from the Risk Factor Survey (Encuesta de Factores de Riesgo, EFARS) to measure the relative risk of living in unsuitable areas (exposed population) compared with suitable areas (unexposed population). RESULTS: Sixty percent of the MRB's area is classed as suitable, a condition that applies to 40% of the resident population. The rest of the population lives in unsuitable territory, and 6% live under the most unfavorable conditions in the basin. The adverse environmental health conditions present in unsuitable areas were evident in the health status of respondents through three of the conditions considered: diarrhea, respiratory diseases, and cancer. CONCLUSIONS: A valid regional diagnosis was obtained as supporting information for decision-making. Treating the basin as a single unit of analysis made it possible to establish one protocol for measuring the magnitude of risk comprehensively and, in this way, to set priorities.

  3. Family-based Association Analyses of Imputed Genotypes Reveal Genome-Wide Significant Association of Alzheimer’s disease with OSBPL6, PTPRG and PDCL3

    Science.gov (United States)

    Herold, Christine; Hooli, Basavaraj V.; Mullin, Kristina; Liu, Tian; Roehr, Johannes T; Mattheisen, Manuel; Parrado, Antonio R.; Bertram, Lars; Lange, Christoph; Tanzi, Rudolph E.

    2015-01-01

    The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1 and PSEN2 cause a subset of early-onset familial Alzheimer's disease (EOFAD). On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated with the ε4 allele in the gene APOE, and more recently with variants in more than two dozen additional genes identified in large-scale genome-wide association studies (GWAS) and meta-analysis reports. Taken together, however, although the heritability of AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors remain to be elucidated. In this study we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3,500 subjects from 1,070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value = 3.98 × 10⁻⁸), rs1347297 in the gene OSBPL6 (P-value = 4.53 × 10⁻⁸), and rs1513625 near PDCL3 (P-value = 4.28 × 10⁻⁸). In addition, rs72953347 in OSBPL6 (P-value = 6.36 × 10⁻⁷) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value = 4.76 × 10⁻⁷; rs62400067, P-value = 3.54 × 10⁻⁷). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance. PMID:26830138

  4. Determination of the Geographical Origin of All Commercial Hake Species by Stable Isotope Ratio (SIR) Analysis.

    Science.gov (United States)

    Carrera, Mónica; Gallardo, José M

    2017-02-08

    The determination of the geographical origin of food products is relevant to comply with the legal regulations of traceability, to avoid food fraud, and to guarantee food quality and safety to consumers. For these reasons, stable isotope ratio (SIR) analysis using an isotope ratio mass spectrometry (IRMS) instrument is one of the most useful techniques for evaluating food traceability and authenticity. The present study aimed to determine, for the first time, the geographical origin of all commercial fish species belonging to the Merlucciidae family using SIR analysis of carbon (δ¹³C) and nitrogen (δ¹⁵N). The specific results enabled their clear classification according to the FAO (Food and Agriculture Organization of the United Nations) fishing areas, latitude, and geographical origin in the following six different clusters: European, North African, South African, North American, South American, and Australian hake species.

  5. An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources

    Directory of Open Access Journals (Sweden)

    Zugang Chen

    2018-03-01

    In geographic information science, semantic relatedness is important for Geographic Information Retrieval (GIR), Linked Geospatial Data, geoparsing, and geo-semantics. But computing the semantic similarity/relatedness of geographic terminology is still an urgent issue to tackle. The thesaurus is a ubiquitous and sophisticated knowledge representation tool existing in various domains. In this article, we combined the generic lexical database (WordNet or HowNet) with the Thesaurus for Geographic Science and proposed a thesaurus–lexical relatedness measure (TLRM) to compute the semantic relatedness of geographic terminology. This measure quantified the relationship between terminologies, interlinked the discrete term trees by using the generic lexical database, and realized the semantic relatedness computation of any two terminologies in the thesaurus. The TLRM was evaluated on a new relatedness baseline, namely, the Geo-Terminology Relatedness Dataset (GTRD), which was built by us, and the TLRM obtained a relatively high cognitive plausibility. Finally, we applied the TLRM on a geospatial data sharing portal to support data retrieval. The application results of the 30 most frequently used queries of the portal demonstrated that using TLRM could improve the recall of geospatial data retrieval in most situations and rank the retrieval results by the matching scores between the query of users and the geospatial dataset.

  6. The Effects of Geographic Isolation and Social Support on the Health of Wisconsin Women.

    Science.gov (United States)

    Tittman, Sarah M; Harteau, Christy; Beyer, Kirsten M M

    2016-04-01

    Rural residents are less likely to receive preventive health screening, more likely to be uninsured, and more likely to report fair to poor health than urban residents. Social disconnectedness and perceived isolation are known to be negative predictors of self-rated physical health; however, the direct effects of geographic isolation and social support on overall health have not been well elucidated. A cross-sectional survey of women (n = 113) participating in Wisconsin Rural Women's initiative programming was conducted, which included measures of geographic isolation, an assessment of overall health, and social support using the validated Interpersonal Support Evaluation List with 3 subscales, including belonging support, tangible support, and appraisal support. Geographic isolation was shown to be a negative predictor of belonging support (P = .0064) and tangible support (P = .0349); however, geographic isolation was not a statistically significant predictor of appraisal support. A strong and direct relationship was observed between social support and self-perceived health status among this population of Wisconsin women, and hospital access based on geographic proximity was positively correlated (P = .028) with overall health status. The direct relationship between social support and overall health demonstrated here stresses the importance of developing and maintaining strong social support networks, which can be improved through rural support groups that have the unique ability to assist rural residents in fostering social support systems, advocating stress management techniques, and achieving a greater sense of well-being.

  7. A geographic analysis of wind turbine placement in Northern California

    International Nuclear Information System (INIS)

    Rodman, Laura C.; Meentemeyer, Ross K.

    2006-01-01

    The development of new wind energy projects requires a significant consideration of land use issues. An analytic framework using a Geographic Information System (GIS) was developed to evaluate site suitability for wind turbines and to predict the locations and extent of land available for feasible wind power development. The framework uses rule-based spatial analysis to evaluate different scenarios. The suitability criteria include physical requirements as well as environmental and human impact factors. By including socio-political concerns, this technique can assist in forecasting the acceptance level of wind farms by the public. The analysis was used to evaluate the nine-county region of the Greater San Francisco Bay Area. The model accurately depicts areas where large-scale wind farms have been developed or proposed. It also shows that there are many locations available in the Bay Area for the placement of smaller-scale wind turbines. The framework has application to other regions where future wind farm development is proposed. This information can be used by energy planners to predict the extent to which wind energy can be developed based on land availability and public perception.

  8. Geographic variation in health insurance benefit in Qianjiang District, China

    OpenAIRE

    Ye, Ting; Wu, Yue; Zhang, Liang

    2017-01-01

    Background: Health insurance coverage is of great importance; yet, it is unclear whether there is some geographic variation in health insurance benefit for urban and rural patients covered by the same basic health insurance, especially in China. Objective: To identify the potential geographic variation in health insurance benefit and its possible socioeconomic and geographical factors at the town level. Methods: All the beneficiaries under the health insurance who had the in-hospital experience in...

  9. Generalisation of geographic information cartographic modelling and applications

    CERN Document Server

    Mackaness, William A; Sarjakoski, L Tiina

    2011-01-01

    Theoretical and Applied Solutions in Multi Scale Mapping. Users have come to expect instant access to up-to-date geographical information, with global coverage, presented at widely varying levels of detail, as digital and paper products; customisable data that can readily be combined with other geographic information. These requirements present an immense challenge to those supporting the delivery of such services (National Mapping Agencies (NMA), Government Departments, and private business). Generalisation of Geographic Information: Cartographic Modelling and Applications provides detailed review

  10. Evolution of research in health geographics through the International Journal of Health Geographics (2002-2015).

    Science.gov (United States)

    Pérez, Sandra; Laperrière, Vincent; Borderon, Marion; Padilla, Cindy; Maignant, Gilles; Oliveau, Sébastien

    2016-01-20

    Health geographics is a fast-developing research area. Subjects broached in scientific literature are most varied, ranging from vectorial diseases to access to healthcare, with a recent revival of themes such as the implication of health in the Smart City, or a predominantly individual-centered approach. Far beyond standard meta-analyses, the present study deliberately adopts the standpoint of questioning space in its foundations, through various authors of the International Journal of Health Geographics, a highly influential journal in that field. The idea is to find space as the common denominator in this specialized literature, as well as its relation to spatial analysis, without for all that trying to tend towards exhaustive approaches. 660 articles have been published in the journal since launch, but 359 articles were selected based on the presence of the word "Space" in either the title, the abstract, or the text over 13 years of the journal's existence. From that database, a lexical analysis (tag cloud) reveals the perception of space in literature, and shows how approaches are evolving, thus underlining that the scope of health geographics is far from narrowing.

  11. A review of geographic variation and Geographic Information Systems (GIS) applications in prescription drug use research.

    Science.gov (United States)

    Wangia, Victoria; Shireman, Theresa I

    2013-01-01

    While understanding geography's role in healthcare has been an area of research for over 40 years, the application of geography-based analyses to prescription medication use is limited. The body of literature was reviewed to assess the current state of such studies to demonstrate the scale and scope of projects in order to highlight potential research opportunities. To review systematically how researchers have applied geography-based analyses to medication use data. Empiric, English language research articles were identified through PubMed and bibliographies. Original research articles were independently reviewed as to the medications or classes studied, data sources, measures of medication exposure, geographic units of analysis, geospatial measures, and statistical approaches. From 145 publications matching key search terms, forty publications met the inclusion criteria. Cardiovascular and psychotropic classes accounted for the largest proportion of studies. Prescription drug claims were the primary source, and medication exposure was frequently captured as period prevalence. Medication exposure was documented across a variety of geopolitical units such as countries, provinces, regions, states, and postal codes. Most results were descriptive and formal statistical modeling capitalizing on geospatial techniques was rare. Despite the extensive research on small area variation analysis in healthcare, there are a limited number of studies that have examined geographic variation in medication use. Clearly, there is opportunity to collaborate with geographers and GIS professionals to harness the power of GIS technologies and to strengthen future medication studies by applying more robust geospatial statistical methods. Copyright © 2013 Elsevier Inc. All rights reserved.

  12. Geographic Variation in Characteristics of Postpartum Women Using Female Sterilization.

    Science.gov (United States)

    White, Kari; Potter, Joseph E; Zite, Nikki

    2015-01-01

    Southern states have higher rates of female sterilization compared with other areas of the United States, and the reasons for this are not well understood. We examined whether low-income and racial/ethnic minority women, who were previous targets of coercive practices, disproportionately report using sterilization in the South. We used data from 12 states participating in the Pregnancy Risk Assessment Monitoring System that collected information on women's contraceptive method use between 2006 and 2009. We categorized states according to geographic region: South, Midwest/West, and Northeast. Within each region, we computed the percentage of women using sterilization according to their demographic and obstetric characteristics and estimated multivariable-adjusted prevalence ratios to evaluate whether the same characteristics were associated with sterilization use. The percentage of postpartum women using sterilization ranged from 5.0% to 9.9% in the Northeast, 8.9% to 10.6% in the Midwest/West, and 11.6% to 22.4% in the South. Women in nearly all subgroups in Southern states were more likely to use sterilization than women in the Northeast. After multivariable adjustment, there were no differences in the prevalence of sterilization for Blacks compared with Whites in the Northeast (0.76; 95% CI, 0.55-1.06), Midwest/West (0.91; 95% CI, 0.80-1.04), and South (0.96; 95% CI, 0.85-1.07). Women with Medicaid-paid deliveries (vs. private insurance) had a higher prevalence of sterilization in all regions (p sterilization at disproportionately higher rates compared with other regions, and suggest that other differences, such as social norms and family planning policies, may contribute to this geographic variation. Copyright © 2015 Jacobs Institute of Women's Health. Published by Elsevier Inc. All rights reserved.

  13. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  14. Do geographically isolated wetlands influence landscape functions?

    Science.gov (United States)

    Cohen, Matthew J.; Creed, Irena F.; Alexander, Laurie C.; Basu, Nandita; Calhoun, Aram J.K.; Craft, Christopher; D’Amico, Ellen; DeKeyser, Edward S.; Fowler, Laurie; Golden, Heather E.; Jawitz, James W.; Kalla, Peter; Kirkman, L. Katherine; Lane, Charles R.; Lang, Megan; Leibowitz, Scott G.; Lewis, David Bruce; Marton, John; McLaughlin, Daniel L.; Mushet, David M.; Raanan-Kiperwas, Hadas; Rains, Mark C.; Smith, Lora; Walls, Susan C.

    2015-01-01

    Geographically isolated wetlands (GIWs), those surrounded by uplands, exchange materials, energy, and organisms with other elements in hydrological and habitat networks, contributing to landscape functions, such as flow generation, nutrient and sediment retention, and biodiversity support. GIWs constitute most of the wetlands in many North American landscapes, provide a disproportionately large fraction of wetland edges where many functions are enhanced, and form complexes with other water bodies to create spatial and temporal heterogeneity in the timing, flow paths, and magnitude of network connectivity. These attributes signal a critical role for GIWs in sustaining a portfolio of landscape functions, but legal protections remain weak despite preferential loss from many landscapes. GIWs lack persistent surface water connections, but this condition does not imply the absence of hydrological, biogeochemical, and biological exchanges with nearby and downstream waters. Although hydrological and biogeochemical connectivity is often episodic or slow (e.g., via groundwater), hydrologic continuity and limited evaporative solute enrichment suggest both flow generation and solute and sediment retention. Similarly, whereas biological connectivity usually requires overland dispersal, numerous organisms, including many rare or threatened species, use both GIWs and downstream waters at different times or life stages, suggesting that GIWs are critical elements of landscape habitat mosaics. Indeed, weaker hydrologic connectivity with downstream waters and constrained biological connectivity with other landscape elements are precisely what enhances some GIW functions and enables others. Based on analysis of wetland geography and synthesis of wetland functions, we argue that sustaining landscape functions requires conserving the entire continuum of wetland connectivity, including GIWs.

  15. Geographic Hotspots of Critical National Infrastructure.

    Science.gov (United States)

    Thacker, Scott; Barr, Stuart; Pant, Raghav; Hall, Jim W; Alderson, David

    2017-12-01

    Failure of critical national infrastructures can result in major disruptions to society and the economy. Understanding the criticality of individual assets and the geographic areas in which they are located is essential for targeting investments to reduce risks and enhance system resilience. Within this study we provide new insights into the criticality of real-life critical infrastructure networks by integrating high-resolution data on infrastructure location, connectivity, interdependence, and usage. We propose a metric of infrastructure criticality in terms of the number of users who may be directly or indirectly disrupted by the failure of physically interdependent infrastructures. Kernel density estimation is used to integrate spatially discrete criticality values associated with individual infrastructure assets, producing a continuous surface from which statistically significant infrastructure criticality hotspots are identified. We develop a comprehensive and unique national-scale demonstration for England and Wales that utilizes previously unavailable data from the energy, transport, water, waste, and digital communications sectors. The testing of 200,000 failure scenarios identifies that hotspots are typically located around the periphery of urban areas where there are large facilities upon which many users depend or where several critical infrastructures are concentrated in one location. © 2017 Society for Risk Analysis.

  16. Geographic differences in heart failure trials.

    Science.gov (United States)

    Ferreira, João Pedro; Girerd, Nicolas; Rossignol, Patrick; Zannad, Faiez

    2015-09-01

    Randomized controlled trials (RCTs) are essential to develop advances in heart failure (HF). The need for increasing numbers of patients (without substantial cost increase) and generalization of results led to the disappearance of international boundaries in large RCTs. The significant geographic differences in patients' characteristics, outcomes, and, most importantly, treatment effect observed in HF trials have recently been highlighted. Whether the observed regional discrepancies in HF trials are due to trial-specific issues, patient heterogeneity, structural differences in countries, or a complex interaction between factors are the questions we propose to debate in this review. To do so, we will analyse and review data from HF trials conducted in different world regions, from heart failure with preserved ejection fraction (HF-PEF), heart failure with reduced ejection fraction (HF-REF), and acute heart failure (AHF). Finally, we will suggest objective and actionable measures in order to mitigate regional discrepancies in future trials, particularly in HF-PEF where prognostic modifying treatments are urgently needed and in which trials are more prone to selection bias, due to a larger patient heterogeneity. © 2015 The Authors European Journal of Heart Failure © 2015 European Society of Cardiology.

  17. Ecoregions and ecoregionalization: geographical and ecological perspectives

    Science.gov (United States)

    Loveland, Thomas R.; Merchant, James W.

    2005-01-01

    Ecoregions, i.e., areas exhibiting relative homogeneity of ecosystems, are units of analysis that are increasingly important in environmental assessment and management. Ecoregions provide a holistic framework for flexible, comparative analysis of complex environmental problems. Ecoregions mapping has intellectual foundations in both geography and ecology. However, a hallmark of ecoregions mapping is that it is a truly interdisciplinary endeavor that demands the integration of knowledge from a multitude of sciences. Geographers emphasize the role of place, scale, and both natural and social elements when delineating and characterizing regions. Ecologists tend to focus on environmental processes with special attention given to energy flows and nutrient cycling. Integration of disparate knowledge from the many key sciences has been one of the great challenges of ecoregions mapping, and may lie at the heart of the lack of consensus on the “optimal” approach and methods to use in such work. Through a review of the principal existing US ecoregion maps, issues that should be addressed in order to advance the state of the art are identified. Research related to needs, methods, data sources, data delivery, and validation is needed. It is also important that the academic system foster education so that there is an infusion of new expertise in ecoregion mapping and use.

  18. Development and Application of the Key Technologies for the Quality Control and Inspection of National Geographical Conditions Survey Products

    Science.gov (United States)

    Zhao, Y.; Zhang, L.; Ma, W.; Zhang, P.; Zhao, T.

    2018-04-01

    The First National Geographical Condition Survey is a precursor task for dynamically tracking the basic state of nature, ecology and human activities on the earth's surface, and it is a brand-new surveying, mapping and geographic information project. In order to ensure comprehensive, truthful and accurate survey results and to achieve the quality management target of a qualified rate of 100 % and a yield of more than 80 %, it is necessary to carry out quality control and result inspection for the national geographical conditions survey on a national scale. To ensure that the quality of the results meets these targets, this paper develops a "five-in-one" quality control method comprising a quality control system for the national geographical condition survey, a quality inspection technology system, a quality evaluation system, a quality inspection information management system, and nationally linked quality control institutions. The method addresses the survey's large scale, wide coverage, numerous undertaking units and management levels, technical updating, many production processes and obvious regional differences, together with the novel form of the results, complicated dependencies, abundant special reference data, and large data size. Drawing on related domestic and foreign research results and production experience, and taking into account technology development and production needs, the project stipulates the inspection methods and technical requirements for each stage of the quality inspection of the geographical condition survey results, extends traditional inspection and acceptance technology, and solves the key technologies that are badly needed in the first national geographic survey.

  19. Geographic variation in the advertisement calls of Hyla eximia and its possible explanations.

    Science.gov (United States)

    Rodríguez-Tejeda, Ruth E; Méndez-Cárdenas, María Guadalupe; Islas-Villanueva, Valentina; Macías Garcia, Constantino

    2014-01-01

    Populations of species occupying large geographic ranges are often phenotypically diverse as a consequence of variation in selective pressures and drift. This applies to attributes involved in mate choice, particularly when both geographic range and breeding biology overlap between related species. This condition may lead to interference of mating signals, which would in turn promote reproductive character displacement (RCD). We investigated whether variation in the advertisement call of the mountain treefrog (Hyla eximia) is linked to geographic distribution with respect to major Mexican river basins (Panuco, Lerma, Balsas and Magdalena), or to coexistence with its sister (the canyon treefrog, Hyla arenicolor) or another related species (the dwarf treefrog, Tlalocohyla smithii). We also evaluated whether call divergence across the main river basins could be linked to genetic structure. We found that the multidimensional acoustic space of calls from two basins where H. eximia currently interacts with T. smithii, was different from the acoustic space of calls from H. eximia elsewhere. Individuals from these two basins were also distinguishable from the rest by both the phylogeny inferred from mitochondrial sequences, and the genetic structure inferred from nuclear markers. The discordant divergence of H. eximia advertisement calls in the two separate basins where its geographic range overlaps that of T. smithii can be interpreted as the result of two independent events of RCD, presumably as a consequence of acoustic interference in the breeding choruses, although more data are required to evaluate this possibility.

  20. Geographic variation in the advertisement calls of Hyla eximia and its possible explanations

    Directory of Open Access Journals (Sweden)

    Ruth E. Rodríguez-Tejeda

    2014-06-01

    Full Text Available Populations of species occupying large geographic ranges are often phenotypically diverse as a consequence of variation in selective pressures and drift. This applies to attributes involved in mate choice, particularly when both geographic range and breeding biology overlap between related species. This condition may lead to interference of mating signals, which would in turn promote reproductive character displacement (RCD). We investigated whether variation in the advertisement call of the mountain treefrog (Hyla eximia) is linked to geographic distribution with respect to major Mexican river basins (Panuco, Lerma, Balsas and Magdalena), or to coexistence with its sister (the canyon treefrog, Hyla arenicolor) or another related species (the dwarf treefrog, Tlalocohyla smithii). We also evaluated whether call divergence across the main river basins could be linked to genetic structure. We found that the multidimensional acoustic space of calls from two basins where H. eximia currently interacts with T. smithii, was different from the acoustic space of calls from H. eximia elsewhere. Individuals from these two basins were also distinguishable from the rest by both the phylogeny inferred from mitochondrial sequences, and the genetic structure inferred from nuclear markers. The discordant divergence of H. eximia advertisement calls in the two separate basins where its geographic range overlaps that of T. smithii can be interpreted as the result of two independent events of RCD, presumably as a consequence of acoustic interference in the breeding choruses, although more data are required to evaluate this possibility.

  1. Probabilistic Flood Mapping using Volunteered Geographical Information

    Science.gov (United States)

    Rivera, S. J.; Girons Lopez, M.; Seibert, J.; Minsker, B. S.

    2016-12-01

    Flood extent maps are widely used by decision makers and first responders to provide critical information that prevents economic impacts and the loss of human lives. These maps are usually obtained from sensory data and/or hydrologic models, which often have limited coverage in space and time. Recent developments in social media and communication technology have created a wealth of near-real-time, user-generated content during flood events in many urban areas, such as flooded locations, pictures of flooding extent and height, etc. These data could improve decision-making and response operations as events unfold. However, the integration of these data sources has been limited due to the need for methods that can extract and translate the data into useful information for decision-making. This study presents an approach that uses volunteered geographic information (VGI) and non-traditional data sources (i.e., Twitter, Flickr, YouTube, and 911 and 311 calls) to generate/update the flood extent maps in areas where no models and/or gauge data are operational. The approach combines Web-crawling and computer vision techniques to gather information about the location, extent, and water height of the flood from unstructured textual data, images, and videos. These estimates are then used to provide an updated flood extent map for areas surrounding the geo-coordinate of the VGI through the application of a Hydro Growing Region Algorithm (HGRA). HGRA combines hydrologic and image segmentation concepts to estimate a probabilistic flooding extent along the corresponding creeks. Results obtained for a case study in Austin, TX (i.e., 2015 Memorial Day flood) were comparable to those obtained by a calibrated hydrologic model and had good spatial correlation with flooding extents estimated by the Federal Emergency Management Agency (FEMA).
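
    The abstract does not specify how HGRA works internally, so the following is only a minimal sketch of plain region growing from image segmentation, applied to an elevation grid. It is a deterministic stand-in, not the probabilistic HGRA itself. The function name `grow_flood_region`, the list-of-lists elevation grid, and the rule "a neighbouring cell floods if its elevation is at or below the seed elevation plus the reported water depth" are all assumptions made for illustration.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def grow_flood_region(dem: List[List[float]], seed: Cell, water_depth: float) -> Set[Cell]:
    """Grow a flooded region outward from one reported location (hypothetical helper).

    dem         -- elevation grid in metres, indexed as dem[row][col]
    seed        -- (row, col) of the geolocated report
    water_depth -- water height reported at the seed, in metres
    """
    rows, cols = len(dem), len(dem[0])
    # Assumed rule: water surface sits at seed elevation + reported depth.
    water_level = dem[seed[0]][seed[1]] + water_depth
    flooded: Set[Cell] = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the 4-connected neighbours of the current cell.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if (nr, nc) in flooded:
                continue
            # Only cells at or below the water surface join the region.
            if dem[nr][nc] <= water_level:
                flooded.add((nr, nc))
                queue.append((nr, nc))
    return flooded


# Toy elevation grid: a low channel in the middle row, interrupted by a ridge.
dem = [
    [3.0, 3.0, 3.0, 3.0],
    [1.0, 1.2, 2.5, 1.0],
    [3.0, 3.0, 3.0, 3.0],
]
region = grow_flood_region(dem, seed=(1, 0), water_depth=0.5)
```

    In this toy grid the low cell at the far end of the channel stays dry because the ridge disconnects it from the seed, which is the property that distinguishes region growing from simple elevation thresholding.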

  2. Automation technology using Geographic Information System (GIS)

    Science.gov (United States)

    Brooks, Cynthia L.

    1994-01-01

    Airport Surface Movement Area is but one of the actions taken to increase the capacity and safety of existing airport facilities. The System Integration Branch (SIB) has designed an integrated system consisting of an electronic moving display in the cockpit, which includes a display of taxi routes and automatically provides controllers and pilots with the position of other traffic and with warning information. Although this system has proven accurate and helpful in test simulation, the initial process of obtaining an airport layout of the taxi routes and designing each of them is very tedious and time-consuming. Other methods of preparing the display maps are being researched. One such method is the use of the Geographical Information System (GIS). GIS is an integrated system of computer hardware and software linking topographical, demographic and other resource data that is being referenced. The software can support many areas of work with virtually unlimited information compatibility due to the system's open architecture. GIS will allow us to work faster with increased efficiency and accuracy while providing decision-making capabilities. GIS is currently being used at the Langley Research Center with other applications and has been validated as an accurate system for that task. GIS usage for our task will involve digitizing aerial photographs of the topology for each taxi-runway and identifying each position according to its specific spatial coordinates. The information currently being used can be integrated with the GIS system, due to its ability to provide a wide variety of user interfaces. Much more research and data analysis will be needed before this technique is used; however, we are hopeful this will lead to better use of manpower and technological capabilities in the future.

  3. Choroidal Round Hyporeflectivities in Geographic Atrophy.

    Directory of Open Access Journals (Sweden)

    Eleonora Corbelli

    Full Text Available In geographic atrophy (GA), choroidal vessels typically appear on structural optical coherence tomography (OCT) as hyperreflective round areas with highly reflective borders. We observed that some GA eyes show choroidal round hyporeflectivities with highly reflective borders beneath the atrophy, and further investigated the characteristics by comparing structural OCT, indocyanine green angiography (ICGA) and OCT angiography (OCT-A). Round hyporeflectivities were identified from a pool of patients with GA secondary to non-neovascular age-related macular degeneration consecutively presenting between October 2015 and March 2016 at the Medical Retina & Imaging Unit of the University Vita-Salute San Raffaele. Patients underwent a complete ophthalmologic examination including ICGA, structural OCT and OCT-A. The correspondence between choroidal round hyporeflectivities beneath GA on structural OCT and ICGA and OCT-A imaging was analyzed. Fifty eyes of 26 consecutive patients (17 females and 9 males; mean age 76.8±6.2 years) with GA were included. Twenty-nine round hyporeflectivities were found by OCT in choroidal layers in 21 eyes of 21 patients (42.0%; estimated prevalence of 57.7%). All 29 round hyporeflectivities consistently showed a hyperreflective border and backscattering on structural OCT, and appeared as hypofluorescent in late-phase ICGA and as dark foci with non-detectable flow in the choroidal segmentation of OCT-A. Interestingly, the GA area was greater in eyes with compared to eyes without round hyporeflectivities (9.30±5.74 and 5.57±4.48 mm², respectively; p = 0.01). Our results suggest that most round hyporeflectivities beneath GA may represent non-perfused or hypo-perfused choroidal vessels with non-detectable flow.

  4. Choroidal Round Hyporeflectivities in Geographic Atrophy.

    Science.gov (United States)

    Corbelli, Eleonora; Sacconi, Riccardo; De Vitis, Luigi Antonio; Carnevali, Adriano; Rabiolo, Alessandro; Querques, Lea; Bandello, Francesco; Querques, Giuseppe

    2016-01-01

    In geographic atrophy (GA), choroidal vessels typically appear on structural optical coherence tomography (OCT) as hyperreflective round areas with highly reflective borders. We observed that some GA eyes show choroidal round hyporeflectivities with highly reflective borders beneath the atrophy, and further investigated the characteristics by comparing structural OCT, indocyanine green angiography (ICGA) and OCT angiography (OCT-A). Round hyporeflectivities were identified from a pool of patients with GA secondary to non-neovascular age-related macular degeneration consecutively presenting between October 2015 and March 2016 at the Medical Retina & Imaging Unit of the University Vita-Salute San Raffaele. Patients underwent a complete ophthalmologic examination including ICGA, structural OCT and OCT-A. The correspondence between choroidal round hyporeflectivities beneath GA on structural OCT and ICGA and OCT-A imaging was analyzed. Fifty eyes of 26 consecutive patients (17 females and 9 males; mean age 76.8±6.2 years) with GA were included. Twenty-nine round hyporeflectivities were found by OCT in choroidal layers in 21 eyes of 21 patients (42.0%; estimated prevalence of 57.7%). All 29 round hyporeflectivities consistently showed a hyperreflective border and backscattering on structural OCT, and appeared as hypofluorescent in late-phase ICGA and as dark foci with non-detectable flow in the choroidal segmentation of OCT-A. Interestingly, the GA area was greater in eyes with compared to eyes without round hyporeflectivities (9.30±5.74 and 5.57±4.48 mm², respectively; p = 0.01). Our results suggest that most round hyporeflectivities beneath GA may represent non-perfused or hypo-perfused choroidal vessels with non-detectable flow.

  5. Distribución geográfica del riesgo de rabia de origen silvestre y evaluación de los factores asociados con su incidencia en Colombia, 1982-2010 / Geographic distribution of wild rabies risk and evaluation of the factors associated with its incidence in Colombia, 1982-2010

    Directory of Open Access Journals (Sweden)

    Diana Marcela Brito-Hoyos

    2013-01-01

    Full Text Available OBJECTIVE: To update the information on the geographic distribution of bat-transmitted rabies foci in Colombia and evaluate the biotic and abiotic conditions associated with the incidence of this disease in the country. METHODS: Observational study of a database built from information on the wild rabies foci detected between 1982 and 2010 and the bovine population of each municipality. Municipalities were classified by risk of disease transmission, and an environmental characterization of 15 variables was carried out. A maximum entropy model was built to predict the areas with suitable conditions for the presence of the vector Desmodus rotundus infected with the virus and to evaluate the importance of the variables used. RESULTS: There were 2 330 foci in 359 (31.8%) of the country's 1 128 municipalities; 144 municipalities were classified as high risk. Montería, Valledupar, Riohacha, Aguachica, Unguía, Acandí, Río de Oro, Tibú, Sahagún and San Onofre had the highest incidence rates. Rabies foci occurred throughout the year, although a higher frequency was observed in the dry months (January to April) (linear correlation [r] = 0.64). Temperature and precipitation were the variables that contributed most to the robustness of the prediction model. CONCLUSIONS: Control and prevention measures should be applied in high-risk municipalities. The best months for vaccination campaigns are June, November and December. Future analyses should include biotic interaction variables to improve the predictive capacity of the model.

  6. Geographic Differences in the Earnings of Economics Majors

    Science.gov (United States)

    Winters, John V.; Xu, Weineng

    2014-01-01

    Economics has been shown to be a relatively high-earning college major, but geographic differences in earnings have been largely overlooked. The authors of this article use the American Community Survey to examine geographic differences in both absolute earnings and relative earnings for economics majors. They find that there are substantial…

  7. Geographic Mobility and Social Inequality among Peruvian University Students

    Science.gov (United States)

    Wells, Ryan; Cuenca, Ricardo; Blanco Ramirez, Gerardo; Aragón, Jorge

    2018-01-01

    The purpose of this study was to explore geographic mobility among university students in Peru and to understand how mobility patterns differ by region and by demographic indicators of inequality. The ways that students may be able to move geographically in order to access quality higher education within the educational system can be a driver of…

  8. SCHOOL LINGUISTIC CREATIVITY BASED ON SCIENTIFIC GEOGRAPHICAL TEXTS

    OpenAIRE

    VIORICA BLÎNDĂ

    2012-01-01

    The analysis and observation of the natural environment and of the social and economic one, observing phenomena, objects, beings, and geographical events are at the basis of producing geographical scientific texts. The symbols of iconotexts and cartotexts are another source of inspiration for linguistic interpretation. The linguistic creations that we selected for our study are the scientific analysis, the commentary, the characteriz...

  9. Developing Trainee Teacher Practice with Geographical Information Systems (GIS)

    Science.gov (United States)

    Walshe, Nicola

    2017-01-01

    There is general agreement that geographical information systems (GIS) have a place within the geography classroom; they offer the potential to support geographical learning, exploring real-world problems through student-centred learning, and developing spatial thinking. Despite this, teachers often avoid engaging with GIS and research suggests…

  10. application of geographic information system (gis) in industrial land ...

    African Journals Online (AJOL)

    DEPHILIHS

    Land capability index mapping using Geographic Information System (GIS) principles was used for this study. The study was undertaken using Arc View ... Geographic Information Systems (GIS) is one of the best approaches for this type of ..... western segments and to a small extent the east. Some of the available lands are ...

  11. Issues Surrounding the Use of Virtual Reality in Geographic Education

    Science.gov (United States)

    Lisichenko, Richard

    2015-01-01

    As with all classroom innovations intended to improve geographic education, the adoption of virtual reality (VR) poses issues for consideration prior to endorsing its use. Of these, effectiveness, implementation, and safe use need to be addressed. Traditionally, sense of place, geographic knowledge, and firsthand experiences provided by field…

  12. 47 CFR 22.911 - Cellular geographic service area.

    Science.gov (United States)

    2010-10-01

    ... Cellular Geographic Service Area (CGSA) of a cellular system is the geographic area considered by the FCC... application for modification of the CGSA using FCC Form 601, a depiction of what the carrier believes the CGSA... location and the locus of points where the predicted or measured median field strength finally drops to 32...

  13. Influence of the geographical curriculum on competences of geography teachers

    Directory of Open Access Journals (Sweden)

    Tatjana Resnik Planinc

    2007-12-01

    Full Text Available The paper analyses the influence of the geographical curriculum on the competences of geography teachers. It focuses on the complex and symbiotic relation between the curriculum and the achieved and recommended competences of geography teachers, and on their importance for geographical education. The competences should therefore be derived from the theories concerning values, knowledge, curriculum and the whole educational process that underpin good pedagogical practice.

  14. Characteristics of the socio-geographical factors in the Drina-Velika Morava strategic direction zone

    Directory of Open Access Journals (Sweden)

    Dejan Radivoj Inđić

    2013-06-01

    Full Text Available This paper presents an assessment of the operational-geographic features of the Drina–Velika Morava strategic direction. Due to the scope of the article, a variant of the assessment of the strategic direction is presented through its socio-geographic factors, while the mathematical-geographic and physical-geographic factors, as well as the operating lines of action, are not discussed. Within the socio-geographic factors, the characteristics of the population, economy and communication networks are considered. The geographic area of the direction is nationally compact and allows war mobilization of units without particular strain. The transportation network is not fully developed, which makes combat operations difficult for attackers and easier for defenders. There are significant technical and technological potentials in the zone of the direction, but they are not evenly distributed. After consideration of the complex socio-geographic factors, it is concluded that the strategic direction enables, without any special restrictions, the successful execution of combat operations in the long run. Introduction: The Drina–Velika Morava strategic direction of action consists of two operational lines: Semberija–Šumadija and Glasinac–Zapadna Morava. This paper presents a variant of a complex evaluation of the socio-geographic factors in the area of the strategic direction. Within the socio-geographic factors in the strategic direction, the characteristics of the population, economy and communication networks are discussed. Characteristics of the population and settlements: About 30% of the population of the Republic of Serbia lives in the geographic area of the strategic direction. The highest population density is in major cities (Belgrade, Novi Sad, Šabac, Čačak, etc.). The space is nationally compact, and over 95% of the population are Serbs. In terms of building methods, there are the following types of

  15. Development Trends of Cartography and Geographic Information Engineering

    Directory of Open Access Journals (Sweden)

    WANG Jiayao

    2010-04-01

    Full Text Available Aimed at the problems of cartography and geographic information engineering and the increasing demands of national and military informatization construction, the paper analyzes the development track of cartography and, on that foundation, proposes six future research hotspots for cartography and geographic information engineering: heterogeneous geospatial data assimilation; the shift from emphasizing geographic information acquisition to user-oriented deep processing of geographic information; web or grid geographic information services; intelligent spatial data generalization; integration of GIS and VGE; and a theory system of cartography and geographic information engineering with multi-mode (map, GIS, VGE) spatial-temporal integrated cognition as its core. It also discusses the necessity, existing groundwork and research contents of studying these hotspots.

  16. Air Force Recruitment: A Geographic Perspective

    National Research Council Canada - National Science Library

    Ross, Jason J

    2000-01-01

    ... of relying upon propensity studies. Descriptive and inferential statistics were used to create and evaluate both non-spatial and spatial auto correlated models to determine the best method for predicting recruitment...

  17. National Geographic Education. An Interview with Gilbert M. Grosvenor, President and Chairman of the Board, National Geographic Society.

    Science.gov (United States)

    Jumper, Sidney R.

    1991-01-01

    Presents an interview with Gilbert Grosvenor, president and chairman of the board of the National Geographic Society. Examines student and public ignorance about geography. Describes the Society's Geography Education Project, Geographic Alliance Project, and Education Foundation. Includes Grosvenor's call for greater emphasis on geography in…

  18. Education and health and well-being: direct and indirect effects with multiple mediators and interactions with multiple imputed data in Stata.

    Science.gov (United States)

    Sheikh, Mashhood Ahmed; Abelsen, Birgit; Olsen, Jan Abel

    2017-11-01

    Previous methods for assessing mediation assume no multiplicative interactions. The inverse odds weighting (IOW) approach has been presented as a method that can be used even when interactions exist. The substantive aim of this study was to assess the indirect effect of education on health and well-being via four indicators of adult socioeconomic status (SES): income, management position, occupational hierarchy position and subjective social status. 8516 men and women from the Tromsø Study (Norway) were followed for 17 years. Education was measured at age 25-74 years, while SES and health and well-being were measured at age 42-91 years. Natural direct and indirect effects (NIE) were estimated using weighted Poisson regression models with IOW. Stata code is provided that makes it easy to assess mediation in any multiple imputed dataset with multiple mediators and interactions. Low education was associated with lower SES. Consequently, low SES was associated with being unhealthy and having a low level of well-being. The effect (NIE) of education on health and well-being is mediated by income, management position, occupational hierarchy position and subjective social status. This study contributes to the literature on mediation analysis, as well as the literature on the importance of education for health-related quality of life and subjective well-being. The influence of education on health and well-being had different pathways in this Norwegian sample.

  19. AN ENCODING METHOD FOR COMPRESSING GEOGRAPHICAL COORDINATES IN 3D SPACE

    Directory of Open Access Journals (Sweden)

    C. Qian

    2017-09-01

    Full Text Available This paper proposes an encoding method for compressing geographical coordinates in 3D space. By reducing the length of geographical coordinates, it helps to lessen the storage size of geometry information. In addition, the encoding algorithm subdivides the whole space according to octree rules, which enables progressive transmission and loading. The method comprises three main steps: (1) subdividing the whole 3D geographic space based on an octree structure, (2) resampling all the vertices in 3D models, and (3) encoding the coordinates of vertices with a combination of Cube Index Code (CIC) and Geometry Code. A series of geographical 3D models was used to evaluate the encoding method. The results showed that this method reduced the storage size of most test data by 90 % or even more under the condition of a speed of encoding and decoding. In conclusion, this method achieved a remarkable compression rate in vertex bit size with a steerable precision loss. It should benefit the storage and transmission of web 3D maps.
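
    The octree subdivision step can be illustrated with a short sketch. This is not the paper's Cube Index Code or Geometry Code format, which the abstract does not define; it only shows the general idea of replacing a coordinate with a sequence of octant digits. The function names `octree_encode` and `octree_decode`, the bit layout of each digit, and the example bounding box are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def octree_encode(point: Sequence[float], box: Box, depth: int) -> List[int]:
    """Return one octant digit (0-7) per subdivision level (hypothetical helper)."""
    lo, hi = list(box[0]), list(box[1])
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                digit |= 1 << axis  # assumed layout: bit 0 = x, bit 1 = y, bit 2 = z
                lo[axis] = mid      # keep the upper half on this axis
            else:
                hi[axis] = mid      # keep the lower half on this axis
        digits.append(digit)
    return digits


def octree_decode(digits: Sequence[int], box: Box) -> Point:
    """Return the centre of the cell addressed by the digit sequence."""
    lo, hi = list(box[0]), list(box[1])
    for digit in digits:
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if (digit >> axis) & 1:
                lo[axis] = mid
            else:
                hi[axis] = mid
    return tuple((l + h) / 2.0 for l, h in zip(lo, hi))


box: Box = ((0.0, 0.0, 0.0), (8.0, 8.0, 8.0))
code = octree_encode((1.0, 6.0, 3.0), box, depth=3)
approx = octree_decode(code, box)        # full-precision reconstruction
coarse = octree_decode(code[:1], box)    # first digit only: a coarser cell
```

    Each digit needs only 3 bits, and the reconstruction error per axis is at most half the cell size at the chosen depth, so the depth controls the precision loss. Decoding a prefix of the digits gives a coarser position, which is one way such a scheme can support progressive loading.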

  20. Geographic variation in chin shape challenges the universal facial attractiveness hypothesis.

    Directory of Open Access Journals (Sweden)

    Zaneta M Thayer

    Full Text Available The universal facial attractiveness (UFA) hypothesis proposes that some facial features are universally preferred because they are reliable signals of mate quality. The primary evidence for this hypothesis comes from cross-cultural studies of perceived attractiveness. However, these studies do not directly address patterns of morphological variation at the population level. An unanswered question is therefore: Are universally preferred facial phenotypes geographically invariant, as the UFA hypothesis implies? The purpose of our study is to evaluate this often overlooked aspect of the UFA hypothesis by examining patterns of geographic variation in chin shape. We collected symphyseal outlines from 180 recent human mandibles (90 male, 90 female) representing nine geographic regions. Elliptical Fourier functions analysis was used to quantify chin shape, and principal components analysis was used to compute shape descriptors. In contrast to the expectations of the UFA hypothesis, we found significant geographic differences in male and female chin shape. These findings are consistent with region-specific sexual selection and/or random genetic drift, but not universal sexual selection. We recommend that future studies of facial attractiveness take into consideration patterns of morphological variation within and between diverse human populations.