WorldWideScience

Sample records for geographic imputation approaches

  1. Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

    Directory of Open Access Journals (Sweden)

    Puett Robin C

    2009-10-01

    Full Text Available Abstract Background There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group levels (e.g., spatial distribution). Methods We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value ...). Conclusion Fixed imputation methods seemed to yield the greatest accuracy at the individual level, suggesting their use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies
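    A minimal sketch of the two allocation families described above, assuming each ZIP code's candidate tracts and their weights (e.g., total population) are already known. The function and variable names are hypothetical and this is not the authors' code; it only illustrates why fixed allocation piles every case into one tract while random allocation spreads cases in proportion to the weights.

```python
import random
from collections import Counter

def fixed_allocation(tract_weights):
    """Deterministic: assign the case to the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)

def random_allocation(tract_weights, rng):
    """Stochastic: draw a tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    return rng.choices(tracts, weights=[tract_weights[t] for t in tracts], k=1)[0]

def allocate(n_cases, tract_weights, method, rng=None):
    """Allocate n_cases from one ZIP code and tally the resulting cases per tract.

    method is "fixed" or "random"; rng (a random.Random) is required for "random".
    """
    if method == "fixed":
        picks = [fixed_allocation(tract_weights) for _ in range(n_cases)]
    else:
        picks = [random_allocation(tract_weights, rng) for _ in range(n_cases)]
    return Counter(picks)

# Hypothetical ZIP code covering three tracts, weighted by total population.
weights = {"T1": 5000, "T2": 3000, "T3": 2000}
print(allocate(10, weights, "fixed"))                      # Counter({'T1': 10})
print(allocate(10, weights, "random", random.Random(42)))  # spread across tracts
```

    With fixed allocation all ten cases land in T1, the artificial clustering the abstract warns about; random allocation reproduces the spread of cases across tracts at the cost of individual-level accuracy.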

  2. Estimating the accuracy of geographical imputation

    Directory of Open Access Journals (Sweden)

    Boscoe Francis P

    2008-01-01

    Full Text Available Abstract Background To reduce the number of non-geocoded cases, researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed either by using a postal centroid or by a geographical imputation method which assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date, no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation. Methods Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address. Results Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density. Conclusion Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. 
For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate
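    The demographic geo-imputation compared in this study can be sketched as follows, assuming the population of each race/ethnicity group in every tract of the postal code is available. Names and data are hypothetical; the fallback to total population when a group is absent is an illustrative choice, not something the abstract specifies. The match rate mirrors the accuracy measure described: the share of cases whose imputed tract equals the tract from the full street address.

```python
import random

def geo_impute(case_group, tract_pops, rng):
    """Draw a tract within the postal code, weighted by the population of the case's own group.

    tract_pops maps tract ID -> {group: population} for tracts in the case's postal code.
    """
    tracts = list(tract_pops)
    weights = [tract_pops[t].get(case_group, 0) for t in tracts]
    if sum(weights) == 0:
        # Assumption: if the group is absent from the postal code, fall back to total population.
        weights = [sum(tract_pops[t].values()) for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]

def match_rate(imputed, true):
    """Fraction of cases assigned to the same tract as their street-address geocode."""
    return sum(a == b for a, b in zip(imputed, true)) / len(true)

# Hypothetical postal code with two tracts.
pops = {"T1": {"White": 900, "Black": 100}, "T2": {"White": 200, "Black": 800}}
rng = random.Random(7)
cases = [("Black", "T2"), ("White", "T1"), ("Black", "T2")]  # (group, true tract)
imputed = [geo_impute(g, pops, rng) for g, _ in cases]
print(match_rate(imputed, [t for _, t in cases]))
```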

  3. Improving accuracy of rare variant imputation with a two-step imputation approach

    DEFF Research Database (Denmark)

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

    2015-01-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning less frequent and rare variants are not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) ... reference sample genotyped on a dense array and hereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to the 1000 Genomes reference. Similarly...

  4. A web-based approach to data imputation

    KAUST Repository

    Li, Zhixu

    2013-10-24

    In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut uses the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. In addition, several optimization techniques are proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.

  5. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu

    2016-06-25

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to non-quantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach achieves high imputation precision and recall but issues a large number of web search queries, which incurs a large overhead [1]. © 2016 IEEE.

  6. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu; Qin, Lu; Cheng, Hong; Zhang, Xiangliang; Zhou, Xiaofang

    2016-01-01

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to non-quantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach achieves high imputation precision and recall but issues a large number of web search queries, which incurs a large overhead [1]. © 2016 IEEE.

  7. DTW-APPROACH FOR UNCORRELATED MULTIVARIATE TIME SERIES IMPUTATION

    OpenAIRE

    Phan, Thi-Thu-Hong; Poisson Caillault, Emilie; Bigand, André; Lefebvre, Alain

    2017-01-01

    Missing data are inevitable in almost all domains of applied sciences. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low/un-correlated multivariate time series under an assumption of...
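    The abstract is truncated, so the exact algorithm is not shown here. As a hedged illustration of the general idea, the sketch below computes a dynamic time warping (DTW) distance and uses it to fill a gap by copying the values that follow the observed window most similar, under DTW, to the window just before the gap. This is one simple way DTW can drive single-series imputation and should not be read as the authors' method; `dtw_distance` and `impute_gap` are hypothetical names.

```python
import math

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def impute_gap(series, gap_start, gap_len, window):
    """Fill series[gap_start:gap_start+gap_len] from the best DTW-matching earlier window."""
    query = series[gap_start - window:gap_start]
    best, best_pos = math.inf, None
    # Each candidate window and the gap_len values after it must lie before the gap.
    for s in range(0, gap_start - window - gap_len + 1):
        dist = dtw_distance(query, series[s:s + window])
        if dist < best:
            best, best_pos = dist, s
    if best_pos is None:
        raise ValueError("not enough observed history before the gap")
    return series[best_pos + window:best_pos + window + gap_len]

# A periodic series with two missing values at positions 16 and 17.
series = [0, 1, 2, 3] * 4 + [None, None, 2, 3]
print(impute_gap(series, 16, 2, 4))  # [0, 1]
```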

  8. A nonparametric multiple imputation approach for missing categorical data

    Directory of Open Access Journals (Sweden)

    Muhan Zhou

    2017-06-01

    Full Text Available Abstract Background Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. 
Conclusions We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with
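    A minimal sketch of nearest-neighbour multiple imputation with a two-model predictive score, assuming the outcome-model category probabilities and the missingness probability have already been estimated for every record (for example by the multinomial and logistic regressions described above). How the two scores are combined here, a weighted sum of a Euclidean distance on the outcome probabilities and an absolute difference in missingness probabilities, is an illustrative assumption and not necessarily the paper's weighting scheme; all names are hypothetical.

```python
import math
import random

def score_distance(a, b, w):
    """Weighted combination of outcome-model and missingness-model distances (assumed form)."""
    d_outcome = math.dist(a["p_outcome"], b["p_outcome"])
    d_missing = abs(a["p_missing"] - b["p_missing"])
    return w * d_outcome + (1 - w) * d_missing

def nn_multiple_impute(records, w=0.5, k=3, m=5, seed=0):
    """Return category proportions averaged over m nearest-neighbour imputations."""
    rng = random.Random(seed)
    donors = [r for r in records if r["y"] is not None]
    recipients = [r for r in records if r["y"] is None]
    estimates = []
    for _ in range(m):
        completed = [d["y"] for d in donors]
        for r in recipients:
            # Donor set: the k observed records closest in predictive score.
            nearest = sorted(donors, key=lambda d: score_distance(r, d, w))[:k]
            completed.append(rng.choice(nearest)["y"])
        estimates.append({c: completed.count(c) / len(completed) for c in set(completed)})
    # Point estimate: average the proportions across the m completed data sets.
    categories = sorted({c for e in estimates for c in e})
    return {c: sum(e.get(c, 0.0) for e in estimates) / m for c in categories}

records = [
    {"y": "a", "p_outcome": (0.9, 0.1), "p_missing": 0.1},
    {"y": "b", "p_outcome": (0.1, 0.9), "p_missing": 0.1},
    {"y": None, "p_outcome": (0.15, 0.85), "p_missing": 0.1},
]
print(nn_multiple_impute(records, k=1))
```

    Drawing each imputation at random from the k nearest donors, rather than always taking the single closest one, is what makes the m completed data sets differ and lets the spread between them reflect imputation uncertainty.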

  9. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    DEFF Research Database (Denmark)

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate...

  10. A suggested approach for imputation of missing dietary data for young children in daycare.

    Science.gov (United States)

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P; Zeng, Donglin; Vaughn, Amber E; Pratt, Charlotte; Ward, Dianne S

    2015-01-01

    Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.
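    The ratio-based interpolation can be written out as a short calculation. The sketch assumes the "mean ratio" is the mean of per-recall L/(B+D+ES) ratios in a reference group (the abstract does not say whether it is a mean of ratios or a ratio of means), and the kcal values are made up for illustration; function names are hypothetical.

```python
def mean_ratio(lunch_kcal, bdes_kcal):
    """Mean of per-recall lunch / (breakfast + dinner + evening snack) ratios."""
    ratios = [lunch / bdes for lunch, bdes in zip(lunch_kcal, bdes_kcal) if bdes > 0]
    return sum(ratios) / len(ratios)

def impute_lunch(child_bdes_kcal, ratio):
    """Impute a missing daycare lunch by scaling the child's own reported B+D+ES intake."""
    return child_bdes_kcal * ratio

# Hypothetical reference recalls from non-daycare children on weekdays.
ratio = mean_ratio([400, 330], [800, 660])
print(ratio)                     # 0.5
print(impute_lunch(700, ratio))  # 350.0
```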

  11. A suggested approach for imputation of missing dietary data for young children in daycare

    Directory of Open Access Journals (Sweden)

    June Stevens

    2015-12-01

    Full Text Available Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results: The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion: This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  12. A suggested approach for imputation of missing dietary data for young children in daycare

    OpenAIRE

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

  13. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    Science.gov (United States)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
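    The relative accuracy figures quoted above (RMSE as a percentage of the mean, and percentage under- or overestimation) are commonly computed as below. This is a sketch assuming those conventional definitions; the abstract does not spell out the formulas, and the plot values in the usage lines are invented.

```python
import math

def rmse_percent(observed, predicted):
    """Root-mean-squared error expressed as a percentage of the observed mean."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100 * rmse / (sum(observed) / n)

def bias_percent(observed, predicted):
    """Mean error (predicted - observed) as a percentage of the observed mean; negative = underestimation."""
    n = len(observed)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n
    return 100 * bias / (sum(observed) / n)

# Hypothetical plot-level mean volumes (m3/ha): field-measured vs. imputed.
obs = [100.0, 200.0]
pred = [110.0, 190.0]
print(round(rmse_percent(obs, pred), 2))  # 6.67
print(bias_percent(obs, pred))            # 0.0
```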

  14. A spatial haplotype copying model with applications to genotype imputation.

    Science.gov (United States)

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.

  15. Missing data imputation: focusing on single imputation.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from the analysis. Therefore, many imputation methods have been developed to fill the gap. The present article focuses on single imputation. Imputations with the mean, median and mode are simple but, like complete case analysis, can bias estimates of the mean and standard deviation. Furthermore, they ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.

  16. Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

    Science.gov (United States)

    Baker, Jannah; White, Nicole; Mengersen, Kerrie

    2014-11-20

    Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
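    The cross-validation idea can be sketched as leave-one-out scoring: hide each observed area value in turn, re-impute it with a candidate method, and keep the method with the lowest error. The sketch compares mean imputation with a simple average of neighbouring areas; that neighbour average is a hypothetical stand-in for the conditional autoregressive model in the study, and the multivariate normal option is omitted. Which method wins depends on the data, as the study's own result (mean imputation selected) shows.

```python
import math

def mean_impute(values, idx, neighbours):
    """Fill position idx with the mean of all other observed areas."""
    obs = [v for i, v in enumerate(values) if v is not None and i != idx]
    return sum(obs) / len(obs)

def neighbour_impute(values, idx, neighbours):
    """Fill position idx with the mean of its observed neighbours (falls back to the global mean)."""
    obs = [values[j] for j in neighbours[idx] if values[j] is not None]
    return sum(obs) / len(obs) if obs else mean_impute(values, idx, neighbours)

def cv_rmse(values, neighbours, method):
    """Leave-one-out RMSE: hide each observed value, impute it, and compare with the truth."""
    sq_errors = []
    for i, v in enumerate(values):
        if v is None:
            continue
        masked = list(values)
        masked[i] = None
        sq_errors.append((method(masked, i, neighbours) - v) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical risk-factor prevalence in five areas laid out in a row.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(round(cv_rmse(values, neighbours, mean_impute), 3))       # 1.768
print(round(cv_rmse(values, neighbours, neighbour_impute), 3))  # 0.632
```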

  17. Data driven estimation of imputation error-a strategy for imputation with a reject option

    DEFF Research Database (Denmark)

    Bak, Nikolaj; Hansen, Lars Kai

    2016-01-01

    Missing data is a common problem in many research fields and is a challenge that always requires careful consideration. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values i...

  18. Managing environmental radioactivity monitoring data: a geographic information system approach

    International Nuclear Information System (INIS)

    Heywood, I.; Cornelius, S.

    1993-01-01

    An overview of the current British approach to environmental radiation monitoring is presented here, followed by a discussion of the major issues which would have to be considered in formulating a geographical information system (GIS) for the management of radiation monitoring data. Finally, examples illustrating the use of spatial data handling and automated cartographic techniques are provided from work undertaken by the authors. These examples are discussed in the context of developing a National Radiological Spatial Information System (NRSIS) demonstrator utilising GIS technology. (Author)

  19. Missing value imputation for epistatic MAPs

    LENUS (Irish Health Repository)

    Ryan, Colm

    2010-04-20

    Abstract Background Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analyses, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. 
Therefore our method potentially
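    A minimal sketch of a local (nearest-neighbour) imputation adapted to a symmetric pairwise matrix, assuming interaction scores are stored as a list of lists with None for missing entries. It is a simplified illustration, not one of the methods evaluated in the paper: neighbours of gene i are ranked by root-mean-square difference over the partners both genes have measured, the missing score S(i, j) is the mean of the k nearest neighbours' scores with j, and the estimate is written to both (i, j) and (j, i) to keep the matrix symmetric. Falling back to zero when no neighbour is available mirrors the zero-filling baseline mentioned in the abstract.

```python
import math

def knn_impute_symmetric(matrix, k=2):
    """Fill missing off-diagonal scores in a symmetric matrix (None = missing)."""
    n = len(matrix)
    out = [row[:] for row in matrix]

    def dist(a, b):
        # RMS difference over partners measured for both genes a and b.
        shared = [(matrix[a][c], matrix[b][c]) for c in range(n)
                  if matrix[a][c] is not None and matrix[b][c] is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] is not None:
                continue
            # Candidate neighbours of gene i: genes with a measured score against gene j.
            cands = [r for r in range(n) if r not in (i, j) and matrix[r][j] is not None]
            ranked = sorted(cands, key=lambda r: dist(i, r))
            nearest = [r for r in ranked[:k] if dist(i, r) < math.inf]
            value = sum(matrix[r][j] for r in nearest) / len(nearest) if nearest else 0.0
            out[i][j] = out[j][i] = value  # write both halves to keep symmetry
    return out

scores = [
    [None, 1.0, 2.0, None],
    [1.0, None, 2.0, 3.0],
    [2.0, 2.0, None, 5.0],
    [None, 3.0, 5.0, None],
]
filled = knn_impute_symmetric(scores, k=1)
print(filled[0][3], filled[3][0])  # 3.0 3.0
```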

  20. Fully conditional specification in multivariate imputation

    NARCIS (Netherlands)

    van Buuren, S.; Brand, J. P.L.; Groothuis-Oudshoorn, C. G.M.; Rubin, D. B.

    2006-01-01

    The use of the Gibbs sampler with fully conditionally specified models, where the distribution of each variable given the other variables is the starting point, has become a popular method to create imputations in incomplete multivariate data. The theoretical weakness of this approach is that the

  1. Public Undertakings and Imputability

    DEFF Research Database (Denmark)

    Ølykke, Grith Skovgaard

    2013-01-01

    In this article, the issue of imputability to the State of public undertakings’ decision-making is analysed and discussed in the context of the DSBFirst case. DSBFirst is owned by the independent public undertaking DSB and the private undertaking FirstGroup plc and won the contracts in the 2008...... Oeresund tender for the provision of passenger transport by railway. From the start, the services were provided at a loss, and in the end a part of DSBFirst was wound up. In order to frame the problems illustrated by this case, the jurisprudence-based imputability requirement in the definition of State aid...... in Article 107(1) TFEU is analysed. It is concluded that where the public undertaking transgresses the control system put in place by the State, conditions for imputability are not fulfilled, and it is argued that in the current state of law, there is no conditional link between the level of control...

  2. Multiple imputation in the presence of non-normal data.

    Science.gov (United States)

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.

  3. Airline loyalty (programs) across borders : A geographic discontinuity approach

    NARCIS (Netherlands)

    de Jong, Gerben; Behrens, Christiaan; van Ommeren, Jos

    2018-01-01

    We analyze brand loyalty advantages of national airlines in their domestic countries using geocoded data from a major international frequent flier program. We employ a geographic discontinuity design that estimates discontinuities in program activity at the national borders of the program's

  4. Place Branding – Geographical Approach. Case Study: Waterloo

    Directory of Open Access Journals (Sweden)

    Marius-Cristian Neacşu

    2016-11-01

    Full Text Available This study represents an exploratory analysis of the evolution of the place branding concept, with an important focus on the geographical perspective. How has this notion, a newcomer to geographers' analysis, changed over time, and what role does it play in the decision-making process of intervening in the way a certain place is organised, or as an instrument of economic revival and territorial development? At least from the perspective of Romanian geographical literature, the originality and novelty of this study are obvious. One element of the originality of this research is the attempt to redefine the concept of place branding so that it is more meaningful from the perspective of spatial analyses. The reason Waterloo was chosen as a case study is multi-dimensional: the case studies so far have mainly focused on large cities (urban branding instead of place branding), and this site has all the theoretical elements to create a stand-alone brand.

  5. Bootstrap inference when using multiple imputation.

    Science.gov (United States)

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. However, it remains unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.

  6. Using imputation to provide location information for nongeocoded addresses.

    Directory of Open Access Journals (Sweden)

    Frank C Curriero

    2010-02-01

    Full Text Available The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map, which presupposes some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns regarding bias, since the traditional practice has been to exclude nongeocoded data records from analysis. In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code, with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group levels as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on available population-based age, gender, and race information performed best overall at the county, tract, and block group levels. The procedure allows the potentially biased and likely underreported outcome (case enumerations based only on the geocoded records) to be presented as a statistically adjusted count (imputed count), with a measure of uncertainty, based on all the case data, the geocodes and imputed
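The core idea of allocating a nongeocoded case to a small area inside its known zip code, weighted by the population at risk, and repeating the allocation to obtain counts with uncertainty, can be sketched as follows. This is an illustration under assumptions, not the authors' code: tract identifiers and population figures are hypothetical, a single population weight per tract stands in for the age-, gender- and race-specific information used in the study, and the per-tract standard deviation across imputations is a simplified stand-in for a full multiple-imputation variance.

```python
import random
import statistics
from collections import Counter


def impute_tracts(nongeocoded_zips, zip_to_tract_pop, rng):
    """Assign each nongeocoded case to a tract inside its zip code, drawing the
    tract with probability proportional to its population at risk."""
    assigned = []
    for z in nongeocoded_zips:
        tracts = list(zip_to_tract_pop[z])
        weights = [zip_to_tract_pop[z][t] for t in tracts]
        assigned.append(rng.choices(tracts, weights=weights, k=1)[0])
    return assigned


def multiply_imputed_counts(geocoded_tracts, nongeocoded_zips, zip_to_tract_pop, m=20, seed=0):
    """Per tract, return (mean, standard deviation) of total case counts
    (geocoded plus imputed) across m imputations."""
    rng = random.Random(seed)
    all_tracts = {t for pops in zip_to_tract_pop.values() for t in pops} | set(geocoded_tracts)
    base = Counter(geocoded_tracts)
    per_tract = {t: [] for t in all_tracts}
    for _ in range(m):
        counts = base + Counter(impute_tracts(nongeocoded_zips, zip_to_tract_pop, rng))
        for t in all_tracts:
            per_tract[t].append(counts[t])  # Counter returns 0 for absent tracts
    return {t: (statistics.mean(c), statistics.stdev(c)) for t, c in per_tract.items()}
```

With a hypothetical zip code `"21201"` split into tracts `T1` (population 900) and `T2` (population 100), most imputed cases land in `T1`, but the standard deviation records that the allocation is uncertain.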

  7. Multiply-Imputed Synthetic Data: Advice to the Imputer

    Directory of Open Access Journals (Sweden)

    Loong Bronwyn

    2017-12-01

    Full Text Available Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the confidential synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.

  8. IMPROVEMENT OF THE F-PERCEPTORY APPROACH THROUGH MANAGEMENT OF FUZZY COMPLEX GEOGRAPHIC OBJECTS

    Directory of Open Access Journals (Sweden)

    B. Khalfi

    2015-08-01

    Full Text Available In the real world, data are imperfect in various ways, such as imprecision, vagueness, uncertainty, ambiguity and inconsistency. For geographic data, fuzziness manifests mainly in the time, space and function of objects and is due to a lack of precision. Researchers in the domain therefore emphasize the importance of modeling data structures in GIS, but also the lack of adaptation of these structures to fuzzy data. The F-Perceptory approach manages the modeling of imperfect geographic information with UML. This management is essential to remain faithful to reality and to better guide users in their decision-making. However, this approach does not manage fuzzy complex geographic objects, that is, multiple objects whose parts have similar or different geographic shapes. In this paper, we therefore propose to improve the F-Perceptory approach so that it can model fuzzy complex geographic objects. In a second step, we propose its transformation into UML.

  9. Avian surveys of large geographical areas: A systematic approach

    Science.gov (United States)

    Scott, J.M.; Jacobi, J.D.; Ramsey, F.L.

    1981-01-01

    A multidisciplinary team approach was used to simultaneously map the distribution of birds, selected food items, and major vegetation types in 34,000- to 140,000-ha tracts in native Hawaiian forests. By using a team approach, large savings in time can be realized over attempts to conduct similar surveys of smaller scope, and a systems approach to management problems is made easier. The methods used in survey design, training observers, and documenting bird numbers and habitat descriptions are discussed in detail.

  10. A SOA-based approach to geographical data sharing

    Science.gov (United States)

    Li, Zonghua; Peng, Mingjun; Fan, Wei

    2009-10-01

    In the last few years, large volumes of spatial data have become available in different government departments in China, but these data are mainly used within those departments. With the launch of the e-government project, spatial data sharing has become increasingly necessary. The Web is now used not only for document searching but also for the provision and use of services, known as Web services, which are published in a directory and may be discovered automatically by software agents. In the spatial domain in particular, the possibility of accessing these large spatial datasets via Web services has motivated research into the new field of Spatial Data Infrastructure (SDI) implemented using service-oriented architecture. In this paper, a Geographical Information System (GIS) based on Service-Oriented Architecture (SOA) is proposed, and a prototype system based on Open Geospatial Consortium (OGC) standards is deployed in Wuhan, China, so that all authorized departments can access the spatial data within the government intranet and these data can be easily integrated into various kinds of applications.

  11. Multiple imputation and its application

    CERN Document Server

    Carpenter, James

    2013-01-01

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...

  12. Flexible Imputation of Missing Data

    CERN Document Server

    van Buuren, Stef

    2012-01-01

    Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde
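Once each of the m completed data sets has been analysed, the results are usually combined with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below implements those rules for a scalar parameter; the function name and the input values in the usage note are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their squared standard errors
    with Rubin's rules; returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.mean(estimates)       # pooled point estimate
    w = statistics.mean(variances)           # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, math.sqrt(t)
```

For instance, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3) × 1.0. The between-imputation term is what carries the extra uncertainty caused by the missing values.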

  13. Imputing data that are missing at high rates using a boosting algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Cauthen, Katherine Regina [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lambert, Gregory [Apple Inc., Cupertino, CA (United States); Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lefantzi, Sophia [Sandia National Lab. (SNL-CA), Livermore, CA (United States)

    2016-09-01

    Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number m of imputations is used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal conditional autoregressive (CAR) model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
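To make the boosting idea concrete, the sketch below implements plain L2 boosting with one-split regression stumps as the weak learner and uses the boosted model to fill missing responses from a single covariate. This is a deliberately simple stand-in: the study boosts a Bayesian spatiotemporal CAR model, which is not reproduced here, and the learning rate, number of rounds and function names are hypothetical choices.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold on a single feature,
    predicting the mean residual on each side."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """L2 boosting: repeatedly fit a stump to the current residuals and add a
    shrunken copy of it to the ensemble."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def impute(xs, ys):
    """Fit on rows where y is observed, then fill rows where y is None."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    model = boost([x for x, _ in obs], [y for _, y in obs])
    return [y if y is not None else model(x) for x, y in zip(xs, ys)]
```

Each round only corrects what the previous rounds got wrong, so a very weak learner can still produce accurate fills; in a real application the weak learner would use all available covariates and be validated with held-out data, as the k-fold RMSE in the abstract does.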

  14. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

    Directory of Open Access Journals (Sweden)

    Rosa Aghdam

    2017-12-01

    Full Text Available Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes are selected at random, and missingness is introduced under two types of mechanism (ignorable and non-ignorable) at various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/.

  15. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

    Science.gov (United States)

    Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes are selected at random, and missingness is introduced under two types of mechanism (ignorable and non-ignorable) at various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.

  16. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    NARCIS (Netherlands)

    I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

    2016-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the

  17. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  18. R package imputeTestbench to compare imputations methods for univariate time series

    OpenAIRE

    Bokde, Neeraj; Kulat, Kishore; Beck, Marcus W; Asencio-Cortés, Gualberto

    2016-01-01

    This paper describes the R package imputeTestbench that provides a testbench for comparing imputation methods for missing data in univariate time series. The imputeTestbench package can be used to simulate the amount and type of missing data in a complete dataset and compare filled data using different imputation methods. The user has the option to simulate missing data by removing observations completely at random or in blocks of different sizes. Several default imputation methods are includ...
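The workflow the package automates (remove observations at random or in blocks, fill them with competing methods, score the filled values) can be sketched in a few lines of Python. These functions are hypothetical illustrations and are not the imputeTestbench API, which is an R package; the two fill methods shown (last observation carried forward and linear interpolation) are simply common examples, and the error is computed only at the removed positions.

```python
import math
import random


def remove_mcar(series, frac, rng):
    """Set a random fraction of interior observations to None (missing
    completely at random); endpoints are kept so interpolation is defined."""
    out = list(series)
    k = int(round(frac * len(series)))
    for i in rng.sample(range(1, len(series) - 1), k):
        out[i] = None
    return out


def remove_block(series, start, length):
    """Set a contiguous block of observations to None."""
    return [None if start <= i < start + length else v for i, v in enumerate(series)]


def locf(series):
    """Last observation carried forward (assumes the first value is observed)."""
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out


def linear_interp(series):
    """Linear interpolation between the nearest observed neighbours
    (assumes both endpoints are observed)."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = series[a] + (series[b] - series[a]) * (i - a) / (b - a)
    return out


def rmse_on_gaps(truth, gapped, filled):
    """RMSE computed only at the positions that were removed."""
    errs = [(t - f) ** 2 for t, g, f in zip(truth, gapped, filled) if g is None]
    return math.sqrt(sum(errs) / len(errs))
```

On a linear trend such as `[float(i) for i in range(20)]`, linear interpolation recovers a removed block exactly while carrying the last value forward does not, which illustrates why rankings of methods depend on the structure of the series and of the gaps.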

  19. Mining user-generated geographic content : an interactive, crowdsourced approach to validation and supervision

    NARCIS (Netherlands)

    Ostermann, F.O.; Garcia Chapeton, Gustavo Adolfo; Zurita-Milla, R.; Kraak, M.J.; Bergt, A.; Sarjakoski, T.; van Lammeren, R.; Rip, F.

    2017-01-01

    This paper describes a pilot study that implements a novel approach to validate data mining tasks by using the crowd to train a classifier. This hybrid approach to processing successfully addresses challenges faced during human curation or machine processing of user-generated geographic content

  20. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    Science.gov (United States)

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome, as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to impute the summary statistics directly, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation, and its practical utility, have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation achieves a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power of 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. 
Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian
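Summary statistics imputation is commonly written in a conditional-Gaussian form: z-scores at untyped variants are predicted from z-scores at typed variants through the linkage disequilibrium (LD) correlations from a reference panel, z_u ≈ Σ_ut Σ_tt⁻¹ z_t, with Σ_ut Σ_tt⁻¹ Σ_tu as a per-variant imputation quality. The NumPy sketch below implements that basic form with a small ridge term on Σ_tt. It is an illustration under assumptions, not the authors' improved method (it does not model variable sample size across SNVs), and the function name and `ridge` value are hypothetical.

```python
import numpy as np


def impute_z(z_typed, ld_tt, ld_ut, ridge=0.1):
    """Impute z-scores at untyped variants from typed-variant z-scores.

    z_typed: (t,) z-scores at typed variants
    ld_tt:   (t, t) LD correlation matrix among typed variants
    ld_ut:   (u, t) LD correlations between untyped and typed variants
    ridge:   hypothetical regularisation added to the diagonal of ld_tt
    """
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])
    # weights = ld_ut @ inv(reg), computed with a linear solve (reg is symmetric)
    weights = np.linalg.solve(reg, ld_ut.T).T
    z_untyped = weights @ z_typed
    # per-variant quality: diagonal of ld_ut @ inv(reg) @ ld_ut.T
    r2 = np.einsum("ij,ij->i", weights, ld_ut)
    return z_untyped, r2
```

With one typed variant (z = 5) in LD r = 0.8 with an untyped variant and no ridge, the imputed z-score is 4.0 with quality 0.64. The ridge term trades some bias for stability when the typed-variant LD matrix is close to singular.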

  1. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    NARCIS (Netherlands)

    P.C. Austin (Peter); D. van Klaveren (David); Y. Vergouwe (Yvonne); D. Nieboer (Daan); D.S. Lee (Douglas); E.W. Steyerberg (Ewout)

    2016-01-01

    Objective: Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting: We

  2. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

    Science.gov (United States)

    Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

    2018-06-01

    A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
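The evaluation design described here (hide some measured values, impute them alongside the genuinely missing ones, and compare predictions with the hidden measurements) can be sketched as follows. The abstract does not say how the root-mean-square error is normalized; dividing by the standard deviation of the true masked values is an assumption made for this sketch, and the function names are hypothetical. The imputation method itself (MICE, GP or 3D-MICE) is not reproduced.

```python
import math
import random
import statistics


def mask_observed(values, frac, rng):
    """Hide a random fraction of the observed (non-None) entries.
    Returns the masked copy and the sorted indices that were hidden."""
    observed = [i for i, v in enumerate(values) if v is not None]
    hidden = set(rng.sample(observed, int(frac * len(observed))))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)


def normalized_rmse(truth, predicted, idx):
    """RMSE at the masked positions divided by the (population) standard
    deviation of the true values there; undefined if those values are all equal."""
    t = [truth[i] for i in idx]
    p = [predicted[i] for i in idx]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, p)) / len(t))
    return rmse / statistics.pstdev(t)
```

Under this normalization, imputing every masked value with the mean of the true masked values scores exactly 1.0, so values well below 1 indicate that a method exploits cross-sectional or longitudinal structure.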

  3. Using a 'value-added' approach for contextual design of geographic information.

    Science.gov (United States)

    May, Andrew J

    2013-11-01

    The aim of this article is to demonstrate how a 'value-added' approach can be used for user-centred design of geographic information. An information science perspective was used, with value being the difference in outcomes arising from alternative information sets. Sixteen drivers navigated a complex, unfamiliar urban route, using visual and verbal instructions representing the distance-to-turn and junction layout information presented by typical satellite navigation systems. Data measuring driving errors, navigation errors and driver confidence were collected throughout the trial. The results show how driver performance varied considerably according to the geographic context at specific locations, and that there are specific opportunities to add value with enhanced geographical information. The conclusions are that a value-added approach facilitates a more explicit focus on 'desired' (and feasible) levels of end user performance with different information sets, and is a potentially effective approach to user-centred design of geographic information. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  4. Imputing amino acid polymorphisms in human leukocyte antigens.

    Directory of Open Access Journals (Sweden)

    Xiaoming Jia

    Full Text Available DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize the performance of SNP2HLA, we constructed two European-ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC; 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold-standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than the SNP density of the genotyping platform, is critical to achieving high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.

  5. Towards a more efficient representation of imputation operators in TPOT

    OpenAIRE

    Garciarena, Unai; Mendiburu, Alexander; Santana, Roberto

    2018-01-01

    Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, model assessment, etc.). TPOT, a software tool for optimizing machine learning pipelines based on genetic programming (GP), is a novel example of this kind of application. Recently, we proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can prod...

  6. Hospital distribution in a metropolitan city: assessment by a geographical information system grid modelling approach

    Directory of Open Access Journals (Sweden)

    Kwang-Soo Lee

    2014-05-01

    Full Text Available Grid models were used to assess urban hospital distribution in Seoul, the capital of South Korea. A geographical information system (GIS)-based analytical model was developed and applied to assess the situation in a metropolitan area with a population exceeding 10 million. Secondary data for this analysis were obtained from multiple sources: the Korean Statistical Information Service, the Korean Hospital Association and the Statistical Geographical Information System. A grid of cells measuring 1 × 1 km was superimposed on the city map, and a set of variables related to population, economy, mobility and housing were identified and measured for each cell. Socio-demographic variables were included to reflect the characteristics of each area. Analytical models were then developed using GIS software with the number of hospitals as the dependent variable. Applying multiple linear regression and geographically weighted regression models, three factors (highway and major arterial road areas; number of subway entrances; and row house areas) were statistically significant in explaining the variance of hospital distribution for each cell. The overall results show that GIS is a useful tool for analysing and understanding location strategies. This approach appears to be a useful source of information for decision-makers concerned with the distribution of hospitals and other health care centres in a city.
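The first two steps of the grid approach (superimpose 1 × 1 km cells, count hospitals per cell as the dependent variable, then relate the count to a cell-level covariate) can be sketched as follows. The sketch assumes point coordinates are already in a projected system measured in metres, uses ordinary least squares on a single hypothetical covariate (for example, subway entrances per cell), and does not reproduce the study's multiple regression or geographically weighted regression; all names are hypothetical.

```python
import math
from collections import Counter


def cell_of(x_m, y_m, cell_size_m=1000.0):
    """Grid-cell index (column, row) for a point in projected coordinates (metres)."""
    return (math.floor(x_m / cell_size_m), math.floor(y_m / cell_size_m))


def hospitals_per_cell(points, cell_size_m=1000.0):
    """Count hospitals falling in each grid cell (the dependent variable)."""
    return Counter(cell_of(x, y, cell_size_m) for x, y in points)


def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of ys on a single covariate xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

In practice, every cell in the study area (including cells with zero hospitals) would be paired with its covariate values before fitting, because a `Counter` only lists cells that contain at least one hospital.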

  7. Towards a semantics-based approach in the development of geographic portals

    Science.gov (United States)

    Athanasis, Nikolaos; Kalabokidis, Kostas; Vaitis, Michail; Soulakellis, Nikolaos

    2009-02-01

    As the demand for geospatial data increases, the lack of efficient ways to find suitable information becomes critical. In this paper, a new methodology for knowledge discovery in geographic portals is presented. Based on the Semantic Web, our approach exploits the Resource Description Framework (RDF) in order to describe the geoportal's information with ontology-based metadata. When users traverse from page to page in the portal, they take advantage of the metadata infrastructure to navigate easily through data of interest. New metadata descriptions are published in the geoportal according to the RDF schemas.

  8. An Imputation Model for Dropouts in Unemployment Data

    Directory of Open Access Journals (Sweden)

    Nilsson Petra

    2016-09-01

    Full Text Available Incomplete unemployment data are a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service’s register, as many as 20 percent do. This leads to ambiguity regarding destination states (employment, unemployment, retirement, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 onward it dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions, and they have relatively high predictive power.

  9. Towards the observation of Territorial Energy Systems - a geographical design approach for energy territorialisation

    International Nuclear Information System (INIS)

    Flety, Yann

    2014-01-01

    Having an effect on a system calls for thorough knowledge of it. The main goal of this research is to provide a general framework for the interpretation of geographic information, as well as a methodological framework to understand the interrelations between territory and energy in the context of a territorial observatory. A literature review of energy planning on the one hand and spatial planning on the other reveals similar developments in the two fields, in particular in terms of decentralisation and environmental concerns. The change of geographical scale chosen for the analysis brings new possibilities for public intervention. In this context, local authorities therefore have a key role to play in implementing energy policy goals in their planning practices. They need analyses and prospective studies, as well as basic knowledge, to carry out territorial energy planning. Indeed, the socio-spatial functions (living, travelling, working, etc.) are themselves at the root of spatial layout, urban forms and settlement structures. Those functions cannot be dissociated from questions of land use and energy. To understand energy, which is vital, ubiquitous, and responsible for the organisation of territory, a systemic approach is proposed: the Territorial Energy System. It illustrates the importance of the interactions between a territory and its energy system, and more precisely, the interdependence between energy processes and territorial ones. We propose a design approach in the context of an observatory, and more precisely conceptual models, to analyse the territory-energy interrelations, with a particular focus on semantic dimensions. This approach combines three elements: a meta-model, a lightweight pre-consensus domain ontology, and individual conceptual data models for each indicator. An original indicator is then used for a first ontology population: the territorial energy label. Characterising the interrelations between territory and energy is non

  10. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    Science.gov (United States)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data, because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is inevitable variability in the number of traits available for each data entry and/or in the species coverage in a given geographical area. Missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships, and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have recently been tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE), at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix
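The abstract names single imputation by k-nearest neighbours (kNN) without giving details. Below is a minimal stdlib sketch of the general kNN idea, not the authors' implementation: rows are plots or trees, columns are traits, `None` marks a gap, and each gap is filled with the mean of that trait over the k rows closest on the traits both rows observe. The function name `knn_impute`, the root-mean-square distance over co-observed traits, and the default `k=2` are assumptions for illustration; practical implementations usually standardise traits first, which this sketch omits.

```python
import math


def knn_impute(rows, k=2):
    """Fill None gaps with the mean value of the k nearest donor rows.

    Hypothetical helper: distance is the root-mean-square difference over
    columns observed in both rows; traits are NOT standardised here.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, donor in enumerate(rows):
                # A donor must be a different row that observes trait j.
                if m == i or donor[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, donor)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no traits in common, distance undefined
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, donor[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical trait matrix: rows are trees, columns are three traits.
traits = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],  # gap in the third trait
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.4],
]
print(knn_impute(traits, k=2)[1])
```

Here the gap is filled from the first and fourth rows, which are closest on the two shared traits; the distant third row is ignored.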

  11. Missing value imputation for microarray gene expression data using histone acetylation information

    Directory of Open Access Journals (Sweden)

    Feng Jihua

    2008-05-01

    Background: Accurately estimating missing values is an important pre-processing step for microarray data, because numerous expression profile analyses in bioinformatics require complete datasets. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. Results: The paper explores the feasibility of missing value imputation aided by gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, an existing related method that uses functional similarity as external information. Conclusion: We demonstrated that the use of histone acetylation information could greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms beyond histone acetylation, the performance of our approach can be further improved and verified.
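HAIimpute is compared with KNN and LLS by normalized root mean squared error (NRMSE). Definitions of NRMSE vary between papers; the sketch below assumes one common form, the RMSE over the imputed entries divided by the population standard deviation of the true values at those entries. The function name `nrmse` and the example values are hypothetical, and the acetylation-aided imputation itself is not shown.

```python
import math
from statistics import pstdev


def nrmse(true_values, imputed_values):
    """RMSE over imputed entries, normalised by the std. dev. of the true values.

    Assumed definition; some papers normalise differently (e.g. by the range).
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / len(true_values)
    spread = pstdev(true_values)
    if spread == 0:
        raise ValueError("true values have zero variance; NRMSE is undefined")
    return math.sqrt(mse) / spread


# Hypothetical masked entries: their known values and two methods' guesses.
truth = [1.0, 2.0, 3.0, 4.0]
print(nrmse(truth, [1.0, 2.0, 3.0, 5.0]))
print(nrmse(truth, [2.0, 2.0, 3.0, 5.0]))
```

Lower NRMSE is better, and 0 means the masked values were recovered exactly; normalising by the spread makes scores comparable across genes with different expression ranges.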

  12. Imputation of missing data in time series for air pollutants

    Science.gov (United States)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the mechanism generating the missing data, whereas validity began to degrade when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even when data were missing not at random. The methods proposed in this study are implemented in a package called mtsdi for the statistical software system R.

  13. When homogeneity meets heterogeneity: the geographically weighted regression with spatial lag approach to prenatal care utilization

    Science.gov (United States)

    Shoff, Carla; Chen, Vivian Yi-Ju; Yang, Tse-Chuan

    2014-01-01

    Using geographically weighted regression (GWR), a recent study by Shoff and colleagues (2012) investigated the place-specific risk factors for prenatal care utilization in the US and found that most of the relationships between late or no prenatal care and its determinants are spatially heterogeneous. However, the GWR approach may be subject to the confounding effect of spatial homogeneity. The goal of this study is to address this concern by including both spatial homogeneity and heterogeneity in the analysis. Specifically, we employ an analytic framework in which a spatially lagged (SL) effect of the dependent variable is incorporated into the GWR model, called GWR-SL. Using this framework, we found evidence that spatial homogeneity was neglected in the study by Shoff et al. (2012) and that the results change after considering the spatially lagged effect of prenatal care utilization. The GWR-SL approach allows us to gain a place-specific understanding of prenatal care utilization in US counties. In addition, we compared the GWR-SL results with those of conventional approaches (i.e., OLS and spatial lag models) and found that GWR-SL is the preferred modeling approach. The new findings help us to better estimate how the predictors are associated with prenatal care utilization across space, and to determine whether and how the level of prenatal care utilization in neighboring counties matters. PMID:24893033
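In GWR-SL, the spatially lagged dependent variable, usually written Wy, enters the model as an additional term. A minimal sketch of computing that lag with row-standardised contiguity weights (each county's lag is the average of its neighbours' values) follows. The county names, values, neighbour lists and the zero lag for counties without neighbours are all hypothetical choices, and the geographically weighted estimation itself is not shown.

```python
def spatial_lag(values, neighbors):
    """Return Wy for row-standardised weights built from neighbour lists.

    values: mapping area -> observed value of the dependent variable.
    neighbors: mapping area -> list of adjacent areas.
    Areas without neighbours get a lag of 0.0 (one common convention).
    """
    lag = {}
    for area, adjacent in neighbors.items():
        if adjacent:
            # Row standardisation: each neighbour gets weight 1 / len(adjacent).
            lag[area] = sum(values[n] for n in adjacent) / len(adjacent)
        else:
            lag[area] = 0.0
    return lag


# Hypothetical share of births with late or no prenatal care in four counties.
late_care = {"A": 0.10, "B": 0.20, "C": 0.40, "D": 0.15}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
print(spatial_lag(late_care, adjacency))
```

Row standardisation keeps the lag on the same scale as y, so its coefficient can be read as the effect of the average neighbouring level.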

  14. Teaching Introductory GIS Programming to Geographers Using an Open Source Python Approach

    Science.gov (United States)

    Etherington, Thomas R.

    2016-01-01

    Computer programming is not commonly taught to geographers as a part of geographic information system (GIS) courses, but the advent of NeoGeography, big data and open GIS means that programming skills are becoming more important. To encourage the teaching of programming to geographers, this paper outlines a course based around a series of…

  15. A geographic approach to modelling human exposure to traffic air pollution using GIS. Separate appendix report

    Energy Technology Data Exchange (ETDEWEB)

    Solvang Jensen, S.

    1998-10-01

    A new exposure model has been developed that is based on a physical, single-medium (air) and single-source (traffic) microenvironmental approach that estimates traffic-related exposures geographically, with the postal address as exposure indicator. The microenvironments residence, workplace and street (road user exposure) may be considered. The model estimates outdoor levels for selected ambient air pollutants (benzene, CO, NO₂ and O₃). The influence of outdoor air pollution on indoor levels can be estimated using average indoor/outdoor (I/O) ratios. The model has a very high spatial resolution (the address) and a high temporal resolution (one hour), and may be used to predict past, present and future exposures. The model may be used for impact assessment of control measures, provided that the changes to the model inputs are obtained. The exposure model uses a standard Geographic Information System (GIS) (ArcView and Avenue) to generate inputs and to visualise input and output, and draws on available digital maps, national administrative registers, a local traffic database, and the Danish Operational Street Pollution Model (OSPM). The exposure model presents a new approach to exposure determination by integrating digital maps, administrative registers, a street pollution model and GIS. New methods have been developed to generate the required input parameters for the OSPM model: to geocode buildings using cadastral maps and address points; to automatically generate street configuration data based on digital maps, the BBR and GIS; to predict the temporal variation in traffic and related parameters; and to provide hourly background levels for the OSPM model. (EG)

  16. A geographic approach to modelling human exposure to traffic air pollution using GIS

    Energy Technology Data Exchange (ETDEWEB)

    Solvang Jensen, S.

    1998-10-01

    A new exposure model has been developed that is based on a physical, single-medium (air) and single-source (traffic) microenvironmental approach that estimates traffic-related exposures geographically, with the postal address as exposure indicator. The microenvironments residence, workplace and street (road user exposure) may be considered. The model estimates outdoor levels for selected ambient air pollutants (benzene, CO, NO₂ and O₃). The influence of outdoor air pollution on indoor levels can be estimated using average indoor/outdoor (I/O) ratios. The model has a very high spatial resolution (the address) and a high temporal resolution (one hour), and may be used to predict past, present and future exposures. The model may be used for impact assessment of control measures, provided that the changes to the model inputs are obtained. The exposure model uses a standard Geographic Information System (GIS) (ArcView and Avenue) to generate inputs and to visualise input and output, and draws on available digital maps, national administrative registers, a local traffic database, and the Danish Operational Street Pollution Model (OSPM). The exposure model presents a new approach to exposure determination by integrating digital maps, administrative registers, a street pollution model and GIS. New methods have been developed to generate the required input parameters for the OSPM model: to geocode buildings using cadastral maps and address points; to automatically generate street configuration data based on digital maps, the BBR and GIS; to predict the temporal variation in traffic and related parameters; and to provide hourly background levels for the OSPM model. (EG) 109 refs.

  17. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case analysis and single imputation or substitution, suffer from inefficiency and bias. They either make strong parametric assumptions or consider only limit-of-detection censoring. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
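The abstract does not spell out how results from the imputed datasets are combined. Multiple imputation analyses commonly pool them with Rubin's rules: average the m point estimates, then add the mean within-imputation variance to (1 + 1/m) times the between-imputation variance. The sketch below covers that pooling step only and assumes each completed dataset has already produced a logistic-regression coefficient and its variance. The numbers and the function name `pool_rubin` are hypothetical, and the imputation of the censored covariate (non-parametric or Cox-model based) is not shown.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log-odds coefficients for the censored covariate from five imputations.
coefficients = [0.42, 0.38, 0.47, 0.40, 0.44]
coef_variances = [0.010, 0.011, 0.009, 0.010, 0.012]
print(pool_rubin(coefficients, coef_variances))
```

The between-imputation term is what lets multiple imputation reflect uncertainty about the censored values, which single imputation or substitution ignores.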

  18. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    Science.gov (United States)

    Poyatos, Rafael; Sus, Oliver; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10-80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables allowed us to fill the gaps in
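Species-mean imputation, one of the baselines in this comparison, can be sketched in a few lines. This minimal illustration assumes records of (species, trait value) with `None` for gaps, and falls back to the overall trait mean for species with no observed values (the overall-mean method the abstract also lists). The species, values and the function name are hypothetical. Because every gap within a species receives the same value, the filled data understate within-species spread, which is one way a method can look accurate while distorting trait distributions.

```python
from statistics import mean


def species_mean_impute(records):
    """Fill gaps with the species mean, or the overall mean if a species has no data.

    records: list of (species, value) pairs, value None for a gap.
    Returns a list of filled values in the same order.
    """
    observed = {}
    for species, value in records:
        if value is not None:
            observed.setdefault(species, []).append(value)
    all_values = [v for values in observed.values() for v in values]
    overall = mean(all_values)  # fallback for species never observed
    species_means = {sp: mean(values) for sp, values in observed.items()}
    return [value if value is not None else species_means.get(species, overall)
            for species, value in records]


# Hypothetical wood density records (g/cm3) with gaps.
wood_density = [
    ("Pinus halepensis", 0.50), ("Pinus halepensis", 0.60),
    ("Pinus halepensis", None), ("Quercus ilex", 0.80),
    ("Quercus ilex", None), ("Fagus sylvatica", None),
]
print(species_mean_impute(wood_density))
```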

  19. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    Directory of Open Access Journals (Sweden)

    R. Poyatos

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables

  20. Evaluating fuel poverty policy in Northern Ireland using a geographic approach

    International Nuclear Information System (INIS)

    Walker, Ryan; Liddell, Christine; McKenzie, Paul; Morris, Chris

    2013-01-01

    Recent audits have shown that anti-fuel poverty policies in the UK depend on loosely defined targeting and cannot accurately identify fuel-poor households. New methods of targeting are necessary to improve fuel poverty policy. This paper uses Geographic Information System (GIS) techniques to evaluate the targeting of a home energy efficiency scheme at small-area level in Northern Ireland, based on the level of need. The concept of need is modelled using an area-based, multi-dimensional fuel poverty risk index. The characteristics and spatial distribution of household retrofits are explored. Policy activity and expenditure are compared with the level of need in an area. Results indicate that policy activity is only weakly associated with the level of need in an area, although policy appears to be well targeted in a few areas. Contrary to existing evidence, rural areas appear to be well served by policy, receiving above-average numbers of retrofits and expenditure. There are typically two types of retrofit (major and minor). Most retrofits are minor and may not reduce fuel poverty. These results highlight the limitations of the current targeting system and suggest that there may be scope for improved policy implemented via a more proactive, area-based approach. - Highlights: • We analyse the spatial distribution of home energy efficiency installations. • Significant geographic disparity exists in the rate and cost of home retrofits. • Targeting is only weakly associated with the level of need. • Many interventions are small-scale and are unlikely to reduce fuel poverty. • Results suggest scope for more proactive policy delivered from area-based platforms

  1. Environmental and Geographical Factors Structure Soil Microbial Diversity in New Caledonian Ultramafic Substrates: A Metagenomic Approach.

    Directory of Open Access Journals (Sweden)

    Véronique Gourmelon

    Soil microorganisms play key roles in ecosystem functioning and are known to be influenced by biotic and abiotic factors, such as plant cover or edaphic parameters. New Caledonia, a biodiversity hotspot located in the southwest Pacific, is one-third covered by ultramafic substrates. These types of soils are notably characterised by low nutrient content and high heavy metal concentrations. Ultramafic outcrops harbour diverse vegetation types and remarkable plant diversity. In this study, we aimed to assess soil bacterial and fungal diversity in New Caledonian ultramafic substrates and to determine whether floristic composition, edaphic parameters and geographical factors affect this microbial diversity. To this end, four plant formation types at two distinct sites were studied. These formations represent different stages in a potential chronosequence. Soil cores were collected according to a given sampling procedure to assess microbial diversity using a metagenomic approach and to characterise the physico-chemical parameters. A botanical inventory was also performed. Our results indicated that microbial richness, composition and abundance were linked to the plant cover type and the dominant plant species. Furthermore, a large proportion of the Ascomycota phylum (fungi), mostly in non-rainforest formations, and of the Planctomycetes phylum (bacteria), in all formations, was observed. Interestingly, such patterns could be indicators of past disturbances that occurred on different time scales. In addition, the bacteria and fungi were influenced by diverse edaphic parameters as well as by the interplay between these two soil communities. Another striking finding was the existence of a site effect. Differences in microbial communities between geographical locations may be explained by dispersal limitation in the context of the biogeographical island theory. In conclusion, each plant formation at each site possesses its own microbial community resulting from

  2. VIGAN: Missing View Imputation with Generative Adversarial Networks.

    Science.gov (United States)

    Shang, Chao; Palmer, Aaron; Sun, Jiangwen; Chen, Ko-Shin; Lu, Jin; Bi, Jinbo

    2017-01-01

    In an era when big data are becoming the norm, there is less concern with the quantity of data and more with their quality and completeness. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of data, the missing view problem arises. Classic multiple imputation or matrix completion methods are hardly effective here, because the specific view offers no information from which to impute data for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. By optimizing the GAN and DAE jointly, our model integrates knowledge of domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach in comparison with the state of the art. An evaluation of VIGAN in a genetic study of substance use disorders further demonstrates the effectiveness and usability of this approach in life science.

  3. Sources of endocrine-disrupting compounds in North Carolina waterways: a geographic information systems approach

    Science.gov (United States)

    Sackett, Dana K.; Pow, Crystal Lee; Rubino, Matthew J.; Aday, D.D.; Cope, W. Gregory; Kullman, Seth W.; Rice, J.A.; Kwak, Thomas J.; Law, L.M.

    2015-01-01

    The presence of endocrine-disrupting compounds (EDCs), particularly estrogenic compounds, in the environment has drawn public attention across the globe, yet a clear understanding of the extent and distribution of estrogenic EDCs in surface waters and their relationship to potential sources is lacking. The objective of the present study was to identify and examine the potential input of estrogenic EDC sources in North Carolina water bodies using a geographic information system (GIS) mapping and analysis approach. Existing data from state and federal agencies were used to create point and nonpoint source maps depicting the cumulative contribution of potential sources of estrogenic EDCs to North Carolina surface waters. Water was collected from 33 sites (12 associated with potential point sources, 12 associated with potential nonpoint sources, and 9 reference sites) to validate the predictive results of the GIS analysis. Estrogenicity (measured as 17β-estradiol equivalence) ranged from 0.06 ng/L to 56.9 ng/L. However, the majority of sites (88%) had water 17β-estradiol concentrations below 1 ng/L. Sites associated with point and nonpoint sources had significantly higher 17β-estradiol levels than reference sites. The results suggested that water 17β-estradiol reflected the GIS predictions, confirming the relevance of landscape-level influences on water quality and validating the GIS approach to characterize such relationships.

  4. Cartography, new technologies and geographic education: theoretical approaches to research the field

    Science.gov (United States)

    Seneme do Canto, Tânia

    2018-05-01

    In order to understand the roles that digital mapping can play in cartographic and geographic education, this paper discusses the theoretical and methodological approach used in research being undertaken on the education of geography teachers. To develop the study, we found in the works of Lankshear and Knobel (2013) a notion of new literacies that allows us to look at the practices within digital mapping from a sociocultural perspective. From them, we conclude that in order to understand the changes that digital cartography can foster in geography teaching, it is necessary to go beyond the substitution of means in the classroom and to explore what makes the new mapping practices different from others already consolidated in geography teaching. Therefore, we comment on some features of new forms of cartographic literacy that are in full development with digital technologies, but which are not determined solely by their use. The ideas of Kitchin and Dodge (2007) and Del Casino Junior and Hanna (2006) are also an important reference for the research. Methodologically, this approach helps us to understand that, in seeking to comprehend maps and their meanings irrespective of the medium used, we are dealing with a very particular and emergent process of literacy, because it involves not only the characteristics of the map artifact and of the individual who produces or consumes it, but depends mainly on a diversity of interconnections being built between them (map and individual) and the world.

  5. Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

    Science.gov (United States)

    Toghiani, S; Aggrey, S E; Rekaya, R

    2016-07-01

    The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic estimated breeding values (GEBVs) can be obtained, which are more accurate than conventional breeding values. The superiority of genomic selection is achievable only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. The accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than in the first generation. When the reference population was updated with either 1% or 5% of the top animals in the previous generations, the decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval

  6. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu; Shang, Shuo; Xie, Qing; Zhang, Xiangliang

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity

  7. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

    Directory of Open Access Journals (Sweden)

    McElwee Joshua

    2009-06-01

    -eQTL discoveries detected by various methods can be interpreted as their relative statistical power in the GWAS. In this study, we find that imputation offers modest additional power (by 4% on top of either Ilmn317K or Ilmn650Y), much less than the power gain from Ilmn317K to Ilmn650Y (13%). Conclusion: Current algorithms can accurately impute genotypes for untyped markers, which enables researchers to pool data between studies conducted using different SNP sets. While genotyping itself results in a small error rate (e.g. 0.5%), imputing genotypes is surprisingly accurate. We found that dense marker sets (e.g. Ilmn650Y) outperform sparser ones (e.g. Ilmn317K) in terms of imputation yield and accuracy. We also noticed that it was harder to impute genotypes for African American samples, partially due to population admixture, although using a pooled reference boosts performance. Interestingly, GWAS carried out using imputed genotypes only slightly increased power on top of assayed SNPs. The likely reason is that adding more markers via imputation yields only a modest gain in genetic coverage but worsens the multiple-testing penalty. Furthermore, cis-eQTL mapping using a dense SNP set derived from imputation achieves high resolution and locates association peaks closer to causal variants than the conventional approach.

  8. Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

    Science.gov (United States)

    Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

    2009-01-01

    Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...

  9. What to expect from a greater geographic dispersion of wind farms? A risk portfolio approach

    International Nuclear Information System (INIS)

    Drake, Ben; Hubacek, Klaus

    2007-01-01

    The UK, like many other industrialised countries, is committed to reducing greenhouse gas emissions under the Kyoto Protocol. To achieve this goal the UK is increasingly turning towards wind power as a source of emissions-free energy. However, the variable nature of wind power generation makes it an unreliable energy source, especially at higher rates of penetration. The aim of this paper is to measure the potential reduction in wind power variability that could be realised by geographically dispersing wind farm sites. To achieve this aim, wind speed data are used to simulate two scenarios. The first scenario locates a total of 2.7 gigawatts (GW) of wind power capacity at a single location within the UK, while the second shares the same amount of capacity among four different locations. A risk portfolio approach, as used in financial appraisals, is then applied in the second scenario to decide on the allocation of wind power capacity among the four wind farm sites that minimises overall variability for a given level of wind power generation. The findings of this paper indicate that reductions on the order of 36% in wind power variability are possible as a result of distributing wind power capacity
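
The portfolio idea can be illustrated with its simplest form, the global minimum-variance allocation w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of per-site output. This is a simplified sketch under stated assumptions: the paper minimises variability for a given level of generation (an extra mean constraint this sketch drops), the unconstrained weights may come out negative (which is not physically meaningful for capacity), and the per-site series come from a hypothetical generator rather than real wind speed data.

```python
import random
from statistics import mean

def synthetic_sites(n_sites=4, n_obs=500, seed=0):
    """Hypothetical data: each site's output = shared regional signal + local noise."""
    rng = random.Random(seed)
    regional = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    # Noise grows with k, so sites differ in how volatile they are.
    return [[r + rng.gauss(0.0, 1.0 + 0.5 * k) for r in regional] for k in range(n_sites)]

def covariance_matrix(series):
    """Sample covariance matrix (n - 1 denominator) of equal-length per-site series."""
    n, k = len(series[0]), len(series)
    mus = [mean(s) for s in series]
    return [[sum((series[i][t] - mus[i]) * (series[j][t] - mus[j]) for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def min_variance_weights(cov):
    """Global minimum-variance shares w = S^-1 1 / (1' S^-1 1); they sum to 1 but may be negative."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]

def portfolio_variance(w, cov):
    """Variance of the combined output sum_i w_i X_i."""
    return sum(w[i] * w[j] * cov[i][j] for i in range(len(w)) for j in range(len(w)))

cov = covariance_matrix(synthetic_sites())
w = min_variance_weights(cov)
print([round(2.7 * wi, 3) for wi in w])  # capacity per site in GW, for a 2.7 GW total
```

Because putting all capacity at one site is itself a feasible allocation, the minimum-variance combination can never be more variable than the least variable single site. That property underlies the dispersion argument.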

  10. An analytical approach to Sr isotope ratio determination in Lambrusco wines for geographical traceability purposes.

    Science.gov (United States)

    Durante, Caterina; Baschieri, Carlo; Bertacchini, Lucia; Bertelli, Davide; Cocchi, Marina; Marchetti, Andrea; Manzini, Daniela; Papotti, Giulia; Sighinolfi, Simona

    2015-04-15

    Geographical origin and authenticity of food are topics of interest for both consumers and producers. Among the different indicators used for traceability studies, the (87)Sr/(86)Sr isotopic ratio has provided excellent results. In this study, two analytical approaches for wine sample pre-treatment, microwave and low temperature mineralisation, were investigated to develop an accurate and precise analytical method for (87)Sr/(86)Sr determination. The two procedures led to comparable results (paired t-test, with t

  11. An approach for a complex assessment of the geo-ecological risk from natural disasters in a geographic region

    International Nuclear Information System (INIS)

    Zlateva, Plamena; Stoyanov, Krasimir

    2009-01-01

    The paper proposes an approach for a complex assessment of the geo-ecological risk of a given geographic region on the basis of quantitative and qualitative data about potential natural disasters. A fuzzy logic model is designed. The types of threats, their consequences and the interdependencies between infrastructure objects are taken into account. The geographic region is considered as a complex system of interconnected and mutually influencing elements. The expected damages are directly and/or indirectly connected with deterioration in quality of life. Keywords: Risk, Geo-ecological risk, Damages, Threats, Vulnerabilities, Natural disasters

  12. Development and implementation of an HIV/AIDS trials management system: a geographical information systems approach

    CSIR Research Space (South Africa)

    Busgeeth, K

    2008-01-01

    ...of randomised and clinically controlled trials of HIV/AIDS interventions can provide invaluable information to decision-making processes. Using the newly emerging geographical information systems (GIS) technology, researchers have developed a tool which assists...

  13. LinkImputeR: user-guided genotype calling and imputation for non-model organisms.

    Science.gov (United States)

    Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean

    2017-07-10

    Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from

  14. Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

    Science.gov (United States)

    Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal

    Missing values in time series data are a well-known and important problem that many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods by applying them to various real and simulated time series. The results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation than other methods.

  15. Accessibility patterns and community integration among previously homeless adults: a Geographic Information Systems (GIS) approach.

    Science.gov (United States)

    Chan, Dara V; Gopal, Sucharita; Helfrich, Christine A

    2014-11-01

    Although community integration is a desired rehabilitation goal, research continues to document that it significantly lags behind housing stability success rates for formerly homeless people of a variety of ages. While accessibility to resources is an environmental factor that may promote or impede integration activity, there has been little empirical investigation into the impact of proximity of community features on resource use and integration. Using a Geographic Information Systems (GIS) approach, the current study examines how accessibility or proximity to community features in Boston, United States, relates to the types of locations used and the size of an individual's "activity space," or spatial presence in the community. Significant findings include an inverse relationship between activity space size and proximity to the number and type of community features in one's immediate area. Specifically, larger activity spaces were associated with neighborhoods with fewer community features, and smaller activity spaces corresponded with greater availability of resources within one's immediate area. Activity space size also varied, however, based on proximity to different types of resources, namely transportation and health care. Greater community function, or the ability to navigate and use community resources, was associated with better accessibility and feeling part of the community. Finally, proximity to a greater number of individually identified preferred community features was associated with better social integration. The current study suggests that the ongoing challenges of successful integration may vary not just with accessibility to, but also with the relative importance of, specific community features and affinity with one's surroundings. Community integration researchers and housing providers may need to attend to the meaning attached to resources, not just their presence or use in the community. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. IMPROVEMENT EVALUATION ON CERAMIC ROOF EXTRACTION USING WORLDVIEW-2 IMAGERY AND GEOGRAPHIC DATA MINING APPROACH

    Directory of Open Access Journals (Sweden)

    V. S. Brum-Bastos

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved analysis of urban environments. The new sensors are increasingly suited to urban studies, due to the enhancement in spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard “Blue-Green-Red-Near Infrared” bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: (1) eight multispectral and panchromatic bands, and (2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice of a big or small tree depends on the user’s skills to implement it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: (1) the common user (smaller trees) or (2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, classification was improved by the addition of the four new bands for both types of users.

  17. An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources

    Directory of Open Access Journals (Sweden)

    Zugang Chen

    2018-03-01

    In geographic information science, semantic relatedness is important for Geographic Information Retrieval (GIR), Linked Geospatial Data, geoparsing, and geo-semantics. However, computing the semantic similarity/relatedness of geographic terminology is still an urgent issue to tackle. The thesaurus is a ubiquitous and sophisticated knowledge representation tool existing in various domains. In this article, we combined a generic lexical database (WordNet or HowNet) with the Thesaurus for Geographic Science and proposed a thesaurus–lexical relatedness measure (TLRM) to compute the semantic relatedness of geographic terminology. This measure quantified the relationship between terminologies, interlinked the discrete term trees by using the generic lexical database, and realized the semantic relatedness computation of any two terminologies in the thesaurus. The TLRM was evaluated on a new relatedness baseline that we built, the Geo-Terminology Relatedness Dataset (GTRD), and obtained relatively high cognitive plausibility. Finally, we applied the TLRM on a geospatial data sharing portal to support data retrieval. Results for the 30 most frequently used queries of the portal demonstrated that using TLRM could improve the recall of geospatial data retrieval in most situations and rank the retrieval results by the matching scores between users' queries and the geospatial datasets.

  18. Clustering with Missing Values: No Imputation Required

    Science.gov (United States)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  19. Traffic Speed Data Imputation Method Based on Tensor Completion

    Directory of Open Access Journals (Sweden)

    Bin Ran

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  20. Traffic speed data imputation method based on tensor completion.

    Science.gov (United States)

    Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  1. BRITS: Bidirectional Recurrent Imputation for Time Series

    OpenAIRE

    Cao, Wei; Wang, Dong; Li, Jian; Zhou, Hao; Li, Lei; Li, Yitan

    2018-01-01

    Time series are widely used as signals in many classification/regression tasks, and they commonly contain many missing values. Given multiple correlated time series, how can one fill in the missing values and predict their class labels? Existing imputation methods often impose strong assumptions on the underlying data generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing va...

  2. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    Science.gov (United States)

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF [...] 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P [...] 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P [...] 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10(-12)). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations.

  3. Deciphering the adjustment between environment and life history in annuals: lessons from a geographically-explicit approach in Arabidopsis thaliana.

    Science.gov (United States)

    Manzano-Piedras, Esperanza; Marcer, Arnald; Alonso-Blanco, Carlos; Picó, F Xavier

    2014-01-01

    The role that different life-history traits may have in the process of adaptation caused by divergent selection can be assessed by using extensive collections of geographically-explicit populations. This is because adaptive phenotypic variation shifts gradually across space as a result of the geographic patterns of variation in environmental selective pressures. Hence, large-scale experiments are needed to identify relevant adaptive life-history traits as well as their relationships with putative selective agents. We conducted a field experiment with 279 geo-referenced accessions of the annual plant Arabidopsis thaliana collected across a native region of its distribution range, the Iberian Peninsula. We quantified variation in life-history traits throughout the entire life cycle. We built a geographic information system to generate an environmental data set encompassing climate, vegetation and soil data. We analysed the spatial autocorrelation patterns of environmental variables and life-history traits, as well as the relationship between environmental and phenotypic data. Almost all environmental variables were significantly spatially autocorrelated. By contrast, only two life-history traits, seed weight and flowering time, exhibited significant spatial autocorrelation. Flowering time, and to a lesser extent seed weight, were the life-history traits with the highest significant correlation coefficients with environmental factors, in particular with annual mean temperature. In general, individual fitness was higher for accessions with more vigorous seed germination, higher recruitment and later flowering times. Variation in flowering time mediated by temperature appears to be the main life-history trait by which A. thaliana adjusts its life history to the varying Iberian environmental conditions. The use of extensive geographically-explicit data sets obtained from field experiments represents a powerful approach to unravel adaptive patterns of variation. In a

  4. Gaussian mixture clustering and imputation of microarray data.

    Science.gov (United States)

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
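
The first evaluation metric, the root mean squared error between true and imputed values, is typically computed only over entries that were deliberately hidden. The sketch below assumes that setup and uses row-mean (per-gene) imputation purely as a placeholder baseline. It is not the Gaussian mixture method the abstract reports as best, the small matrix is hypothetical, and the second metric (mis-clustered genes) is not shown.

```python
from math import isnan, sqrt

def row_mean_impute(matrix):
    """Placeholder baseline: replace each NaN with the mean of the observed values in its row."""
    out = []
    for row in matrix:
        observed = [v for v in row if not isnan(v)]
        fill = sum(observed) / len(observed) if observed else 0.0
        out.append([fill if isnan(v) else v for v in row])
    return out

def masked_rmse(truth, imputed, mask):
    """RMSE over only the entries flagged True in mask (the ones that were hidden)."""
    sq = [(truth[i][j] - imputed[i][j]) ** 2
          for i in range(len(truth)) for j in range(len(truth[i])) if mask[i][j]]
    return sqrt(sum(sq) / len(sq))

nan = float("nan")
truth = [[1.0, 5.0, 3.0], [2.0, 4.0, 6.0]]        # hypothetical expression values (genes x chips)
mask = [[False, True, False], [True, False, False]]  # entries hidden for evaluation
observed = [[nan if mask[i][j] else truth[i][j] for j in range(3)] for i in range(2)]
print(masked_rmse(truth, row_mean_impute(observed), mask))  # 3.0
```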

  5. Key Factors Affecting the Price of Airbnb Listings: A Geographically Weighted Approach

    OpenAIRE

    Zhihua Zhang; Rachel J. C. Chen; Lee D. Han; Lu Yang

    2017-01-01

    Airbnb has gained increasing popularity since 2008 due to its low prices and direct interactions with the local community. This paper employed a general linear model (GLM) and a geographically weighted regression (GWR) model to identify the key factors affecting Airbnb listing prices, using a data set of 794 Airbnb listings of business units in Metro Nashville, Tennessee. The results showed that the GWR model performs better than the GLM in terms of accuracy and affected vari...
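
Geographically weighted regression fits a separate weighted least-squares model at each location, giving nearby observations more weight, so coefficients can vary across space. The abstract does not state the kernel, bandwidth, or predictors used. The sketch below assumes a single predictor, a fixed Gaussian kernel with a hand-picked bandwidth (no bandwidth selection), and hypothetical coordinates and prices.

```python
from math import exp, hypot

def gaussian_weights(site, coords, bandwidth):
    """Gaussian kernel weight for each observation, from its distance to the regression site."""
    return [exp(-0.5 * (hypot(x - site[0], y - site[1]) / bandwidth) ** 2) for x, y in coords]

def local_fit(site, coords, x, y, bandwidth):
    """Weighted least squares of y on a single predictor x at one site; returns (intercept, slope)."""
    w = gaussian_weights(site, coords, bandwidth)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical listings: two clusters about 100 km apart with different price responses
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 0), (101, 0), (100, 1), (101, 1)]
bedrooms = [1, 2, 3, 4, 1, 2, 3, 4]
price = [60, 70, 80, 90, 50, 80, 110, 140]
print(local_fit((0.5, 0.5), coords, bedrooms, price, bandwidth=1.0))    # about (50, 10)
print(local_fit((100.5, 0.5), coords, bedrooms, price, bandwidth=1.0))  # about (20, 30)
```

A single global regression on the same data would report one averaged slope. The local fits recover a different price response in each cluster, which is the kind of spatial variation GWR is designed to expose.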

  6. A comparative UPLC-Q/TOF-MS-based metabolomics approach for distinguishing Zingiber officinale Roscoe of two geographical origins.

    Science.gov (United States)

    Mais, Enos; Alolga, Raphael N; Wang, Shi-Lei; Linus, Loveth O; Yin, Xiaojin; Qi, Lian-Wen

    2018-02-01

    Ginger, the rhizome of Zingiber officinale Roscoe, is a popular spice used in the food, beverage and confectionery industries. In this study, we report an untargeted UPLC-Q/TOF-MS-based metabolomics approach for comprehensively discriminating between ginger from two geographical locations, Ghana in West Africa and China. Forty batches of fresh ginger from both countries were discriminated using principal component analysis and orthogonal partial least squares discriminant analysis. Sixteen differential metabolites were identified between the gingers from the two geographical locations, six of which were identified as the marker compounds responsible for the discrimination. Our study highlights the value and predictive power of metabolomics in detecting minute differences between samples of the same plant variety based on the levels and composition of their metabolites. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure.

    Science.gov (United States)

    Kabisch, Maria; Hamann, Ute; Lorenzo Bermejo, Justo

    2017-10-17

    Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and makes it possible to explicitly account for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation. Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even when present, did not improve accuracy in practice. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals. To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate
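
Genotype concordance, the accuracy measure named above, is the fraction of imputed genotypes that match the true ones. The sketch below pairs it with a deliberately simple template-copying imputer in which the template is the reference haplotype with the fewest mismatches at typed sites. That Hamming-distance choice is an assumption standing in for the coalescent-based template selection described in the abstract (no gene trees are inferred), and the haplotypes are hypothetical 0/1 allele vectors.

```python
def mismatches(a, b, sites):
    """Number of typed sites at which haplotypes a and b carry different alleles."""
    return sum(a[s] != b[s] for s in sites)

def impute_from_template(study_hap, reference_haps, typed_sites):
    """Fill untyped positions (None) by copying alleles from the closest reference haplotype."""
    template = min(reference_haps, key=lambda ref: mismatches(study_hap, ref, typed_sites))
    return [template[i] if allele is None else allele for i, allele in enumerate(study_hap)]

def concordance_rate(imputed, truth, sites):
    """Fraction of the given sites where the imputed allele equals the true allele."""
    return sum(imputed[s] == truth[s] for s in sites) / len(sites)

reference = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1], [0, 1, 1, 0, 0]]
truth = [0, 0, 1, 1, 1]
observed = [0, 0, 1, None, None]  # sites 3 and 4 untyped
imputed = impute_from_template(observed, reference, typed_sites=[0, 1, 2])
print(imputed, concordance_rate(imputed, truth, sites=[3, 4]))  # [0, 0, 1, 1, 0] 0.5
```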

  8. A Geographic Information System approach to modeling nutrient and sediment transport

    Energy Technology Data Exchange (ETDEWEB)

    Levine, D.A. [Automated Sciences Group, Inc., Oak Ridge, TN (United States)]; Hunsaker, C.T.; Beauchamp, J.J. [Oak Ridge National Lab., TN (United States)]; Timmins, S.P. [Analysas Corp., Oak Ridge, TN (United States)]

    1993-02-01

    The objective of this study was to develop a water quality model to quantify nonpoint-source (NPS) pollution that uses a geographic information system (GIS) to link statistical modeling of nutrient and sediment delivery with the spatial arrangement of the parameters that drive the model. The model predicts annual nutrient and sediment loading and was developed, calibrated, and tested on 12 watersheds within the Lake Ray Roberts drainage basin in north Texas. Three physiographic regions are represented by these watersheds, and model success, as measured by the accuracy of load estimates, was compared within and across these regions.

  9. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    Science.gov (United States)

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2018-02-01

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  10. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    Science.gov (United States)

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, the measurement of a variable such as blood glucose may depend on its prior values as well as those of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring), the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

    Directory of Open Access Journals (Sweden)

    Minkyung Kim

    2017-10-01

    This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the patterns that appear in the collected data. Here, in order to capture the patterns in past power data, we construct a new feature vector from past data and their variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, which are significant hyperparameters of the proposed method, by utilizing intentionally created missing data. Based on a weighted distance between the feature vectors representing the missing situation and past situations, missing power data are estimated by referring to the k most similar past situations within the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, from a simulation under various energy consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system compared to existing imputation methods.
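
The core LAI idea, estimating a missing reading from the k past situations whose recent history most resembles the present one, can be sketched as follows. Several details are assumptions. The hypothetical feature vector is the last `length` readings plus their first differences, the distance is unweighted Euclidean rather than the paper's weighted distance, and `length`, `history` and `k` are fixed by hand instead of being learned from intentionally removed data. The eLAI switch to linear interpolation is not shown.

```python
from math import dist

def feature(series, t, length):
    """Hypothetical feature vector: the `length` readings before t plus their first differences."""
    window = series[t - length:t]
    return window + [b - a for a, b in zip(window, window[1:])]

def knn_impute_at(series, t, length=3, history=24, k=2):
    """Estimate series[t] as the mean next value of the k most similar past situations.

    Assumes the `length` readings just before t are observed.
    """
    target = feature(series, t, length)
    candidates = []
    for s in range(max(length, t - history), t):
        if series[s] is None or None in series[s - length:s]:
            continue  # skip past situations that are themselves incomplete
        candidates.append((dist(target, feature(series, s, length)), series[s]))
    if not candidates:
        raise ValueError("no complete past situations within the history window")
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical hourly load with a repeating 4-step pattern and the last reading missing
load = [1.0, 2.0, 3.0, 4.0] * 3 + [None]
print(knn_impute_at(load, t=12))  # 1.0
```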

  12. The use of multiple imputation for the accurate measurements of individual feed intake by electronic feeders.

    Science.gov (United States)

    Jiao, S; Tiezzi, F; Huang, Y; Gray, K A; Maltecca, C

    2016-02-01

    Obtaining accurate individual feed intake records is the key first step in achieving genetic progress toward more efficient nutrient utilization in pigs. Feed intake records collected by electronic feeding systems contain errors (erroneous and abnormal values exceeding certain cutoff criteria), which are due to feeder malfunction or animal-feeder interaction. In this study, we examined the use of a novel data-editing strategy involving multiple imputation to minimize the impact of errors and missing values on the quality of feed intake data collected by an electronic feeding system. Accuracy of feed intake data adjustment obtained from the conventional linear mixed model (LMM) approach was compared with 2 alternative implementations of multiple imputation by chained equation, denoted as MI (multiple imputation) and MICE (multiple imputation by chained equation). The 3 methods were compared under 3 scenarios, where 5, 10, and 20% feed intake error rates were simulated. Each of the scenarios was replicated 5 times. Accuracy of the alternative error adjustment was measured as the correlation between the true daily feed intake (DFI; daily feed intake in the testing period) or true ADFI (the mean DFI across testing period) and the adjusted DFI or adjusted ADFI. In the editing process, error cutoff criteria are used to define if a feed intake visit contains errors. To investigate the possibility that the error cutoff criteria may affect any of the 3 methods, the simulation was repeated with 2 alternative error cutoff values. Multiple imputation methods outperformed the LMM approach in all scenarios with mean accuracies of 96.7, 93.5, and 90.2% obtained with MI and 96.8, 94.4, and 90.1% obtained with MICE compared with 91.0, 82.6, and 68.7% using LMM for DFI. Similar results were obtained for ADFI. Furthermore, multiple imputation methods consistently performed better than LMM regardless of the cutoff criteria applied to define errors. 
In conclusion, multiple imputation
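The accuracy measure described above is the correlation between true and adjusted DFI after simulated errors are flagged with cutoff criteria. The sketch below shows only that measure, under stated assumptions. The error model (multiplying a record by 0 or 5), the cutoffs 1.0 and 4.5, and the `mean_fill` adjustment are hypothetical stand-ins. The study itself adjusted flagged records with LMM, MI and MICE, and this sketch does not implement those methods.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (the accuracy measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def inject_errors(values, rate, rng):
    """Hypothetical error model: corrupt a fraction `rate` of records (x0 or x5)."""
    out = list(values)
    for i in rng.sample(range(len(out)), int(rate * len(out))):
        out[i] = out[i] * rng.choice([0.0, 5.0])
    return out


def flag_errors(values, low, high):
    """Cutoff criteria: values outside [low, high] are treated as missing (None)."""
    return [v if low <= v <= high else None for v in values]


def mean_fill(values):
    """Stand-in adjustment only: replace flagged records with the mean of retained ones."""
    kept = [v for v in values if v is not None]
    m = sum(kept) / len(kept)
    return [m if v is None else v for v in values]


rng = random.Random(1)
true_dfi = [rng.gauss(2.5, 0.3) for _ in range(200)]  # hypothetical true DFI values
observed = inject_errors(true_dfi, 0.10, rng)          # 10% error-rate scenario
adjusted = mean_fill(flag_errors(observed, 1.0, 4.5))
accuracy = pearson(true_dfi, adjusted)
print(f"accuracy (r) = {accuracy:.3f}")
```

To compare adjustment methods, replace `mean_fill` with each candidate method and compare the resulting correlations.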

  13. Combining Inferential and Deductive Approaches to Estimate the Potential Geographical Range of the Invasive Plant Pathogen, Phytophthora ramorum

    Science.gov (United States)

    Ireland, Kylie B.; Hardy, Giles E. St. J.; Kriticos, Darren J.

    2013-01-01

    Phytophthora ramorum, an invasive plant pathogen of unknown origin, causes considerable and widespread damage in plant industries and natural ecosystems of the USA and Europe. Estimating the potential geographical range of P. ramorum has been complicated by a lack of biological and geographical data with which to calibrate climatic models. Previous attempts to do so, using either invaded range data or surrogate species approaches, have delivered varying results. A simulation model was developed using CLIMEX to estimate the global climate suitability patterns for establishment of P. ramorum. Growth requirements and stress response parameters were derived from ecophysiological laboratory observations and site-level transmission and disease factors related to climate data in the field. Geographical distribution data from the USA (California and Oregon) and Norway were reserved from model-fitting and used to validate the models. The model suggests that the invasion of P. ramorum in both North America and Europe is still in its infancy and that it is presently occupying a small fraction of its potential range. Phytophthora ramorum appears to be climatically suited to large areas of Africa, Australasia and South America, where it could cause biodiversity and economic losses in plant industries and natural ecosystems with susceptible hosts if introduced. PMID:23667628

  14. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu

    2014-01-01

    Web-based Data Imputation completes incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, because these keywords are ambiguous and data on the Web are complex, different queries may retrieve different answers for the same absent field value. To decide the most probable correct answer for each absent field value, the existing method issues many of the available imputation queries for each absent value and then takes a vote among their answers. As a result, a large number of imputation queries must be issued to fill all absent values in an incomplete data set, which incurs a large overhead. In this paper, we reduce the cost of Web-based Data Imputation in two ways. First, we propose a query execution scheme that secures the most probable correct answer to an absent field value while issuing as few imputation queries as possible. Second, we identify and prune, a priori, queries that are likely to return no answers. Our extensive experimental evaluation shows that the proposed techniques substantially reduce the cost of Web-based Data Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.
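The cost argument above can be illustrated with a simple early-stopping vote. This sketch rests on assumptions and is not the paper's query execution scheme. It only shows that a majority vote is often decided before every available query has been issued. The query strings and the `fake_web` lookup are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional


def impute_by_vote(queries: List[str],
                   run_query: Callable[[str], Optional[str]]) -> Optional[str]:
    """Issue queries one by one; stop as soon as the leading answer cannot be overtaken."""
    votes: Counter = Counter()
    for issued, query in enumerate(queries, start=1):
        answer = run_query(query)  # None means the query returned nothing
        if answer is not None:
            votes[answer] += 1
        remaining = len(queries) - issued
        ranked = votes.most_common(2)
        if ranked:
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Even if every remaining query backed the runner-up (or a new answer),
            # it could not reach the leader's count, so the vote is already decided.
            if ranked[0][1] - runner_up > remaining:
                return ranked[0][0]
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical Web answers for one absent field value.
fake_web = {"q1": "Paris", "q2": "Paris", "q3": "Paris", "q4": "Lyon", "q5": None}
issued_log: List[str] = []


def run_query(q: str) -> Optional[str]:
    issued_log.append(q)
    return fake_web[q]


result = impute_by_vote(["q1", "q2", "q3", "q4", "q5"], run_query)
print(result, len(issued_log))  # Paris 3
```

In this example, two of the five queries are never issued because three agreeing answers already outnumber the two remaining queries.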

  15. A Geographic Information Science (GISc) Approach to Characterizing Spatiotemporal Patterns of Terrorist Incidents in Iraq, 2004-2009

    Energy Technology Data Exchange (ETDEWEB)

    Medina, Richard M. [ORNL]; Siebeneck, Laura K. [University of Utah]; Hepner, George F. [University of Utah]

    2011-01-01

    As terrorism on all scales continues, it is necessary to improve understanding of terrorist and insurgent activities. This article takes a Geographic Information Systems (GIS) approach to advance the understanding of spatial, social, political, and cultural triggers that influence terrorist incidents. Spatial, temporal, and spatiotemporal patterns of terrorist attacks are examined to improve knowledge about terrorist systems of training, planning, and actions. The results of this study aim to provide a foundation for understanding attack patterns and tactics in emerging havens, as well as to inform the creation and implementation of various counterterrorism measures.

  16. Biodiversity, land use and ecosystem services—An organismic and comparative approach to different geographical regions

    Directory of Open Access Journals (Sweden)

    Ulrich Zeller

    2017-04-01

    The approach further focuses on human–wildlife interactions. The emergence of top predators in Europe reveals the value of experience from Africa, where pastoralists have managed to coexist with large predators for millennia. Investigating African grasslands enables critical reflection on, and a thorough understanding of, processes that occurred long ago in Europe. Our approach leads to a reassessment of the significance of Africa as a conservative, relict-case scenario that can provide essential insights into the original state of ecosystems, especially in view of “rewilding” approaches in Europe. The approach thereby calls for careful consideration of the term “wilderness”.

  17. Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

    Science.gov (United States)

    Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

    2013-01-01

    Summary Background and objectives Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improves the accuracy of estimating missing BCr beyond the current recommendation to assume an estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict the likelihood of missing BCr. Propensity scoring identified the 6502 patients with the highest likelihood of missing BCr among the 13,003 patients with known BCr, to simulate a “missing” data scenario while preserving the actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI was compared with that of eGFR 75. Results All multiple-imputation methods except the basic one approximated actual BCr more closely than eGFR 75 did. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; P […] creatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of a modest decrease in sensitivity relative to eGFR 75. Conclusions Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980
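The eGFR 75 surrogate back-calculates a baseline creatinine from an assumed eGFR of 75 ml/min per 1.73 m2. The abstract does not name the estimating equation. The sketch below assumes the 4-variable, IDMS-traceable MDRD Study equation (coefficient 175), which is commonly used for this back-calculation, and simply inverts it. The function names are hypothetical.

```python
def demographic_factor(age: float, female: bool, black: bool) -> float:
    """Non-creatinine part of the 4-variable IDMS-traceable MDRD equation (assumed)."""
    factor = 175.0 * age ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    return factor


def mdrd_egfr(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """eGFR (ml/min per 1.73 m2) from serum creatinine in mg/dL."""
    return demographic_factor(age, female, black) * scr_mg_dl ** -1.154


def baseline_scr_from_egfr(target_egfr: float, age: float, female: bool, black: bool) -> float:
    """Invert MDRD: the serum creatinine (mg/dL) that would yield `target_egfr`."""
    return (target_egfr / demographic_factor(age, female, black)) ** (-1 / 1.154)


scr = baseline_scr_from_egfr(75, age=50, female=False, black=False)
print(round(scr, 2))  # 1.05
```

The surrogate assigns the same value to every patient with a given age, sex and race. The multiple-imputation approaches in the abstract instead draw BCr conditional on other observed patient data.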

  18. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    Science.gov (United States)

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data under several parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, particularly for datasets with a higher proportion of missing entries. The performance of the two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and proved superior to it. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
The software also provides for a choice between three different
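The core idea described above is to rank candidate profiles by DTW distance and fill gaps from the closest ones. A simplified sketch follows, under assumptions that are not taken from the paper. The observed part of the target profile is compared by DTW against complete candidate profiles. Each missing time point is then filled with the plain mean of the k nearest candidates at that point. The function names and the data are hypothetical, and this is not a reimplementation of any of the three DTWimpute variants.

```python
import math
from typing import List, Optional


def dtw(a: List[float], b: List[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_impute(target: List[Optional[float]], candidates: List[List[float]],
               k: int = 2) -> List[float]:
    """Fill gaps in `target` from the k complete profiles closest to it under DTW."""
    observed = [v for v in target if v is not None]
    nearest = sorted(candidates, key=lambda c: dtw(observed, c))[:k]
    return [v if v is not None else sum(c[t] for c in nearest) / len(nearest)
            for t, v in enumerate(target)]


profile = [1.0, None, 3.0, 4.0]
candidates = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5], [10.0, 9.0, 8.0, 7.0]]
print(dtw_impute(profile, candidates))  # [1.0, 2.25, 3.0, 4.0]
```

DTW can compare series of different lengths, so the gapped profile does not have to be aligned position by position with each candidate.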

  19. Determinants of the geographical distribution of endemic giardiasis in Ontario, Canada: a spatial modelling approach.

    Science.gov (United States)

    Odoi, A; Martin, S W; Michel, P; Holt, J; Middleton, D; Wilson, J

    2004-10-01

    Giardiasis surveillance data as well as drinking water, socioeconomic and land-use data were used in spatial regression models to investigate determinants of the geographic distribution of endemic giardiasis in southern Ontario. Higher giardiasis rates were observed in areas using surface water [rate ratio (RR) 2.36, 95 % CI 1.38-4.05] and in rural areas (RR 1.79, 95 % CI 1.32-2.37). Lower rates were observed in areas using filtered water (RR 0.55, 95 % CI 0.42-0.94) and in those with high median income (RR 0.62, 95 % CI 0.42-0.92). Chlorination of drinking water, cattle density and intensity of manure application on farmland were not significant determinants. The study shows that waterborne transmission plays an important role in giardiasis distribution in southern Ontario and that well-collected routine surveillance data could be useful for investigation of disease determinants and identification of high-risk communities. This information is useful in guiding decisions on control strategies.
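Rate ratios such as those above typically come from log-linear (for example Poisson) regression, where RR = exp(β). The sketch below shows how an RR and its interval relate on the log scale. It assumes the reported CIs are Wald intervals, which the abstract does not state. The function names are hypothetical.

```python
import math
from typing import Tuple

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """SE of log(RR), assuming the reported CI is a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def rr_ci(log_rr: float, se: float) -> Tuple[float, float, float]:
    """Rate ratio exp(beta) and its 95% Wald interval from a log-linear coefficient."""
    return math.exp(log_rr), math.exp(log_rr - Z95 * se), math.exp(log_rr + Z95 * se)


# Surface-water estimate quoted above: RR 2.36, 95% CI 1.38-4.05.
se = se_from_ci(1.38, 4.05)
rr, lo, hi = rr_ci(math.log(2.36), se)
print(round(se, 3), round(lo, 2), round(hi, 2))
```

Rebuilding the interval from the recovered standard error should reproduce the reported bounds to within rounding.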

  20. Key Factors Affecting the Price of Airbnb Listings: A Geographically Weighted Approach

    Directory of Open Access Journals (Sweden)

    Zhihua Zhang

    2017-09-01

    Airbnb has been gaining popularity since 2008 because of its low prices and direct interactions with the local community. This paper employed a general linear model (GLM) and a geographically weighted regression (GWR) model to identify the key factors affecting Airbnb listing prices, using a data set of 794 Airbnb listings of business units in Metro Nashville, Tennessee. The results showed that the GWR model performs better than the GLM in terms of accuracy and the selection of influential variables. Statistically significant effects varied across regions of Metro Nashville. The coefficients decrease as the distance from the listed units to the convention center increases, which indicates that Airbnb listing prices are more sensitive to the distance from the convention center in the central area than in other areas. These findings can also help stakeholders such as Airbnb hosts better understand the market situation and formulate a suitable pricing strategy.
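The GLM/GWR contrast above comes down to fitting one global coefficient versus a separately weighted fit at each location. A one-predictor sketch of a GWR local fit follows. It assumes a Gaussian distance kernel with a fixed bandwidth; the paper's kernel, bandwidth selection and predictors are not given here. The data and names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def gwr_local_fit(coords: List[Point], x: List[float], y: List[float],
                  at: Point, bandwidth: float) -> Tuple[float, float]:
    """Local (intercept, slope) at `at` by Gaussian-kernel weighted least squares."""
    # Observations near `at` get weight close to 1; distant ones decay towards 0.
    w = [math.exp(-0.5 * (math.dist(c, at) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical listings: price rises 3 per unit of x in the west, 1 per unit in the east.
coords = [(0.0, float(i)) for i in range(5)] + [(10.0, float(i)) for i in range(5)]
x = [0.0, 1.0, 2.0, 3.0, 4.0] * 2
y = [3 * xi for xi in x[:5]] + [1 * xi for xi in x[5:]]

_, slope_west = gwr_local_fit(coords, x, y, at=(0.0, 2.0), bandwidth=1.0)
_, slope_east = gwr_local_fit(coords, x, y, at=(10.0, 2.0), bandwidth=1.0)
print(round(slope_west, 3), round(slope_east, 3))  # 3.0 1.0
```

A single global regression on the same ten points would return one slope of 2. That value averages away the regional difference the local fits recover.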

  1. Exploring the Application of Volunteered Geographic Information to Catchment Management: a Survey Approach

    Science.gov (United States)

    Paudyal, D. R.; McDougall, K.; Apan, A.

    2012-07-01

    The participation and engagement of grass-roots community groups and citizens in natural resource management has a long history. With recent developments in ICT tools and spatial technology, these groups are seeking new opportunities to manage natural resource data. A great deal of spatial information is collected or generated by landcare groups, land holders and other community groups at the grass-roots level through their volunteer initiatives. State government organisations are also interested in gaining access to this spatial data and information and in engaging these groups to collect spatial information under their mapping programs. The aim of this paper is to explore the possible utilisation of volunteered geographic information (VGI) for catchment management activities. The paper discusses the importance of spatial information and spatial data infrastructure (SDI) for catchment management, and the emergence of VGI. A conceptual framework has been developed to illustrate how these emerging spatial information applications and various community volunteer activities can contribute to more inclusive SDI development at the local level. A survey of 56 regional natural resource management (NRM) bodies in Australia was used to explore current community-driven volunteer initiatives for NRM activities and the potential utilisation of VGI initiatives in NRM decision-making. The paper concludes that VGI activities have great potential to contribute to SDI development at the community level and so achieve better NRM outcomes.

  2. Adaptive capacity of geographical clusters: Complexity science and network theory approach

    Science.gov (United States)

    Albino, Vito; Carbonara, Nunzia; Giannoccaro, Ilaria

    This paper deals with the adaptive capacity of geographical clusters (GCs), a relevant topic in the literature. To address this topic, a GC is considered as a complex adaptive system (CAS). Three theoretical propositions concerning GC adaptive capacity are formulated using complexity theory. First, we identify three main properties of CASs that affect adaptive capacity, namely interconnectivity, heterogeneity, and the level of control, and define how the values of these properties influence adaptive capacity. We then associate these properties with specific GC characteristics. This yields the key conditions that give GCs adaptive capacity and thereby ensure their competitive advantage. To test these theoretical propositions, a case study of two real GCs is carried out. The GCs considered are modeled as networks in which firms are nodes and inter-firm relationships are links. Heterogeneity, interconnectivity, and level of control are treated as network properties and measured using the methods of network theory.
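The abstract says the three properties are measured with network-theory methods but does not give the measures. The sketch below uses an assumed mapping with hypothetical names: density for interconnectivity, Freeman degree centralization for level of control, and the Blau index of firm types for heterogeneity.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: firms are nodes, inter-firm relationships are links."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def density(adj: Dict[str, Set[str]]) -> float:
    """Interconnectivity proxy: share of possible undirected links that exist."""
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) / 2
    return links / (n * (n - 1) / 2)


def degree_centralization(adj: Dict[str, Set[str]]) -> float:
    """Level-of-control proxy: Freeman degree centralization (1.0 for a star)."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def blau_index(labels: List[str]) -> float:
    """Heterogeneity proxy: 1 minus the sum of squared category shares."""
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical cluster: one lead firm linked to three suppliers (a star network).
g = build_graph([("lead", "s1"), ("lead", "s2"), ("lead", "s3")])
print(density(g), degree_centralization(g))  # 0.5 1.0
print(blau_index(["assembler", "supplier", "supplier", "service"]))  # 0.625
```

A star-shaped cluster has maximal centralization, meaning a high level of control, even though only half the possible links exist.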

  3. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    Directory of Open Access Journals (Sweden)

    Wilson Barry Tyler

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change for years to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements. Although these are currently provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is a growing need to disaggregate them to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services, such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events, such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest-quality forest sites, such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest
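The nearest-neighbor imputation described above assigns each map cell the attribute of the inventory plot (or plots) most similar to it in predictor space. A minimal sketch follows, assuming Euclidean distance on pre-scaled predictors. The study's actual predictors, distance metric and choice of k are not given here, and all values are hypothetical.

```python
import math
from typing import List, Sequence


def knn_impute(pixels: List[Sequence[float]], plot_features: List[Sequence[float]],
               plot_values: List[float], k: int = 1) -> List[float]:
    """Give each pixel the mean C density of its k nearest inventory plots in feature space."""
    imputed = []
    for p in pixels:
        # Features are assumed to be pre-scaled to comparable ranges.
        nearest = sorted(range(len(plot_features)),
                         key=lambda i: math.dist(p, plot_features[i]))[:k]
        imputed.append(sum(plot_values[i] for i in nearest) / k)
    return imputed


plot_features = [(0.2, 0.5), (0.8, 0.1), (0.5, 0.9)]  # hypothetical scaled predictors
plot_c_density = [40.0, 120.0, 75.0]                   # hypothetical C density per plot
pixels = [(0.21, 0.5), (0.79, 0.12)]
print(knn_impute(pixels, plot_features, plot_c_density))  # [40.0, 120.0]
```

With k = 1, each pixel carries over a whole plot record. That is why a single imputed map can report several C pools at once.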

  4. Geographic Information System (GIS) capabilities in traffic accident information management: a qualitative approach

    Science.gov (United States)

    Ahmadi, Maryam; Valinejadi, Ali; Goodarzi, Afshin; Safari, Ameneh; Hemmat, Morteza; Majdabadi, Hesamedin Askari; Mohammadi, Ali

    2017-01-01

    Background Traffic accidents are one of the more important national and international issues, and their consequences are significant at the political, economic, and social levels of a country. Managing traffic accident information requires information systems with analytical capabilities and access to spatial and descriptive data. Objective The aim of this study was to determine the capabilities of a Geographic Information System (GIS) in the management of traffic accident information. Methods This qualitative cross-sectional study was performed in 2016. In the first step, GIS capabilities were identified from literature retrieved from the Internet according to the inclusion criteria. The literature was reviewed until data saturation was reached, and a form was used to extract the capabilities. In the second step, the study population comprised hospital managers, police, emergency staff, statisticians, and IT experts in trauma, emergency and police centers. Sampling was purposive. Data were collected using a questionnaire based on the first-step data; validity was assessed by content validity and reliability by Cronbach’s alpha (0.75). Data were analyzed using the decision Delphi technique. Results GIS capabilities were identified in ten categories and 64 sub-categories. Importing and processing spatial and descriptive data, and analyzing these data, were the most important capabilities of GIS in traffic accident information management. Conclusion Storing and retrieving descriptive and spatial data, providing statistical analysis in table, chart and zoning formats, managing ill-structured problems, determining the cost-effectiveness of decisions and prioritizing their implementation were the most important capabilities of GIS for the efficient management of traffic accident information. PMID:28848627
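The questionnaire's reliability is reported as a Cronbach's alpha of 0.75. The sketch below shows the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), using sample variances. The item scores are hypothetical and are not the study's data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(items: List[List[float]]) -> float:
    """items[j][i] is respondent i's score on questionnaire item j."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # each respondent's total score
    sum_item_var = sum(variance(item) for item in items)  # sample variances per item
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical 5-respondent, 3-item Likert data.
items = [[3, 4, 5, 2, 4], [3, 5, 4, 2, 5], [2, 4, 5, 3, 4]]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # 0.886
```

Alpha approaches 1 as items move together across respondents, because the total-score variance then grows faster than the sum of the item variances.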

  5. Geographic Information System (GIS) capabilities in traffic accident information management: a qualitative approach.

    Science.gov (United States)

    Ahmadi, Maryam; Valinejadi, Ali; Goodarzi, Afshin; Safari, Ameneh; Hemmat, Morteza; Majdabadi, Hesamedin Askari; Mohammadi, Ali

    2017-06-01

    Traffic accidents are one of the more important national and international issues, and their consequences are significant at the political, economic, and social levels of a country. Managing traffic accident information requires information systems with analytical capabilities and access to spatial and descriptive data. The aim of this study was to determine the capabilities of a Geographic Information System (GIS) in the management of traffic accident information. This qualitative cross-sectional study was performed in 2016. In the first step, GIS capabilities were identified from literature retrieved from the Internet according to the inclusion criteria. The literature was reviewed until data saturation was reached, and a form was used to extract the capabilities. In the second step, the study population comprised hospital managers, police, emergency staff, statisticians, and IT experts in trauma, emergency and police centers. Sampling was purposive. Data were collected using a questionnaire based on the first-step data; validity was assessed by content validity and reliability by Cronbach's alpha (0.75). Data were analyzed using the decision Delphi technique. GIS capabilities were identified in ten categories and 64 sub-categories. Importing and processing spatial and descriptive data, and analyzing these data, were the most important capabilities of GIS in traffic accident information management. Storing and retrieving descriptive and spatial data, providing statistical analysis in table, chart and zoning formats, managing ill-structured problems, determining the cost-effectiveness of decisions and prioritizing their implementation were the most important capabilities of GIS for the efficient management of traffic accident information.

  6. Data Editing and Imputation in Business Surveys Using “R”

    Directory of Open Access Journals (Sweden)

    Elena Romascanu

    2014-06-01

    Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison between two statistical software packages, R and SPSS. The aim is to take full advantage of existing automated methods for data editing and imputation in business surveys (with properly designed consistency rules) as a partial alternative to manual data editing. Approach – Different methods for editing survey data are compared in R using the ‘editrules’ and ‘survey’ packages, which contain transformations commonly used in official statistics. Missing-value patterns are visualized with the ‘Amelia’ and ‘VIM’ packages, and imputation approaches for longitudinal data are applied with ‘VIMGUI’. Performance on the same features is then compared with that of another statistical package, SPSS. Findings – Business statistics data received by the NIS (National Institute of Statistics) are not ready for direct analysis because of in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages make it possible to identify the erroneous fields in edit-violating records and to verify the results after missing values have been imputed. This gives users a flexible, less time-consuming approach that is easier to automate in R than with SPSS macro syntax, even in situations where macros are very handy.
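The paper performs rule-based editing in R with 'editrules'. The sketch below illustrates only the idea of consistency rules and edit-violating records, in Python. The rules and field names are hypothetical, and nothing here reflects the 'editrules' API.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], bool]]

# Hypothetical consistency rules (field names invented for illustration).
RULES: List[Rule] = [
    ("turnover_nonnegative", lambda r: r["turnover"] >= 0),
    ("staff_nonnegative", lambda r: r["staff"] >= 0),
    ("costs_add_up", lambda r: abs(r["total_costs"]
                                   - (r["staff_costs"] + r["other_costs"])) < 1e-6),
]


def violated(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Names of the edit rules that a survey record fails."""
    return [name for name, check in rules if not check(record)]


record = {"turnover": 100.0, "staff": 5.0, "total_costs": 90.0,
          "staff_costs": 50.0, "other_costs": 30.0}
print(violated(record))  # ['costs_add_up']
```

Locating the erroneous field inside a failing record is a further step that this sketch does not attempt. For example, one must decide whether `total_costs` or one of its components should be set to missing and imputed.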

  7. A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

    Science.gov (United States)

    Mistler, Stephen A.; Enders, Craig K.

    2017-01-01

    Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…
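The FCS idea described above can be sketched for two incomplete variables. Each variable is imputed in turn from a regression on the other, the cycle repeats, and different seeds give m completed datasets. A JM approach would instead draw both missing values together from one bivariate distribution. This sketch makes several assumptions. The conditionals are linear-normal. It is single-level, whereas the paper concerns multilevel data. It also ignores parameter uncertainty, whereas a proper implementation would also draw the regression parameters. The data and names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def ols(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Simple linear regression y = a + b*x; returns (a, b, residual SD)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / max(n - 2, 1)) ** 0.5


def fcs_impute(x: List[Optional[float]], y: List[Optional[float]],
               iterations: int = 10, seed: int = 0) -> Tuple[List[float], List[float]]:
    """One chained-equations completion of two incomplete variables."""
    rng = random.Random(seed)
    obs_x = [i for i, v in enumerate(x) if v is not None]
    obs_y = [i for i, v in enumerate(y) if v is not None]
    mean_x = sum(x[i] for i in obs_x) / len(obs_x)
    mean_y = sum(y[i] for i in obs_y) / len(obs_y)
    # Start from mean imputation, then cycle through the univariate conditionals.
    xf = [mean_x if v is None else v for v in x]
    yf = [mean_y if v is None else v for v in y]
    for _ in range(iterations):
        a, b, sd = ols([xf[i] for i in obs_y], [yf[i] for i in obs_y])  # y given x
        for i, v in enumerate(y):
            if v is None:
                yf[i] = a + b * xf[i] + rng.gauss(0, sd)
        a, b, sd = ols([yf[i] for i in obs_x], [xf[i] for i in obs_x])  # x given y
        for i, v in enumerate(x):
            if v is None:
                xf[i] = a + b * yf[i] + rng.gauss(0, sd)
    return xf, yf


x = [1.0, 2.0, None, 4.0, 5.0, 6.0]
y = [2.1, None, 6.2, 7.9, None, 12.1]
completed = [fcs_impute(x, y, seed=s) for s in range(5)]  # m = 5 completed datasets
```

Each completed dataset is then analysed separately, and the m results are pooled.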

  8. Geographically distributed Batch System as a Service: the INDIGO-DataCloud approach exploiting HTCondor

    Science.gov (United States)

    Aiftimiei, D. C.; Antonacci, M.; Bagnasco, S.; Boccali, T.; Bucchi, R.; Caballer, M.; Costantini, A.; Donvito, G.; Gaido, L.; Italiano, A.; Michelotto, D.; Panella, M.; Salomoni, D.; Vallero, S.

    2017-10-01

    One of the challenges a scientific computing center faces is to keep delivering well-consolidated computational frameworks (i.e. the batch computing farm) while conforming to modern computing paradigms. The aim is to ease system administration at all levels (from hardware to applications) and to provide a smooth end-user experience. Within the INDIGO-DataCloud project, we adopt two different approaches to implement a PaaS-level, on-demand Batch Farm Service based on HTCondor and Mesos. In the first approach, described in this paper, the various HTCondor daemons are packaged inside pre-configured Docker images and deployed as Long Running Services through Marathon, benefiting from its health checks and failover capabilities. In the second approach, we plan to implement an ad-hoc HTCondor framework for Mesos. Container-to-container communication and isolation have been addressed by exploring a solution based on overlay networks (based on the Calico Project). Finally, we have studied the possibility of deploying an HTCondor cluster that spans different sites, exploiting the Condor Connection Broker component. This component allows communication across a private network boundary or firewall, as in the case of multi-site deployments. In this paper, we describe and motivate our implementation choices and show the results of the first tests performed.

  9. Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data

    Science.gov (United States)

    2012-01-01

    Background Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that, given the observed data, the underlying missing data mechanism is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the method proposed by Carpenter et al. (2007). This approach aims to approximate inferences under a Missing Not At Random (MNAR) mechanism by reweighting estimates obtained after multiple imputation, where the weights depend on the assumed degree of departure from the MAR assumption. Methods The method is illustrated with epidemiological data from a surveillance system of hepatitis C virus (HCV) infection in France during the 2001–2007 period. The subpopulation studied included 4343 HCV-infected patients who reported drug use. Risk factors for severe liver disease were assessed. After performing complete-case and multiple imputation analyses, we applied the sensitivity analysis to 3 risk factors of severe liver disease: past excessive alcohol consumption, HIV co-infection and infection with HCV genotype 3. Results In these data, the association between severe liver disease and HIV was underestimated if, given the observed data, HIV status was more likely to be observed when it was positive. Inferences for the two other risk factors were robust to plausible local departures from the MAR assumption. Conclusions We have demonstrated the practical utility of, and advocate, a pragmatic, widely applicable approach to exploring plausible departures from the MAR assumption after multiple imputation. We have developed guidelines for applying this approach to epidemiological studies. PMID:22681630
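The reweighting step described above can be sketched as follows. The exact weight formula and its sign convention are assumptions. They are based on a logistic selection model in which `delta` is the change in the log-odds of a value being observed per unit of the incomplete variable. Check Carpenter et al. (2007) before relying on this form. The function name and all numbers are hypothetical.

```python
import math
from typing import List


def mnar_reweighted(estimates: List[float], imputed_sums: List[float], delta: float) -> float:
    """Combine m per-imputation estimates with weights tilted by an assumed MNAR delta.

    imputed_sums[k] is the sum of the imputed values of the incomplete binary variable
    (e.g. HIV status) in imputation k. With delta > 0 read as "a positive value is more
    likely to be observed", imputations that impute fewer positives get more weight.
    """
    log_w = [-delta * s for s in imputed_sums]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # subtract the max for numerical stability
    total = sum(w)
    return sum(wk / total * est for wk, est in zip(w, estimates))


est = [0.40, 0.55, 0.48, 0.62, 0.51]  # hypothetical log odds ratios from 5 imputations
sums = [30, 42, 35, 47, 38]           # hypothetical counts of imputed HIV-positive patients
for d in (0.0, 0.1, 0.2):
    print(d, round(mnar_reweighted(est, sums, d), 3))
```

With delta = 0 every imputation gets equal weight, which recovers the usual MAR point estimate. Increasing delta shows how far the conclusion moves as the assumed departure from MAR grows.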

  10. Spatial Random Effects Survival Models to Assess Geographical Inequalities in Dengue Fever Using Bayesian Approach: a Case Study

    Science.gov (United States)

    Astuti Thamrin, Sri; Taufik, Irfan

    2018-03-01

    Dengue haemorrhagic fever (DHF) is an infectious disease caused by the dengue virus. The increasing number of people with DHF correlates with the neighbourhood, for example the sub-district, and the characteristics of a sub-district are formed by the individuals domiciled in it. Data on individuals nested within sub-districts have a hierarchical structure that calls for multilevel analysis. A frequently encountered response variable in such data is the time until an event occurs. Multilevel and spatial models are increasingly used to obtain substantive information on area-level inequalities in DHF survival. Using a case study approach, we report on the implications of using multilevel and spatial survival models to study geographical inequalities in all-cause survival.

  11. Ecological study and risk mapping of leishmaniasis in an endemic area of Brazil based on a geographical information systems approach

    Directory of Open Access Journals (Sweden)

    Alba Valéria Machado da Silva

    2011-11-01

    Visceral leishmaniasis is a vector-borne disease highly influenced by eco-epidemiological factors. Geographical information systems (GIS) have proved to be a suitable approach for the analysis of environmental components that affect the spatial distribution of diseases. Exploiting this methodology, a model was developed for mapping the distribution and incidence of canine leishmaniasis in an endemic area of Brazil. Local variations were observed with respect to infection incidence and the distribution of serological titers, i.e. high titers were noted close to areas with preserved vegetation, while low titers were more frequent in areas where people kept chickens. Based on these results, we conclude that the environment plays an important role in generating relatively protected areas within larger endemic regions, but that it can also contribute to the creation of hotspots with clusters of comparatively high serological titers, indicating a high level of transmission compared with neighbouring areas.

  12. Historic, economic and geographic approach to the relations between economic growth and oil prices

    International Nuclear Information System (INIS)

    Girard, C.

    1992-01-01

    This paper attempts to analyze the complex relations between energy prices and economic growth. In part 1, the history of the last 30 years is used to analyze the consequences of energy price variations on world economic growth. In part 2, two interpretation schemes are described successively. One has to do with the international oil market, and the other concerns the impact of oil shocks on national economies. In part 3, an approach by type of country leads to conclusions respectively for the industrialized countries, for the developing countries and for the newly industrialized countries. 1 fig

  13. Multiple Imputation of Predictor Variables Using Generalized Additive Models

    NARCIS (Netherlands)

    de Jong, Roel; van Buuren, Stef; Spiess, Martin

    2016-01-01

    The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The

  14. Comparison of different Methods for Univariate Time Series Imputation in R

    OpenAIRE

    Moritz, Steffen; Sardá, Alexis; Bartz-Beielstein, Thomas; Zaefferer, Martin; Stork, Jörg

    2015-01-01

    Missing values in datasets are a well-known problem, and many R packages offer imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputing univariate time series. The problem is that most standard imputation techniques cannot be applied directly: most algorithms rely on inter-attribute correlations, whereas univariate time series imputation needs to exploit time dependencies. This paper provides an overview of ...
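Two of the simplest imputation rules that use time dependencies instead of inter-attribute correlations are last observation carried forward and linear interpolation. They are sketched below in Python purely for illustration. The function names are hypothetical and are not those of any R package compared in the paper.

```python
from typing import List, Optional

Series = List[Optional[float]]


def locf(series: Series) -> Series:
    """Last observation carried forward; leading gaps stay missing."""
    out: Series = []
    last: Optional[float] = None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def linear_interpolate(series: Series) -> Series:
    """Fill interior gaps on the straight line between neighbouring observations."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            out[i] = series[left] + (i - left) * (series[right] - series[left]) / (right - left)
    return out


s = [1.0, None, None, 4.0, None]
print(locf(s))                # [1.0, 1.0, 1.0, 4.0, 4.0]
print(linear_interpolate(s))  # [1.0, 2.0, 3.0, 4.0, None]
```

Interpolation cannot fill the trailing gap because it needs an observation on each side. LOCF can, but only by assuming the series stays flat.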

  15. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the

  16. Imputation of microsatellite alleles from dense SNP genotypes for parental verification

    Directory of Open Access Journals (Sweden)

    Matthew McClure

    2012-08-01

    Full Text Available Microsatellite (MS) markers have recently been used for parental verification and are still the international standard, despite their higher cost, error rate, and turnaround time compared with single nucleotide polymorphism (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exists to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing 4 dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society for Animal Genetics (ISAG) recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification, using MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and to other species that wish to migrate from MS-based to SNP-based parental verification.
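
    The abstract describes imputing MS alleles from SNP haplotypes within a breed but gives no algorithmic detail. One simple way to picture the idea is a lookup table built from reference animals genotyped for both marker types. The table maps each SNP haplotype around an MS locus to the MS allele it most often carries. The Python sketch below uses hypothetical haplotype strings and allele sizes for a single breed. It illustrates the idea and is not the published method. A haplotype absent from the reference yields no imputation.

```python
from collections import Counter, defaultdict

def build_reference(pairs):
    """Map each SNP haplotype (string of alleles) to its most frequent MS allele."""
    counts = defaultdict(Counter)
    for haplotype, ms_allele in pairs:
        counts[haplotype][ms_allele] += 1
    return {hap: c.most_common(1)[0][0] for hap, c in counts.items()}

def impute_ms(reference, haplotype):
    """Return the MS allele linked to this haplotype, or None if it was never observed."""
    return reference.get(haplotype)

# Hypothetical reference: SNP haplotypes flanking one MS locus, with MS allele sizes (bp)
reference_pairs = [
    ("ABBA", 186), ("ABBA", 186), ("ABBA", 188),
    ("BAAB", 192), ("BAAB", 192),
    ("AAAA", 180),
]
ref = build_reference(reference_pairs)
print(impute_ms(ref, "ABBA"), impute_ms(ref, "BAAB"), impute_ms(ref, "BBBB"))  # 186 192 None
```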

  17. Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

    OpenAIRE

    Chan, Kin Wai; Meng, Xiao-Li

    2017-01-01

    Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...

  18. Geographic Names

    Data.gov (United States)

    Minnesota Department of Natural Resources — The Geographic Names Information System (GNIS), developed by the United States Geological Survey in cooperation with the U.S. Board of Geographic Names, provides...

  19. Quick, “Imputation-free” meta-analysis with proxy-SNPs

    Directory of Open Access Journals (Sweden)

    Meesters Christian

    2012-09-01

    Full Text Available Abstract Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) serve as a method for verifying results. Because SNP panels differ among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in an MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions before the time-consuming imputation runs are finished and polished. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from the 1000 Genomes and HapMap projects to find proxy-SNPs, together with in-phase alleles, for SNPs missing in at least one study. MA is conducted by combining the association effect estimates of a SNP with those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy
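
    Combining a SNP's effect estimates with those of its proxy-SNPs requires expressing every estimate relative to the same allele, which is what the in-phase alleles provide. The Python sketch below shows one standard way to pool aligned estimates: an inverse-variance weighted fixed-effect meta-analysis. For illustration only, it assumes that a proxy's effect can stand in for the target's effect after sign alignment. The numbers are hypothetical, and this is not YAMAS's implementation.

```python
import math

def align_to_target(beta, in_phase):
    """Put a proxy-SNP effect on the target SNP's effect-allele scale.

    in_phase=True: the proxy's effect allele travels with the target's effect allele,
    so the sign is kept; otherwise the sign is flipped.
    """
    return beta if in_phase else -beta

def fixed_effect_ma(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of aligned effect estimates."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, pooled / pooled_se  # effect, standard error, z statistic

# Hypothetical: studies 1-2 genotyped the target SNP; study 3 only has a proxy
# whose effect allele is out of phase with the target's effect allele.
betas = [0.12, 0.10, align_to_target(-0.11, in_phase=False)]
ses = [0.05, 0.04, 0.06]
beta, se, z = fixed_effect_ma(betas, ses)
print(f"pooled beta={beta:.3f}, se={se:.3f}, z={z:.2f}")
```

    Treating the proxy effect as equal to the target effect ignores incomplete linkage disequilibrium (r² < 1), which can attenuate the proxy's effect. A more careful analysis would account for it.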

  20. Association between Floods and Acute Cardiovascular Diseases: A Population-Based Cohort Study Using a Geographic Information System Approach.

    Science.gov (United States)

    Vanasse, Alain; Cohen, Alan; Courteau, Josiane; Bergeron, Patrick; Dault, Roxanne; Gosselin, Pierre; Blais, Claudia; Bélanger, Diane; Rochette, Louis; Chebana, Fateh

    2016-01-28

    Floods represent a serious threat to human health beyond the immediate risk of drowning. There are few data on the potential link between floods and direct consequences for health, such as cardiovascular health. This study aimed to explore the impact of one of the worst floods in the history of Quebec, Canada, on acute cardiovascular diseases (CVD). A cohort study with a time series design and multiple control groups was built with the adult population identified in the Quebec Integrated Chronic Disease Surveillance System. A geographic information system approach was used to define the study areas. Logistic regressions were performed to compare the occurrence of CVD between groups. The results showed a 25%-27% increase in the odds of CVD in the flooded population in spring 2011 compared with the population in the same area in springs 2010 and 2012. In addition, an increase of up to 69% was observed in individuals with a medical history of CVD. Despite these interesting results, the association was not statistically significant. A possible explanation is that the population affected by the flood was probably too small to provide the statistical power needed to answer the question. This leaves open a substantial possibility of a real and large effect.
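
    The closing argument is about statistical power. An odds ratio of the reported size can be non-significant in a small exposed population and significant in a larger one. The Python sketch below illustrates this with an unadjusted odds ratio and a Woolf (log-scale) 95% confidence interval. It uses two hypothetical 2×2 tables with the same odds ratio of 1.25. The study itself used adjusted logistic regressions, which this sketch does not reproduce.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a, b: cases and non-cases among the exposed; c, d: cases and non-cases among the unexposed.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: both tables have OR = 1.25, but the second is ten times larger
small = odds_ratio_ci(25, 975, 20, 975)
large = odds_ratio_ci(250, 9750, 200, 9750)
print("small:", [round(v, 2) for v in small])  # interval includes 1
print("large:", [round(v, 2) for v in large])  # interval excludes 1
```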

  1. Assessing accuracy of genotype imputation in American Indians.

    Directory of Open Access Journals (Sweden)

    Alka Malhotra

    Full Text Available Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed the accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians. Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of the SNPs available for a given chromosome). The masked genotypes were then imputed using the Markov Chain Haplotyping Algorithm software. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%. Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.
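
    The masking design used to measure accuracy has three steps: hide a known fraction of observed genotypes, impute them, and count how many imputed calls differ from the hidden truth. In the Python sketch below, the imputer is a deliberately naive placeholder (the most frequent genotype per SNP), not MaCH. The genotype matrix is randomly generated, so the error rate it reports says nothing about the study's results. The sketch also masks individual genotype cells, a simplification of the study's masking of a percentage of SNPs.

```python
import random
from collections import Counter

def mask_genotypes(genotypes, fraction, rng):
    """Hide a random fraction of observed genotypes; return the masked copy and the hidden truth."""
    cells = [(i, j) for i, row in enumerate(genotypes) for j, g in enumerate(row) if g is not None]
    hidden = rng.sample(cells, max(1, round(fraction * len(cells))))
    masked = [row[:] for row in genotypes]
    truth = {}
    for i, j in hidden:
        truth[(i, j)] = masked[i][j]
        masked[i][j] = None
    return masked, truth

def impute_most_common(masked):
    """Placeholder imputer (not MaCH): fill each SNP with its most frequent observed genotype."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled

def error_rate(imputed, truth):
    """Fraction of masked genotypes whose imputed value differs from the true value."""
    wrong = sum(imputed[i][j] != g for (i, j), g in truth.items())
    return wrong / len(truth)

rng = random.Random(1)
# Hypothetical genotypes coded as minor-allele counts (0, 1, 2): 200 samples x 50 SNPs
genotypes = [[rng.choice([0, 0, 1, 2]) for _ in range(50)] for _ in range(200)]
masked, truth = mask_genotypes(genotypes, 0.20, rng)  # hide 20% of genotype cells
rate = error_rate(impute_most_common(masked), truth)
print(f"genotype error rate: {rate:.2%}")
```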

  2. The multiple imputation method: a case study involving secondary data analysis.

    Science.gov (United States)

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    This article aims to illustrate, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data source was the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
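
    The chained equation method imputes each incomplete variable in turn from a regression on the others. It cycles until the imputations stabilize, then repeats the whole process to obtain several completed datasets. The pure-Python sketch below shows that loop for two numeric variables, using linear regression plus random residual noise. It is a simplified illustration under stated assumptions, and the data are simulated. A full MICE implementation would also:

    - draw the regression parameters from their posterior distribution;
    - handle many variables and variable types;
    - run diagnostics like those described in the abstract.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; also returns the residual standard deviation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sigma

def impute_step(target, predictor, missing, rng):
    """Refit target ~ predictor on rows where target was observed, then redraw missing targets."""
    rows = [i for i, m in enumerate(missing) if not m]
    a, b, sigma = fit_line([predictor[i] for i in rows], [target[i] for i in rows])
    for i, m in enumerate(missing):
        if m:
            target[i] = a + b * predictor[i] + rng.gauss(0, sigma)

def mice_two_vars(x, y, n_iter=10, seed=0):
    """Chained-equations imputation for two numeric variables; returns completed copies."""
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    mean_x = mean(v for v in x if v is not None)
    mean_y = mean(v for v in y if v is not None)
    xf = [mean_x if v is None else v for v in x]  # start from mean imputation
    yf = [mean_y if v is None else v for v in y]
    for _ in range(n_iter):
        impute_step(yf, xf, miss_y, rng)  # y given current x
        impute_step(xf, yf, miss_x, rng)  # x given current y
    return xf, yf

# Simulated data: y depends linearly on x; 20% of each variable is missing, on different rows
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
for i in range(0, 200, 5):
    y[i] = None
for i in range(2, 200, 5):
    x[i] = None

datasets = [mice_two_vars(x, y, seed=k) for k in range(5)]  # five completed datasets
```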

  3. Missing value imputation: with application to handwriting data

    Science.gov (United States)

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In studying the development of individuality in handwriting, we found that feature values are missing for several individuals at several time instances. Six algorithms are compared on children's handwriting data: random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian networks (static Bayesian network, parameter EM, and structural EM). We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and draw practical conclusions. Specifically, for our data, which contain around 5% missing data, a static Bayesian network provides adequate accuracy at low computational cost.
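
    Two of the six algorithms, mean imputation and random imputation, are simple enough to state in a few lines. The Python sketch below assumes that random imputation means drawing a replacement uniformly from the observed values of the same feature. This is one common reading, but the abstract does not confirm it. The feature values are hypothetical.

```python
import random
from statistics import mean

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def random_impute(values, rng):
    """Replace each missing value with a value drawn at random from the observed ones."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

feature = [3.1, None, 2.8, 3.4, None, 3.0]  # hypothetical feature values over time
print(mean_impute(feature))
print(random_impute(feature, random.Random(0)))
```

    Mean imputation shrinks the feature's variance because every gap receives the same value. Random imputation preserves the marginal distribution but ignores relationships with other features.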

  4. Imputed prices of greenhouse gases and land forests

    International Nuclear Information System (INIS)

    Uzawa, Hirofumi

    1993-01-01

    The theory of dynamic optimum formulated by Maeler gives us the basic theoretical framework within which it is possible to analyse the economic and, possibly, political circumstances under which the phenomenon of global warming occurs, and to search for the policy and institutional arrangements whereby it would be effectively arrested. The analysis developed here is an application of Maeler's theory to atmospheric quality. In the analysis a central role is played by the concept of imputed price in the dynamic context. Our determination of imputed prices of atmospheric carbon dioxide and land forests takes into account the difference in the stages of economic development. Indeed, the ratios of the imputed prices of atmospheric carbon dioxide and land forests over the per capita level of real national income are identical for all countries involved. (3 figures, 2 tables) (Author)

  5. Data imputation analysis for Cosmic Rays time series

    Science.gov (United States)

    Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

    2017-05-01

    Missing data in Galactic Cosmic Ray (GCR) time series are inevitable, since data are lost through mechanical and human failures, technical problems, and differing periods of operation of GCR stations. The aim of this study was to perform multiple dataset imputation in order to reconstruct the observational dataset. The study used the monthly GCR time series of Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% missing data relative to the observed ROME series, with 50 replicates. The CLMX station was then used as a proxy for the allocation of these scenarios. Three different methods for monthly dataset imputation were selected:

    - AMÉLIA II, which runs the bootstrap Expectation Maximization algorithm;
    - MICE, which runs an algorithm via Multivariate Imputation by Chained Equations;
    - MTSDI, an Expectation Maximization algorithm-based method for imputation of missing values in multivariate normal time series.

    The synthetic time series were compared with the observed ROME series and evaluated using several skill measures, such as RMSE, NRMSE, Agreement Index, R, R2, F-test and t-test. The results showed that for CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. Increasing the number of gaps degraded the quality of the time series. Data imputation was most efficient with the MTSDI method, with negligible errors and the best skill coefficients. For monthly averages, the results suggest an upper limit of about 60% missing data for imputation. It is noteworthy that the CLMX, ROME and KIEL stations present no missing data in the target period. This methodology allowed 43 time series to be reconstructed.
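
    The skill measures listed can be computed directly from an observed series and its reconstruction. The Python sketch below makes two assumptions that the abstract does not specify, though both are common conventions. First, "Agreement Index" is taken to be Willmott's index of agreement d. Second, RMSE is normalized by the range of the observations. The F- and t-tests are omitted, and the series are hypothetical.

```python
import math
from statistics import mean

def skill(obs, pred):
    """RMSE, range-normalized RMSE, Willmott's index of agreement d, and Pearson R / R^2."""
    o_bar, p_bar = mean(obs), mean(pred)
    sq_err = sum((p - o) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sq_err / len(obs))
    nrmse = rmse / (max(obs) - min(obs))  # assumption: normalize by the observed range
    potential = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    d = 1 - sq_err / potential            # Willmott's d: 1 means perfect agreement
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    r = cov / math.sqrt(sum((o - o_bar) ** 2 for o in obs) * sum((p - p_bar) ** 2 for p in pred))
    return {"RMSE": rmse, "NRMSE": nrmse, "d": d, "R": r, "R2": r * r}

observed = [4120.0, 4095.5, 4150.2, 4133.8, 4101.7]       # hypothetical monthly values
reconstructed = [4118.4, 4099.0, 4146.9, 4130.1, 4108.3]  # hypothetical imputed series
print(skill(observed, reconstructed))
```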

  6. Comparing alternative approaches to measuring the geographical accessibility of urban health services: Distance types and aggregation-error issues

    OpenAIRE

    Riva Mylène; Abdelmajid Mohamed; Apparicio Philippe; Shearmur Richard

    2008-01-01

    Abstract Background Over the past two decades, the geographical accessibility of urban resources for populations living in residential areas has received increased attention in urban health studies. Operationalising and computing geographical accessibility measures depend on a set of four parameters: the definition of residential areas, a method of aggregation, a measure of accessibility, and a type of distance. Yet, the choice of these parameters may potentially generate different results leadi...
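
    The choice of distance type alone can change which service counts as nearest. The Python sketch below compares straight-line (Euclidean) and Manhattan distances from a hypothetical residential-area centroid to two hypothetical clinics, in projected kilometre coordinates. Network distance along streets is another common option; it would need a road graph and is not shown.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two projected points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    """City-block distance, a rough proxy for travel on a grid street network."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest_service(origin, services, dist):
    """Return (distance, service) for the closest service under the given distance function."""
    return min((dist(origin, s), s) for s in services)

centroid = (0.0, 0.0)               # hypothetical residential-area centroid (km)
clinics = [(3.0, 4.0), (1.0, 5.5)]  # hypothetical clinic locations (km)
print(nearest_service(centroid, clinics, euclidean))  # (5.0, (3.0, 4.0))
print(nearest_service(centroid, clinics, manhattan))  # (6.5, (1.0, 5.5))
```

    Using a single centroid for everyone in the area is itself a source of the aggregation error that the study examines.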

  7. Water quality and health in a Sahelian semi-arid urban context: an integrated geographical approach in Nouakchott, Mauritania

    Directory of Open Access Journals (Sweden)

    Doulo Traoré

    2013-11-01

    Full Text Available Access to sufficient quantities of safe drinking water is a human right. Moreover, access to clean water is of public health relevance, particularly in semi-arid and Sahelian cities due to the risks of water contamination and transmission of water-borne diseases. We conducted a study in Nouakchott, the capital of Mauritania, to deepen the understanding of diarrhoeal incidence in space and time. We used an integrated geographical approach, combining socio-environmental, microbiological and epidemiological data from various sources, including spatially explicit surveys, laboratory analysis of water samples and reported diarrhoeal episodes. A geospatial technique was applied to determine the environmental and microbiological risk factors that govern diarrhoeal transmission. Statistical and cartographic analyses revealed a concentration of unimproved sources of drinking water in the most densely populated areas of the city, coupled with a daily water allocation below the recommended standard of 20 L per person. Bacteriological analysis indicated that 93% of the non-piped water sources supplied at water points were contaminated with 10-80 coliform bacteria per 100 mL. Diarrhoea was the second most important disease reported at health centres, accounting for 12.8% of health care service consultations on average. Diarrhoeal episodes were concentrated in municipalities with the largest number of contaminated water sources. Environmental factors (e.g., lack of improved water sources) and bacteriological aspects (e.g., water contamination with coliform bacteria) are the main drivers explaining the spatio-temporal distribution of diarrhoea. We conclude that integrating environmental, microbiological and epidemiological variables with statistical regression models facilitates risk profiling of diarrhoeal diseases. Modes of water supply and water contamination were the main drivers of diarrhoea in this semi-arid urban context of Nouakchott, and hence require a

  8. Multiple imputation of missing passenger boarding data in the national census of ferry operators

    Science.gov (United States)

    2008-08-01

    This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corresp...

  9. Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

    Directory of Open Access Journals (Sweden)

    Jiangxiu Zhou

    2014-09-01

    Full Text Available The purpose of this study is to demonstrate a way of dealing with missing data in cluster randomized trials by doing multiple imputation (MI) with the PAN package in R through SAS. The procedure for doing MI with PAN through SAS is demonstrated in detail so that researchers can apply it to their own data. An illustration of the technique with empirical data is also included. In this illustration, the PAN results were compared with pairwise deletion and three types of MI:

    (1) Normal Model MI (NM-MI) ignoring the cluster structure;
    (2) NM-MI with dummy-coded cluster variables (fixed cluster structure);
    (3) a hybrid NM-MI which imputes half the time ignoring the cluster structure, and the other half including the dummy-coded cluster variables.

    The empirical analysis showed that using PAN and the other strategies produced comparable parameter estimates. However, the dummy-coded MI overestimated the intraclass correlation, whereas MI ignoring the cluster structure and the hybrid MI underestimated it. Compared with PAN, the p-value and standard error for the treatment effect were higher with dummy-coded MI. They were lower with MI ignoring the cluster structure, with the hybrid MI approach, and with pairwise deletion. Previous studies have shown that NM-MI is not appropriate for handling missing data in clustered randomized trials. This approach, like the pairwise deletion approach, leads to a biased intraclass correlation and faulty statistical conclusions. Imputation in clustered randomized trials should be performed with PAN. We have demonstrated an easy way of using PAN through SAS.
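
    The comparison hinges on how each strategy distorts the intraclass correlation (ICC), so it helps to see how an ICC is estimated from completed data. The Python sketch below uses the one-way ANOVA estimator for equally sized clusters, (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the cluster size. Computing it on each imputed dataset is one way to check for the over- or underestimation the abstract reports. The clusters are hypothetical, and the balanced-design restriction is an assumption of the sketch.

```python
from statistics import mean

def icc_oneway(clusters):
    """One-way ANOVA estimator of the intraclass correlation for equally sized clusters."""
    k = len(clusters[0])
    if any(len(c) != k for c in clusters):
        raise ValueError("this sketch assumes balanced clusters")
    g = len(clusters)
    grand = mean(v for c in clusters for v in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square
    msb = k * sum((m - grand) ** 2 for m in cluster_means) / (g - 1)
    # Within-cluster mean square
    msw = sum((v - m) ** 2 for c, m in zip(clusters, cluster_means) for v in c) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical outcome scores for three clusters of three participants each
print(icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```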

  10. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

    Science.gov (United States)

    Sulovari, Arvis; Li, Dawei

    2014-07-19

    Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build across individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrectly assuming identical allele definitions among combined GWAS leads to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality. Our results indicated that inclusion of singletons in the reference had detrimental effects, while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease in imputation quality, especially for rare SNPs; quality was even significantly higher than for imputation with singletons in the reference. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions. It can accurately predict, and convert between, any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep
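
    A basic step behind allele-definition conversion is recognising when two studies report the same SNP on opposite DNA strands. The Python sketch below is a much simpler check than GACT's. It complements alleles to match a reference and conservatively sets aside A/T and C/G SNPs, whose strand cannot be inferred from the alleles alone. The function names and the (allele1, allele2) tuple representation are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands, so strand cannot be inferred from alleles."""
    a, b = alleles
    return COMPLEMENT[a] == b

def flip_strand(alleles):
    """Report the same SNP on the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)

def harmonize(study, reference):
    """Express study alleles on the reference strand; None if ambiguous or irreconcilable."""
    if is_ambiguous(study):
        return None  # needs outside information, e.g. allele frequencies
    if set(study) == set(reference):
        return study  # same strand (allele order may differ)
    flipped = flip_strand(study)
    if set(flipped) == set(reference):
        return flipped
    return None  # alleles cannot be reconciled with the reference

print(harmonize(("A", "G"), ("T", "C")))  # ('T', 'C')
print(harmonize(("A", "T"), ("A", "T")))  # None
print(harmonize(("A", "G"), ("A", "C")))  # None
```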

  11. Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

    Directory of Open Access Journals (Sweden)

    Zhou Hanzhi

    2016-03-01

    Full Text Available Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES III) data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.

  12. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

    Science.gov (United States)

    Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

    2017-06-01

    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes across diverse populations for common variants with minor allele frequency (MAF) ≥ 5% and for low-frequency variants (0.5% ≤ MAF < 5%). However, the imputation of rare variation (MAF < 0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations against that achieved with a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel comprising 2,244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
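
    The frequency classes used here are defined on the minor allele frequency, the smaller of an allele's frequency and its complement. The short Python sketch below applies the abstract's thresholds of 5% and 0.5%. Following the ≥ signs in the text, a boundary value such as exactly 5% is assigned to the higher class.

```python
def maf_class(allele_freq):
    """Classify a variant as common, low-frequency, or rare by its minor allele frequency."""
    maf = min(allele_freq, 1 - allele_freq)  # minor allele frequency
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"

print(maf_class(0.2), maf_class(0.97), maf_class(0.001))  # common low-frequency rare
```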

  13. Sequence imputation of HPV16 genomes for genetic association studies.

    Directory of Open Access Journals (Sweden)

    Benjamin Smith

    Full Text Available Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer, and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we know neither which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants, using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS). The VWAS used biopsy-confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data, owing to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, determining the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16
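
    The imputation idea rests on HPV16 lineages being stable and non-recombining, so a partial sequence can be completed from the reference genome it most resembles. The Python sketch below substitutes a naive identity score on pre-aligned, equal-length sequences for the BLAST-based search described in the abstract. 'N' marks unknown positions, and the short sequences are hypothetical.

```python
def identity_at_known(partial, ref):
    """Fraction of known (non-'N') positions where the partial sequence matches the reference."""
    known = [(p, r) for p, r in zip(partial, ref) if p != "N"]
    if not known:
        return 0.0
    return sum(p == r for p, r in known) / len(known)

def impute_from_reference(partial, references):
    """Fill 'N' positions from the most identical reference (assumes sequences are pre-aligned)."""
    best = max(references, key=lambda ref: identity_at_known(partial, ref))
    return "".join(r if p == "N" else p for p, r in zip(partial, best))

references = ["ACGTACGTAC", "ACGAACGTTC"]  # hypothetical aligned lineage references
print(impute_from_reference("ACGANNNNNC", references))  # ACGAACGTTC
```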

  14. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

    Directory of Open Access Journals (Sweden)

    Thomas J Hoffmann

    2015-01-01

    Full Text Available An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large, well-phenotyped cohorts with existing genome-wide genotype data, using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by using the 1000 Genomes Project data plus an enriched reference panel of mutation carriers, we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants. These participants came from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method. It was then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves for G84E mutation carriers and non-carriers were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80 (p = 3.4 × 10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8 × 10⁻⁴). The increase was greater in cases diagnosed with multiple cancer types, both those including and those not including prostate cancer, strongly suggesting
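
    The validation statistic r² compares imputed genotypes with directly genotyped ones. A common definition, which the abstract does not spell out, is the squared Pearson correlation between imputed allele dosage (0 to 2) and the genotyped allele count. Assuming that definition, it can be computed as in the Python sketch below. The example values are hypothetical.

```python
from statistics import mean

def dosage_r2(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and genotyped allele counts."""
    md, mg = mean(dosages), mean(genotypes)
    cov = sum((d - md) * (g - mg) for d, g in zip(dosages, genotypes))
    var_d = sum((d - md) ** 2 for d in dosages)
    var_g = sum((g - mg) ** 2 for g in genotypes)
    return cov * cov / (var_d * var_g)

# Hypothetical carrier-status data: 1 = one copy of the mutation, 0 = none
genotyped = [0, 0, 1, 0, 1, 0, 0, 1]
imputed = [0.02, 0.10, 0.85, 0.40, 0.60, 0.00, 0.05, 0.30]
print(round(dosage_r2(imputed, genotyped), 2))
```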

  15. A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014

    Directory of Open Access Journals (Sweden)

    He Yulei

    2016-03-01

    Full Text Available Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior remains unclear when it is applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their results suggested that the increase in the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the data are missing completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator. These expressions conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
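
    The balanced one-way normal random-effects model referred to here can be written as follows for g clusters of equal size k. The model implies the design effect 1 + (k - 1)ρ for the sample mean. The last line gives the standard simple-random-sample definitions from Rubin's combining rules, where W̄ is the average within-imputation variance, B the between-imputation variance and m the number of imputations. The paper's clustered-sample approximations for the fraction of missing information are not reproduced here.

```latex
y_{ij} = \mu + b_i + \varepsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_w^2), \qquad
i = 1, \dots, g, \; j = 1, \dots, k

\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma_b^2}{g} + \frac{\sigma_w^2}{gk}
  = \frac{\sigma_b^2 + \sigma_w^2}{gk}\,\bigl[1 + (k - 1)\rho\bigr]

T = \bar{W} + \Bigl(1 + \frac{1}{m}\Bigr) B, \qquad
\gamma \approx \frac{(1 + 1/m)\, B}{T}
```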

  16. Missing data in clinical trials: control-based mean imputation and sensitivity analysis.

    Science.gov (United States)

    Mehrotra, Devan V; Liu, Fang; Permutt, Thomas

    2017-09-01

    In some randomized (drug versus placebo) clinical trials, the estimand of interest is the between-treatment difference in population means of a clinical endpoint that is free from the confounding effects of "rescue" medication (e.g., HbA1c change from baseline at 24 weeks that would be observed without rescue medication regardless of whether or when the assigned treatment was discontinued). In such settings, a missing data problem arises if some patients prematurely discontinue from the trial or initiate rescue medication while in the trial, the latter necessitating the discarding of post-rescue data. We caution that the commonly used mixed-effects model repeated measures analysis with the embedded missing at random assumption can deliver an exaggerated estimate of the aforementioned estimand of interest. This happens, in part, due to implicit imputation of an overly optimistic mean for "dropouts" (i.e., patients with missing endpoint data of interest) in the drug arm. We propose an alternative approach in which the missing mean for the drug arm dropouts is explicitly replaced with either the estimated mean of the entire endpoint distribution under placebo (primary analysis) or a sequence of increasingly more conservative means within a tipping point framework (sensitivity analysis); patient-level imputation is not required. A supplemental "dropout = failure" analysis is considered in which a common poor outcome is imputed for all dropouts followed by a between-treatment comparison using quantile regression. All analyses address the same estimand and can adjust for baseline covariates. Three examples and simulation results are used to support our recommendations. Copyright © 2017 John Wiley & Sons, Ltd.
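
    The primary analysis replaces the unknown mean of drug-arm dropouts with the placebo mean. The tipping-point analysis then makes that assumed mean progressively worse. Because no patient-level imputation is needed, the drug-arm mean is a weighted average of completers and dropouts. The Python sketch below shows that arithmetic with hypothetical HbA1c numbers. It ignores the covariate adjustment and variance calculation that a real analysis would require.

```python
def drug_arm_mean(completer_mean, n_completers, n_dropouts, dropout_mean):
    """Drug-arm mean when the dropouts' mean is set to an assumed value."""
    n = n_completers + n_dropouts
    return (n_completers * completer_mean + n_dropouts * dropout_mean) / n

# Hypothetical HbA1c change from baseline (%); negative values mean improvement.
placebo_mean = -0.2      # estimated mean of the whole placebo endpoint distribution
drug_completers = -1.0   # observed mean among drug-arm patients with usable data
n_comp, n_drop = 80, 20

primary = drug_arm_mean(drug_completers, n_comp, n_drop, placebo_mean)
print("primary treatment difference:", round(primary - placebo_mean, 3))

# Tipping-point sensitivity: shift the dropout mean in the unfavourable direction
for delta in (0.5, 1.0, 2.0, 3.2):
    diff = drug_arm_mean(drug_completers, n_comp, n_drop, placebo_mean + delta) - placebo_mean
    print(f"delta={delta}: difference={diff:.3f}")
```

    With these hypothetical numbers, the treatment difference reaches zero when dropouts are assumed to be 3.2 percentage points worse than placebo. How plausible such a shift is drives the interpretation.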

  17. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    Science.gov (United States)

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.
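
    A common building block for censored values is to draw each below-threshold value from a normal distribution truncated at the detection limit. The Python sketch assumes the mean and standard deviation come from an already fitted model (for concentrations, typically on the log scale). It is a single-variable illustration, not one of the three multivariate time-series models the authors develop. In a proper multiple imputation, the model parameters would also be redrawn for each imputation. `impute_below_limit` is a hypothetical name.

```python
import random
from statistics import NormalDist


def impute_below_limit(mu, sigma, limit, n_imputations, rng=None):
    """Draw n_imputations values from N(mu, sigma**2) truncated to (-inf, limit].

    Inverse-CDF method: draw u uniformly on (0, F(limit)] and map it back
    through the normal quantile function, so every draw lies below the limit.
    """
    rng = rng or random.Random()
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(limit)
    if not 0.0 < upper < 1.0:
        raise ValueError("limit is too extreme relative to mu and sigma")
    # 1 - random() lies in (0, 1], which keeps u strictly positive for inv_cdf
    return [dist.inv_cdf(upper * (1.0 - rng.random())) for _ in range(n_imputations)]


# Five imputations for one value censored at log-concentration -1.0 (hypothetical values).
print(impute_below_limit(0.0, 1.0, -1.0, 5, random.Random(42)))
```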

  18. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    Science.gov (United States)

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  19. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    Science.gov (United States)

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of the predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered and in most of the experimental conditions examined, the proposed approach provides valid inferences and generally has better coverage, smaller biases, and shorter intervals. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
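
    Predictive mean matching itself can be sketched in a few lines of Python. In this minimal version, an ordinary least-squares line with a single predictor stands in for the authors' IRT-based predictions. The regression coefficients are not redrawn between imputations, as a fully proper multiple imputation would require. `pmm_impute` and its parameters are hypothetical. Because every imputed value is copied from an observed donor, imputed item responses always stay on the item's original rating scale.

```python
import random


def pmm_impute(x_obs, y_obs, x_mis, k=3, rng=None):
    """Predictive mean matching with a one-predictor least-squares model.

    For each incomplete case, the k observed cases whose predicted means are
    closest to its own predicted mean form the donor pool, and the observed y
    of one randomly chosen donor is copied.
    """
    rng = rng or random.Random()
    n = len(x_obs)
    mx, my = sum(x_obs) / n, sum(y_obs) / n
    sxx = sum((x - mx) ** 2 for x in x_obs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    intercept = my - slope * mx
    pred_obs = [intercept + slope * x for x in x_obs]
    imputed = []
    for x in x_mis:
        target = intercept + slope * x
        donors = sorted(range(n), key=lambda i: abs(pred_obs[i] - target))[:k]
        imputed.append(y_obs[rng.choice(donors)])
    return imputed


# Hypothetical data: x is a summary score, y an item rated on a 1-5 scale.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_obs = [1, 2, 2, 3, 4, 5]
print(pmm_impute(x_obs, y_obs, [2.5, 5.8], k=2, rng=random.Random(0)))
```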

  20. A web-based approach to data imputation

    KAUST Repository

    Li, Zhixu; Sharaf, Mohamed Abdel Fattah; Sitbon, Laurianne; Sadiq, Shazia Wasim; Indulska, Marta; Zhou, Xiaofang

    2013-01-01

    principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme

  1. Bayesian Approaches to Imputation, Hypothesis Testing, and Parameter Estimation

    Science.gov (United States)

    Ross, Steven J.; Mackey, Beth

    2015-01-01

    This chapter introduces three applications of Bayesian inference to common and novel issues in second language research. After a review of the critiques of conventional hypothesis testing, our focus centers on ways Bayesian inference can be used for dealing with missing data, for testing theory-driven substantive hypotheses without a default null…

  2. Emerging approach for analytical characterization and geographical classification of Moroccan and French honeys by means of a voltammetric electronic tongue.

    Science.gov (United States)

    El Alami El Hassani, Nadia; Tahri, Khalid; Llobet, Eduard; Bouchikhi, Benachir; Errachid, Abdelhamid; Zine, Nadia; El Bari, Nezha

    2018-03-15

    Moroccan and French honeys from different geographical areas were classified and characterized by applying a voltammetric electronic tongue (VE-tongue) coupled with analytical methods. The studied parameters include color intensity, free, lactonic and total acidity, proteins, phenols, hydroxymethylfurfural (HMF) content, sucrose, and reducing and total sugars. The geographical classification of the honeys was developed with three pattern recognition techniques: principal component analysis (PCA), support vector machines (SVMs) and hierarchical cluster analysis (HCA). Honey characterization was achieved by partial least squares (PLS) modeling. All the PLS models developed accurately estimated the values of the analyzed parameters using the voltammetric experimental data as input (r > 0.9). This confirms the potential of the VE-tongue for rapid characterization of honeys via PLS, with an uncomplicated, cost-effective sample preparation process that does not require additional chemicals. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset), and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of whom 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study, whereas multiple imputation included all 6117 patients. The two methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Comparing alternative approaches to measuring the geographical accessibility of urban health services: Distance types and aggregation-error issues

    Directory of Open Access Journals (Sweden)

    Riva Mylène

    2008-02-01

    Full Text Available Abstract Background Over the past two decades, geographical accessibility of urban resources for populations living in residential areas has received increased focus in urban health studies. Operationalising and computing geographical accessibility measures depend on a set of four parameters, namely the definition of residential areas, a method of aggregation, a measure of accessibility, and a type of distance. Yet the choice of these parameters may generate different results, leading to significant measurement errors. The aim of this paper is to compare discrepancies in results for geographical accessibility of selected health care services for residential areas (i.e. census tracts) computed using different distance types and aggregation methods. Results First, the comparison of distance types demonstrates that Cartesian distances (Euclidean and Manhattan distances) are strongly correlated with more accurate network distances (shortest network and shortest network time distances) across the metropolitan area (Pearson correlation greater than 0.95). However, important local variations in correlation between Cartesian and network distances were observed, notably in suburban areas where Cartesian distances were less precise. Second, the choice of the aggregation method is also important: in comparison to the most accurate aggregation method (population-weighted mean of the accessibility measure for census blocks within census tracts), accessibility measures computed from census tract centroids, though not inaccurate, yield important measurement errors for 5% to 10% of census tracts. Conclusion Although errors associated with the choice of distance types and aggregation method are only important for about 10% of census tracts, located mainly in suburban areas, the best available estimation method should still be used for evaluating geographical accessibility. This is especially so if these measures are to be included as a dimension of the
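
    The two Cartesian distance types and the population-weighted aggregation can be illustrated with a short Python sketch. It assumes projected planar coordinates and a single service location. Network distances, which require a street graph, are not shown. The "centroid" here is simplified to the unweighted average of the block points, and all function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def population_weighted_distance(blocks, service, distance=euclidean):
    """Population-weighted mean distance from the census blocks of one tract to a service.

    blocks: list of (x, y, population) tuples.
    """
    total = sum(pop for _, _, pop in blocks)
    return sum(pop * distance((x, y), service) for x, y, pop in blocks) / total


def centroid_distance(blocks, service, distance=euclidean):
    """Distance from the unweighted average of the block points (a stand-in for the tract centroid)."""
    cx = sum(x for x, _, _ in blocks) / len(blocks)
    cy = sum(y for _, y, _ in blocks) / len(blocks)
    return distance((cx, cy), service)


tract = [(0.0, 0.0, 100), (3.0, 4.0, 300)]   # hypothetical blocks (km, km, residents)
service = (0.0, 0.0)
print(population_weighted_distance(tract, service))             # 3.75
print(population_weighted_distance(tract, service, manhattan))  # 5.25
print(centroid_distance(tract, service))                        # 2.5
```

    In this toy tract, the centroid-based value understates the distance that most residents are from the service, because three quarters of them live in the farther block.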

  5. Spatiotemporal Pattern of Crime Using Geographic Information System (GIS) Approach in Dala L.G.A of Kano State, Nigeria

    OpenAIRE

    M. Ahmed

    2013-01-01

    This study explores the use of Geographic Information Systems (GIS) and a spatial database of crime characteristics to determine hotspots in Dala LGA of Kano State, and identifies the challenges facing police departments that seek to implement computerized crime mapping systems. Different data sources were used, including data from the Nigerian Police Force (Dala and Jakara Divisions) for 2008 to 2010. For this study, crimes were divided into four categories: offence against...

  6. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    NARCIS (Netherlands)

    Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); Y.J. Kim (Young Jin); B.-J. Kim (Bong-Jo); J. Lee (Juyoung); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. Palmer (Nicholette); T. Park (Taesung); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)

    2015-01-01

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the

  7. Which DTW Method Applied to Marine Univariate Time Series Imputation

    OpenAIRE

    Phan, Thi-Thu-Hong; Caillault, Émilie; Lefebvre, Alain; Bigand, André

    2017-01-01

    Missing data are ubiquitous in all domains of applied science. Processing datasets containing missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequences. Therefore, the aim of this paper is to build a framework for filling missing values in univariate time series and to compare different similarity metrics used for the imputation task. This allows us to suggest the most suitable methods for the imp...
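
    Dynamic time warping (DTW) is one of the similarity measures at issue here. The Python sketch shows the classic DTW distance and one simple, hypothetical way to use it for gap filling:
    1. The values just before a gap serve as a query.
    2. The most similar earlier window is located.
    3. The values that followed that window are copied into the gap.

    This is an illustrative assumption, not the authors' exact framework. It also assumes the only missing values in the series are those in the gap.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fill_gap(series, gap_start, gap_len, width):
    """Hypothetical DTW-based gap filler for a univariate series.

    The `width` values just before the gap form the query; every earlier window
    that is followed by at least `gap_len` known values is a candidate, and the
    values after the closest candidate (by DTW) are copied into the gap.
    """
    if gap_start < width:
        raise ValueError("not enough history before the gap")
    query = series[gap_start - width:gap_start]
    best_start, best_dist = None, float("inf")
    for s in range(0, gap_start - width - gap_len + 1):
        d = dtw_distance(query, series[s:s + width])
        if d < best_dist:
            best_start, best_dist = s, d
    if best_start is None:
        raise ValueError("no candidate window before the gap")
    fill = series[best_start + width:best_start + width + gap_len]
    return series[:gap_start] + fill + series[gap_start + gap_len:]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
series = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, None, None, 0, 1]
print(fill_gap(series, gap_start=10, gap_len=2, width=2))
# [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```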

  8. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Science.gov (United States)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to uncover hidden knowledge in clinical research is becoming increasingly influential in medicine. Heart disease is a major cause of death worldwide, and early prevention through efficient methods can help reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as empty or missing values, can affect the quality of classification results. Nevertheless, the remaining complete features can still provide information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a Decision Tree classifier. The experiments use the Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the Decision Tree improves after applying FCMPSO for imputation.
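
    The fuzzy c-means part of such an approach can be sketched as follows, assuming the cluster centres have already been estimated. The PSO step used to tune the procedure is omitted, and `fcm_impute` is a hypothetical name. Memberships are computed from the observed features only. Each missing feature is then filled with the membership-weighted average of the centres.

```python
def fcm_impute(record, centers, m=2.0):
    """Fill None entries of `record` using fuzzy c-means cluster centres.

    Memberships follow the standard FCM formula with fuzzifier m > 1, computed
    from Euclidean distances over the observed features only; each missing
    feature becomes the membership-weighted average of the centres' values.
    """
    observed = [i for i, v in enumerate(record) if v is not None]
    dists = [sum((record[i] - c[i]) ** 2 for i in observed) ** 0.5 for c in centers]
    if any(d == 0.0 for d in dists):
        # the record coincides with one or more centres on its observed features
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        memberships = [h / sum(hits) for h in hits]
    else:
        p = 2.0 / (m - 1.0)
        memberships = [1.0 / sum((dk / dj) ** p for dj in dists) for dk in dists]
    return [
        v if v is not None
        else sum(u * c[i] for u, c in zip(memberships, centers))
        for i, v in enumerate(record)
    ]


centres = [[0.0, 0.0], [10.0, 10.0]]   # hypothetical centres, e.g. from a prior FCM run
print(fcm_impute([5.0, None], centres))  # [5.0, 5.0]
print(fcm_impute([2.0, None], centres))  # second value ≈ 10/17 ≈ 0.588
```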

  9. Mapping and modelling the geographical distribution of soil-transmitted helminthiases in Peninsular Malaysia: implications for control approaches

    Directory of Open Access Journals (Sweden)

    Romano Ngui

    2014-05-01

    Full Text Available Soil-transmitted helminth (STH) infections in Malaysia are still highly prevalent, especially in rural and remote communities. A complete estimation of the total disease burden in the country has not been performed, since available data are not easily accessible in the public domain. The current study utilised a geographical information system (GIS) to collate and map the distribution of STH infections from available empirical survey data in Peninsular Malaysia, highlighting areas where information is lacking. The assembled database, comprising surveys conducted between 1970 and 2012 in 99 different locations, represents one of the most comprehensive compilations of STH infections in the country. It was found that the geographical distribution of STH varies considerably, with no clear pattern across the surveyed locations. Our attempt to generate predictive risk maps of STH infections on the basis of ecological limits such as climate and other environmental factors shows that the prevalence of Ascaris lumbricoides is low along the western coast and in the southern part of the country, whilst it is high in the central plains and in the north. In the present study, we demonstrate that GIS can play an important role in providing data to policy-makers and authorities in charge for the implementation of sustainable and effective STH control programmes.

  10. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    Energy Technology Data Exchange (ETDEWEB)

    Riggi, S., E-mail: sriggi@oact.inaf.it [INAF - Osservatorio Astrofisico di Catania (Italy); Riggi, D. [Keras Strategy - Milano (Italy); Riggi, F. [Dipartimento di Fisica e Astronomia - Università di Catania (Italy); INFN, Sezione di Catania (Italy)

    2015-04-21

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved using a neural network. The performance of the network degrades when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms that impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixture models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed-form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layer silicon detector are used as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.
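
    The advantage of a mixture model over mean imputation can be seen in a two-dimensional case. The Python sketch assumes Gaussian (not skew-normal) components whose parameters have already been fitted, for example by EM. It imputes a missing second coordinate by its conditional expectation given the observed first coordinate. Function names and the toy parameters are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_conditional_mean(x1, weights, means, covs):
    """E[x2 | x1] under a bivariate Gaussian mixture with known parameters.

    weights: mixing proportions; means: (mu1, mu2) per component;
    covs: (var1, cov12, var2) per component.
    """
    # responsibilities of each component given only the observed coordinate
    dens = [w * normal_pdf(x1, mu[0], cv[0]) for w, mu, cv in zip(weights, means, covs)]
    total = sum(dens)
    resp = [d / total for d in dens]
    # within each component, x2 | x1 is normal with a linear conditional mean
    cond = [mu[1] + cv[1] / cv[0] * (x1 - mu[0]) for mu, cv in zip(means, covs)]
    return sum(r * c for r, c in zip(resp, cond))


weights = [0.5, 0.5]
means = [(0.0, 0.0), (10.0, 20.0)]
covs = [(1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(mixture_conditional_mean(0.0, weights, means, covs))   # close to 0
print(mixture_conditional_mean(10.0, weights, means, covs))  # close to 20
```

    Mean imputation would assign the overall mean of the second coordinate, 10, in both cases. The mixture instead recovers the value typical of the component the event most likely belongs to.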

  11. Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

    Science.gov (United States)

    Ritz, Cecilia; Edén, Patrik

    2008-01-19

    For 2-dye microarray platforms, some missing values may arise from an unmeasurably low RNA expression in one channel only. Information on such "one-channel depletion" has so far not been included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates was reduced by up to 51%, depending on the dataset. By including more information in the imputation step, we estimate missing expression values more accurately.
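
    For reference, a plain KNN imputation of the kind evaluated here can be sketched as follows. This minimal Python version uses an unweighted average of the k nearest rows (genes), with distances computed over jointly observed columns. It does not implement the authors' one-channel-depletion correction, and `knn_impute` is a hypothetical name.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries in each row with the mean of its k nearest neighbour rows.

    Distance is the root-mean-square difference over columns observed in both
    rows; only neighbours that have the target column observed are considered,
    and distances always use the original (unimputed) rows.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (dist(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            candidates.sort(key=lambda t: t[0])
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:
                result[i][j] = sum(nearest) / len(nearest)
    return result


# Hypothetical log-ratios: genes as rows, arrays as columns.
data = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [1.1, 2.1, 5.0], [9.0, 9.0, 100.0]]
print(knn_impute(data, k=2)[0])  # [1.0, 2.0, 4.0]
```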

  12. A fuzzy approach to a multiple criteria and geographical information system for decision support on suitable locations for biogas plants

    DEFF Research Database (Denmark)

    Franco de los Rios, Camilo Andres; Bojesen, Mikkel; Hougaard, Jens Leth

    The purpose of this paper is to model the multi-criteria decision problem of identifying the most suitable facility locations for biogas plants under an integrated decision support methodology. Here the Geographical Information System (GIS) is used for measuring the attributes of the alternatives...... according to a given set of criteria. Measurements are taken in interval form, expressing the natural imprecision of common data, and the Fuzzy Weighted Overlap Dominance (FWOD) procedure is applied for aggregating and exploiting this kind of data, obtaining suitability degrees for every alternative....... The estimation of criteria weights, which is necessary for applying the FWOD procedure, is done by means of the Analytical Hierarchy Process (AHP), such that a combined AHP-FWOD methodology makes it possible to identify the most suitable sites for building biogas plants. We show that the FWOD relevance-ranking procedure...
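
    The AHP weighting step is commonly computed as the principal eigenvector of a reciprocal pairwise-comparison matrix. The Python sketch approximates it by power iteration and computes Saaty's consistency index (lambda_max - n)/(n - 1). The comparison values are hypothetical, and the FWOD aggregation itself is not shown.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate AHP priority weights by power iteration.

    matrix[i][j] says how much more important criterion i is than criterion j,
    with matrix[j][i] == 1 / matrix[i][j]. Returns the weights (summing to 1)
    and an estimate of the principal eigenvalue lambda_max.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_index(lambda_max, n):
    """Saaty's consistency index; 0 for a perfectly consistent matrix."""
    return (lambda_max - n) / (n - 1)


# Hypothetical, perfectly consistent comparisons built from weights 0.5, 0.3, 0.2.
A = [[1.0, 5 / 3, 2.5],
     [0.6, 1.0, 1.5],
     [0.4, 2 / 3, 1.0]]
w, lam = ahp_weights(A)
print([round(x, 3) for x in w], round(lam, 3))  # [0.5, 0.3, 0.2] 3.0
print(consistency_index(lam, len(A)))
```

    For this perfectly consistent matrix, the eigenvalue estimate equals the number of criteria, so the consistency index is zero up to rounding error. Real expert judgements give a positive value.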

  13. A fuzzy approach to a multiple criteria and Geographical Information System for decision support on suitable locations for biogas plants

    DEFF Research Database (Denmark)

    Franco, Camilo; Bojesen, Mikkel; Hougaard, Jens Leth

    2015-01-01

    The purpose of this paper is to model the multi-criteria decision problem of identifying the most suitable facility locations for biogas plants under an integrated decision support methodology. Here the Geographical Information System (GIS) is used for measuring the attributes of the alternatives...... according to a given set of criteria. Measurements are taken in interval form, expressing the natural imprecision of common data, and the Fuzzy Weighted Overlap Dominance (FWOD) procedure is applied for aggregating and exploiting this kind of data, obtaining suitability degrees for every alternative...... suitable sites for building biogas plants. We show that the FWOD relevance-ranking procedure can also be successfully applied over the outcomes of different decision makers, in case a unique social solution is required to exist. The proposed methodology can be used under an integrated decision support...

  14. Geographic variation in species richness, rarity, and the selection of areas for conservation: An integrative approach with Brazilian estuarine fishes

    Science.gov (United States)

    Vilar, Ciro C.; Joyeux, Jean-Christophe; Spach, Henry L.

    2017-09-01

    While the number of species is a key indicator of ecological assemblages, spatial conservation priorities identified solely from species richness are not necessarily efficient at protecting other important biological assets. Hence, the results of spatial prioritization analysis would be greatly enhanced if richness were used in association with complementary biodiversity measures. In this study, geographic patterns in estuarine fish species rarity (i.e. the average range size in the study area), endemism and richness were mapped and integrated to identify regions important for biodiversity conservation along the Brazilian coast. Furthermore, we analyzed the effectiveness of the national system of protected areas in representing these regions. Analyses were performed on presence/absence data of 412 fish species in 0.25° latitudinal bands covering the entire Brazilian biogeographical province. Species richness, rarity and endemism patterns differed and strongly reflected biogeographical limits and regions. However, among the existing 154 latitudinal bands, 48 were recognized as conservation priorities by concomitantly harboring high estuarine fish species richness and assemblages of geographically rare species. Priority areas identified for all estuarine fish species largely differed from those identified for Brazilian endemics. Moreover, there was no significant correlation between the different aspects of the fish assemblages considered (i.e. species richness, endemism or rarity), suggesting that designating reserves based on a single variable may lead to large gaps in the overall protection of biodiversity. Our results further revealed that the existing system of protected areas is insufficient for representing the priority bands we identified. This highlights the urgent need to expand the national network of protected areas to maintain estuarine ecosystems with high conservation value.
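
    The per-band richness and rarity measures can be computed directly from a presence/absence matrix. The Python sketch counts a species' range size as the number of latitudinal bands it occupies. It takes rarity as the mean range size of the species present in a band, following the definition given here. The toy matrix is hypothetical, and endemism and the prioritization step are not shown.

```python
def richness_and_rarity(presence):
    """Per-band species richness and mean range size of the species present.

    presence[b][s] is 1 if species s occurs in band b, else 0. Range size is
    the number of bands a species occupies, so a lower mean range size marks
    an assemblage of geographically rarer species.
    """
    n_species = len(presence[0])
    range_size = [sum(band[s] for band in presence) for s in range(n_species)]
    results = []
    for band in presence:
        present = [s for s in range(n_species) if band[s]]
        richness = len(present)
        mean_range = sum(range_size[s] for s in present) / richness if richness else None
        results.append((richness, mean_range))
    return results


# Hypothetical matrix: three bands (rows) by three species (columns).
print(richness_and_rarity([[1, 1, 0], [1, 0, 0], [1, 0, 1]]))
# [(2, 2.0), (1, 3.0), (2, 2.0)]
```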

  15. GST M1-T1 null allele frequency patterns in geographically assorted human populations: a phylogenetic approach.

    Directory of Open Access Journals (Sweden)

    Senthilkumar Pitchalu Kasthurinaidu

    Full Text Available Genetic diversity in drug metabolism and disposition is mainly considered the outcome of inter-individual genetic variation in polymorphisms of drug/xenobiotic-metabolizing enzymes (XMEs). Among the XMEs, the glutathione-S-transferase (GST) gene loci are important candidates for investigating diversity in allele frequency, as the deletion mutations in the GST M1 and T1 genotypes are associated with various cancers and genetic disorders in all major Population Affiliations (PAs). Therefore, the present population-based phylogenetic study focused on uncovering the frequency distribution pattern of GST M1 and T1 null genotypes among 45 Geographically Assorted Human Populations (GAHPs). The frequency distribution patterns for GST M1 and T1 null alleles were detected using data derived from the literature representing 44 populations affiliated to Africa, Asia, Europe and South America, together with the genome of a PA from Gujarat, a region in western India. Allele frequency counting for the Gujarat PA and scatter plot analysis of the geographical distribution among the PAs were performed in SPSS 21. The GST M1 and GST T1 null allele frequency patterns of the PAs were computed with the Seqboot and Gendist programs of the PHYLIP software package (version 3.69) and with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) in MEGA 6. Allele frequencies from the South African Xhosa tribe, East African Zimbabwe, East African Ethiopia, North African Egypt, Caucasian, South Asian Afghanistan and South Indian Andhra Pradesh populations were identified as the seven probable patterns among the 45 GAHPs investigated for GST M1-T1 null genotypes. The patterned null allele frequencies demonstrated in this study address, for the first time, the missing link in GST M1-T1 null allele frequencies among GAHPs.

  16. Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

    Science.gov (United States)

    Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

    2015-01-01

    Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most of the imputations have been run using HapMap samples as reference, imputation of low frequency and rare variants (minor allele frequency (MAF) 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, to estimate the performance of low-frequency and rare variant imputation, we imputed 153 individuals, each of whom had data from three different genotype arrays (317k, 610k and 1 million SNPs), against three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both the reference panel and GWAS chip density affect the imputation of low-frequency and rare variants. 1KGphase1 outperformed the other two panels, with a higher concordance rate, a higher proportion of well-imputed variants (info > 0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed the 610K and 317K arrays. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low-frequency and rare variants improves with larger reference panels and higher-density genome-wide genotyping arrays. Yet, despite a large reference panel and dense genotyping, very rare variants remain difficult to impute.
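
    Concordance by MAF bin, one of the metrics used here, is simple to compute once true and imputed genotypes are available for the same individuals. The Python sketch assumes genotypes coded as 0/1/2 alternate-allele counts and uses hypothetical names and toy data. It does not reproduce IMPUTE2's info score. Raw concordance is easily inflated for rare variants, where most genotypes are homozygous for the common allele. This is one reason info-type quality scores are reported alongside it.

```python
def concordance_by_maf(true_calls, imputed_calls, maf, bins):
    """Fraction of imputed genotypes matching the true genotypes, per MAF bin.

    true_calls[v] / imputed_calls[v]: per-individual genotypes (0/1/2) for variant v;
    maf[v]: minor allele frequency of variant v; bins: list of (low, high] edges.
    """
    results = {}
    for low, high in bins:
        matches = total = 0
        for v, f in enumerate(maf):
            if low < f <= high:
                for t, g in zip(true_calls[v], imputed_calls[v]):
                    matches += t == g
                    total += 1
        results[(low, high)] = matches / total if total else None
    return results


true_calls = [[0, 1, 2, 0], [0, 0, 1, 0]]       # hypothetical variants x individuals
imputed_calls = [[0, 1, 1, 0], [0, 0, 1, 0]]
maf = [0.3, 0.002]
print(concordance_by_maf(true_calls, imputed_calls, maf, [(0.0, 0.003), (0.003, 0.5)]))
# {(0.0, 0.003): 1.0, (0.003, 0.5): 0.75}
```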

  17. Geographical Tattoos

    Directory of Open Access Journals (Sweden)

    Valéria Cazetta

    2014-08-01

    Full Text Available The article deals with maps tattooed on bodies. My interest in studying corporeality is part of a broader project entitled Geographies and (in) Bodies. There is a considerable body of published research on tattoos, but none specifically about tattooed maps. However, some of these works interested me because they present important contemporary discussions about body modification, which helped me situate body modifications within culture rather than nature. At this stage, I looked at pictures of geographical tattoos available on several internet sites.

  18. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Science.gov (United States)

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  19. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or use algorithms dedicated to inferring missing genotypes. In this research, eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) were compared in terms of imputation accuracy, computation time, and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute un-typed SNPs in parent-offspring trios. The results show that imputation in parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, and TotalBoost performed slightly worse than the others. Running times differed between methods: ELM was consistently the fastest algorithm, whereas RBF required long imputation times as the sample size increased. The tested methods can be an alternative for imputing un-typed SNPs when the missing-data rate is low. However, the use of other machine learning methods for imputation is also recommended. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Imputation of genotypes in Danish two-way crossbred pigs using low density panels

    DEFF Research Database (Denmark)

    Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

    Genotype imputation is commonly used as an initial step in genomic selection. Studies on humans, plants and ruminants suggest that many factors affect imputation performance. However, few studies have investigated pigs, especially crossbred pigs. In this study, different scenarios... of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation in crossbreds performs as well as in purebreds when the parental breeds are used as the reference panel. When the size of the reference is considerably large... SNPs. This dataset will be analyzed for genomic selection in a future study...

  1. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    Full Text Available The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
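
    Allelic R2 is, at its core, a squared correlation between imputed and true genotypes. As a minimal sketch, assuming true genotypes are available as allele counts (0, 1 or 2) for masked or validation samples and imputed values are allele dosages, the empirical version is a squared Pearson correlation. BEAGLE and IMPUTE2 report their own model-based estimates that do not require true genotypes; the helper names below are hypothetical and do not reproduce either package's formula.

```python
def squared_correlation(true_counts, dosages):
    """Empirical r^2 between true allele counts (0/1/2) and imputed dosages."""
    n = len(true_counts)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_t = sum(true_counts) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_counts, dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_counts)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        # r^2 is undefined when either vector is constant (e.g. a monomorphic SNP)
        return float("nan")
    return cov * cov / (var_t * var_d)


def minor_allele_frequency(true_counts):
    """MAF from allele counts; useful for binning r^2 by allele frequency."""
    p = sum(true_counts) / (2 * len(true_counts))
    return min(p, 1 - p)


# Usage: one hypothetical SNP in five validation samples
truth = [0, 1, 2, 1, 0]
dosage = [0.1, 0.9, 1.8, 1.2, 0.2]
print(round(squared_correlation(truth, dosage), 3), minor_allele_frequency(truth))
```

    Computing r^2 per SNP and grouping the values by minor allele frequency gives the kind of R2-versus-MAF relationship the abstract describes.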

  2. Water quality analysis of the commercial boreholes in Mubi Metropolis, Adamawa State, Nigeria: geographic information system approach.

    Science.gov (United States)

    Mayomi, Ikusemoran; Elisha, Ibrahim

    2011-12-01

    Most of the commercial boreholes in Mubi Metropolis are located along the River Yedzeram, the main river that runs across the town. Unfortunately, because the town lies in the savanna region with minimal water supply, water-related small-scale industries such as sachet water production, block making, irrigation agriculture, cloth dyeing and car washing, as well as other polluting activities such as mechanical workshops and public toilets, are also located along the same river. Moreover, because the government provides no refuse dump site, the town's inhabitants either dump their refuse in the river or spread it on their farmlands. Therefore, five parameters (nitrate, magnesium, copper, calcium and iron) were used to test the quality of water samples collected from twenty-two commercial boreholes along the river, following the World Health Organization standards for the examination of water and wastewater. The study revealed that only eight of the twenty-two boreholes are of good quality, while the others are either of poor quality or not potable. ArcGIS 9.2 and ILWIS 3.3 were used to analyze the laboratory results through SQL queries. It was recommended that the government provide potable water, establish a water quality control board, and use GIS to create databases and perform analyses.

  3. Global and Mexican analytical review of the state of the art on Ecosystem and Environmental services: A geographical approach

    Directory of Open Access Journals (Sweden)

    Maria Perevochtchikova

    2013-12-01

    Full Text Available The term Ecosystem Services (ES) was introduced in the Rio Declaration in 1992, within a strong international movement for sustainable natural resource management. At the time, the innovative principle concerned the environmental functions that maintain life-support systems. Examples listed include pollination, oxygen production, temperature regulation, and water storage, filtering and distribution, functions that had been taken for granted until human action threatened them. The first compensation schemes for Environmental Services were proposed in 1997 as one of the tools of the new environmental policy directed towards the principles of sustainable development. Since then, the topic of ES has received a remarkable global response, reflected in the implementation of payment programs and the development of research in many countries worldwide. This paper analyses the state of the art of the research carried out so far on ES and Environmental Services from the global and the Mexican perspectives. It is based on a review of 1,781 scientific papers published in international peer-reviewed journals between 1992 and 2012. Furthermore, the present study provides a sound geographical overview of the main ES topics studied and of the relative output of papers per region, country or state. Results are finally presented and discussed in light of their deficits and of the challenges ahead.

  4. A novel approach to find and optimize bin locations and collection routes using a geographic information system.

    Science.gov (United States)

    Erfani, Seyed Mohammad Hassan; Danesh, Shahnaz; Karrabi, Seyed Mohsen; Shad, Rouzbeh

    2017-07-01

    One of the major challenges in big cities is planning and implementing an optimized, integrated solid waste management system. This optimization is crucial if environmental problems are to be prevented and expenses reduced. A solid waste management system consists of many stages, including collection, transfer and disposal. In this research, an integrated model was proposed and used to optimize two functional elements of municipal solid waste management (storage and collection systems) in the Ahmadabad neighbourhood of Mashhad, Iran. The integrated model was implemented by modelling and solving the location-allocation problem and the capacitated vehicle routing problem (CVRP) in a Geographic Information System (GIS). The results showed that the current collection system is inefficient owing to its incompatibility with the existing urban structure and population distribution. Application of the proposed model could significantly improve the storage and collection system. Based on minimize-facilities analyses, scenarios with 100, 150 and 180 m walking distances were considered to find optimal bin locations for Alamdasht, C-metri and Koohsangi. The total number of daily collection tours was reduced to seven, compared with the eight tours carried out in the current system (a 12.5% reduction). In addition, the total number of required crews was reduced by 41.7% (24 crews in the current collection system vs 14 in the system provided by the model). The collection vehicle routing was also optimized, cutting the total distance travelled during night and day shifts by 53%.

  5. Actinobacterial community structure in the Polar Frontal waters of the Southern Ocean of the Antarctica using Geographic Information System (GIS): A novel approach to study Ocean Microbiome

    Directory of Open Access Journals (Sweden)

    P. Sivasankar

    2018-04-01

    Full Text Available Integrating microbiological data with geographical locations is necessary to understand the spatiotemporal patterns of the microbial diversity of an ecosystem. A Geographic Information System (GIS) was used to map and catalogue data on the actinobacterial diversity of Southern Ocean waters, obtained through sampling and analysis. Water samples collected at two sampling stations, Polar Front 1 (Station 1) and Polar Front 2 (Station 2), during the 7th Indian Scientific Expedition to the Indian Ocean Sector of the Southern Ocean (SOE-2012-13) were used for analysis. Two main genera of Actinobacteria were recorded at both sampling stations: Streptomyces dominated (> 60%), followed by Nocardiopsis (< 30%), at both Polar Front 1 and Polar Front 2, along with other invasive genera such as Agrococcus, Arthrobacter, Cryobacterium, Curtobacterium, Microbacterium, Marisediminicola, Rhodococcus and Kocuria. These data will help to discriminate the diversity and distribution pattern of Actinobacteria in the Polar Frontal Region of the Southern Ocean. This is a novel approach for the geospatial cataloguing of microbial diversity from extreme niches and across various environmental gradients. Furthermore, this research work will serve as a milestone for bioprospecting microbial communities and their products, which have potential applications in healthcare and agriculture and are beneficial to mankind. Hence, this research work would be significant in creating a database on the microbial communities of the Antarctic ecosystem. Keywords: Antarctica, Marine actinobacteria, Southern ocean, GIS, Polar Frontal waters, Microbiome

  6. Improved Correction of Misclassification Bias With Bootstrap Imputation.

    Science.gov (United States)

    van Walraven, Carl

    2018-07-01

    Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias and requires only code sensitivity and specificity, but it may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates were determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can importantly decrease misclassification bias when accurate disease probability estimates are used.
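
    Why a correction that needs only sensitivity and specificity can return invalid results is easiest to see with the classic Rogan-Gladen formula for a single prevalence, corrected = (observed + Sp - 1) / (Se + Sp - 1). This is a minimal sketch assuming that simple correction; the study's QBA procedure may differ in detail, and the function names are illustrative. The formula goes negative whenever the observed prevalence is below the false-positive rate 1 - Sp. The usage lines plug in the abstract's code accuracy (Se 71.3%, Sp 96.2%) and overall prevalence (7.4%) purely as arithmetic.

```python
def expected_observed_prevalence(true_prev, sensitivity, specificity):
    """Prevalence a misclassifying code would report for a given true prevalence."""
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)


def rogan_gladen(observed_prev, sensitivity, specificity):
    """Correct an observed prevalence for known sensitivity and specificity.

    Returns a value outside [0, 1] when the inputs are inconsistent, which is
    one way a simple bias correction can produce an invalid estimate.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_prev + specificity - 1) / denominator


se, sp = 0.713, 0.962  # code accuracy reported in the abstract
observed = expected_observed_prevalence(0.074, se, sp)  # about 0.088
print(observed, rogan_gladen(observed, se, sp))          # recovers 0.074 (up to rounding)
print(rogan_gladen(0.02, se, sp))                        # negative, i.e. an invalid estimate
```

    A stratum with few true cases can easily show an observed prevalence below 1 - Sp = 0.038 by chance, which is where this kind of correction breaks down.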

  7. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    2018-01-01

    Full Text Available Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem; it provides estimates for the missing values through a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may be unreliable or may even differ considerably from the real values. The aim of this paper is to examine whether combining instance selection on the observed data with missing value imputation performs better than missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a clearly positive impact on the imputation results for categorical medical datasets.

  8. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Science.gov (United States)

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...

  9. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    Science.gov (United States)

    Hernández, Gilma; Moriña, David; Navarro, Albert

    The presence of missing data in collected variables is common in health surveys, but their subsequent imputation at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still poorly understood by many non-statisticians, who view it as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths, all in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example is introduced in which missing data are handled by means of simple and multiple imputation. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
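
    The contrast between simple and multiple imputation for a dichotomous variable can be sketched in a few functions. This is an illustrative example, not the note's own code, and it assumes values are missing completely at random and that no covariates are used. Simple imputation fills every gap with the observed mode. Multiple imputation draws a plausible prevalence from a Beta posterior, fills the gaps with Bernoulli draws, repeats m times, and pools the m estimates with Rubin's rules, so the between-imputation variance enters the total variance. The variable name `smoker` and the data are hypothetical.

```python
import random
import statistics


def simple_impute(values):
    """Replace None with the observed mode (1 if at least half are 1)."""
    observed = [v for v in values if v is not None]
    mode = 1 if 2 * sum(observed) >= len(observed) else 0
    return [mode if v is None else v for v in values]


def impute_once(values, rng):
    """One stochastic imputation of a 0/1 variable (assumes MCAR, no covariates)."""
    observed = [v for v in values if v is not None]
    ones = sum(observed)
    zeros = len(observed) - ones
    # Draw the prevalence from its Beta(1 + ones, 1 + zeros) posterior so that
    # parameter uncertainty propagates into the imputations.
    p_star = rng.betavariate(1 + ones, 1 + zeros)
    return [v if v is not None else int(rng.random() < p_star) for v in values]


def rubin_pool(estimates, variances):
    """Pool m point estimates and within-imputation variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance; needs m >= 2
    total = within + (1 + 1 / m) * between
    return q_bar, total


def multiple_impute_prevalence(values, m=20, seed=0):
    """Estimate a prevalence and its total variance from m imputed datasets."""
    rng = random.Random(seed)
    n = len(values)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        p_hat = sum(completed) / n
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)  # binomial variance of p_hat
    return rubin_pool(estimates, variances)


smoker = [1, 0, None, 1, 0, 0, None, 1, 0, None, 1, 0]
print(simple_impute(smoker))
print(multiple_impute_prevalence(smoker))
```

    Simple imputation treats the filled-in values as if they had been observed, so its variance is too small; the (1 + 1/m) * between term is what multiple imputation adds to reflect uncertainty about the missing values.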

  10. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Science.gov (United States)

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.
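
    Two of the compared strategies, mean imputation and k-nearest-neighbour (KNN) imputation, are simple enough to sketch. This is a minimal illustration on a toy numeric table, not the study's implementation. It assumes missing entries are stored as None, uses a root-mean-squared difference over the features both rows observed as the distance, and averages the k closest donors. The helper names and data are hypothetical.

```python
import math


def mean_impute(rows):
    """Fill each None with its column mean (assumes every column has an observed value)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        column = [row[j] for row in rows if row[j] is not None]
        means.append(sum(column) / len(column))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def _distance(a, b):
    """Root-mean-squared difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature in the k nearest donor rows."""
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that actually observed feature j.
            donors = sorted(
                (_distance(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            filled[j] = sum(nearest) / len(nearest) if nearest else None
        result.append(filled)
    return result


rows = [
    [1.0, 2.0],
    [1.1, 2.2],
    [5.0, 10.0],  # a dissimilar record that pulls the column mean upward
    [1.05, None],
]
print(mean_impute(rows)[3][1])      # column mean of 2.0, 2.2 and 10.0
print(knn_impute(rows, k=2)[3][1])  # mean of the two closest donors
```

    On this toy table the column mean is pulled toward the dissimilar third record, while KNN borrows only from the two most similar records.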

  11. Community Based Informatics: Geographical Information Systems, Remote Sensing and Ontology collaboration - A technical hands-on approach

    Science.gov (United States)

    Branch, B. D.; Raskin, R. G.; Rock, B.; Gagnon, M.; Lecompte, M. A.; Hayden, L. B.

    2009-12-01

    With the nation challenged to comply with Executive Order 12906 and to strengthen the Science, Technology, Engineering and Mathematics (STEM) pipeline, applied focus on the geosciences pipeline may be at risk. The geosciences pipeline may require deliberate consideration in the K-12 standard course of study, in the form of project-based, science-based and evidence-based learning. Thus, the K-12-to-geosciences-to-informatics pipeline may benefit from an earth science experience that uses a community-based “learning by doing” approach. Community GIS, Community Remote Sensing, and Community-Based Ontology development are collectively termed Community Informatics. Here, interdisciplinary approaches to promote earth science literacy are affordable: they rely on low-cost equipment to build the GIS/remote sensing data-processing skills needed in the workforce. Hence, informal community ontology development may evolve or mature from a local community towards formal scientific community collaboration. Such efforts may become a means to engage educational policy with earth science paradigms and needs, specifically by linking Math, Computer Science, and Earth Science disciplines.

  12. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Directory of Open Access Journals (Sweden)

    Ariel W Chan

    Full Text Available Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and

  13. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Science.gov (United States)

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the
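
    The masking procedure described above (hide a subset of observed genotypes, impute them, and compare) can be sketched generically. This is a minimal sketch under stated assumptions: genotypes are coded 0/1/2 with None for missing, rows are individuals, and a naive per-SNP most-frequent-genotype imputer stands in for Beagle or glmnet, neither of which is reproduced here. Accuracy is reported as the concordance between imputed and true genotypes at the masked positions; the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def mask_genotypes(matrix, fraction, rng):
    """Hide a random fraction of observed genotypes; return the masked copy and positions."""
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, g in enumerate(row) if g is not None]
    n_mask = int(round(fraction * len(observed)))
    positions = rng.sample(observed, n_mask)
    masked = [list(row) for row in matrix]
    for i, j in positions:
        masked[i][j] = None
    return masked, positions


def impute_by_mode(matrix):
    """Placeholder imputer: fill each SNP's gaps with its most frequent observed genotype."""
    n_snps = len(matrix[0])
    modes = []
    for j in range(n_snps):
        counts = Counter(row[j] for row in matrix if row[j] is not None)
        # Falls back to 0 if every genotype at this SNP was masked.
        modes.append(counts.most_common(1)[0][0] if counts else 0)
    return [[modes[j] if g is None else g for j, g in enumerate(row)] for row in matrix]


def masked_concordance(matrix, fraction=0.1, seed=1, imputer=impute_by_mode):
    """Share of masked genotypes that the imputer recovers exactly."""
    rng = random.Random(seed)
    masked, positions = mask_genotypes(matrix, fraction, rng)
    imputed = imputer(masked)
    if not positions:
        return float("nan")
    correct = sum(imputed[i][j] == matrix[i][j] for i, j in positions)
    return correct / len(positions)


rng = random.Random(42)
toy = [[rng.choice([0, 0, 1, 2]) for _ in range(50)] for _ in range(30)]
print(masked_concordance(toy, fraction=0.1))
```

    Because `imputer` is a parameter, a real imputation tool could be wrapped behind the same interface and scored on identical masked positions, which is what makes a head-to-head comparison of methods fair.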

  14. Geographic variations in cervical cancer risk in San Luis Potosí state, Mexico: A spatial statistical approach.

    Science.gov (United States)

    Terán-Hernández, Mónica; Ramis-Prieto, Rebeca; Calderón-Hernández, Jaqueline; Garrocho-Rangel, Carlos Félix; Campos-Alanís, Juan; Ávalos-Lozano, José Antonio; Aguilar-Robledo, Miguel

    2016-09-29

    Worldwide, Cervical Cancer (CC) is the fourth most common type of cancer and cause of death in women. It is a significant public health problem, especially in low- and middle-income/Gross Domestic Product (GDP) countries. In the past decade, several studies of CC have been published that identify the main modifiable and non-modifiable CC risk factors for Mexican women. However, there are no studies that attempt to explain the residual spatial variation in CC incidence in Mexico, i.e. spatial variation that cannot be ascribed to known, spatially varying risk factors. This paper uses a spatial statistical methodology that takes into account spatial variation in socio-economic factors and accessibility to health services, whilst allowing for residual, unexplained spatial variation in risk. To describe residual spatial variations in CC risk, we used generalised linear mixed models (GLMM) with both spatially structured and unstructured random effects, using a Bayesian approach to inference. The highest risk is concentrated in the southeast, where the Matlapa and Aquismón municipalities register excessive risk, with posterior probabilities greater than 0.8. Lack of coverage of the Cervical Cancer-Screening Programme (CCSP) (RR 1.17, 95 % CI 1.12-1.22), Marginalisation Index (RR 1.05, 95 % CI 1.03-1.08), and lack of accessibility to health services (RR 1.01, 95 % CI 1.00-1.03) were significant covariates. There are substantial differences between municipalities, with high-risk areas mainly in low-resource areas lacking accessibility to health services for CC. Our results clearly indicate the presence of spatial patterns and the relevance of spatial analysis for public health intervention. Ignoring the spatial variability means continuing a public policy that does not tackle deficiencies in its national CCSP and continuing to disadvantage and disempower Mexican women with regard to their health care.

  15. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    Science.gov (United States)

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.
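
    The two ingredients named above, imputing censored values from a Gaussian model and estimating the autocovariance nonparametrically, can be illustrated separately. The sketch below is a simplification and not the authors' algorithm. It assumes left-censoring at a known limit (values at or below the limit are recorded as the limit), a marginal normal distribution with known mean and standard deviation, and independent draws from that normal truncated to values at or below the limit, which ignores the serial dependence a method targeting the autocovariance has to handle. The autocovariance function is the standard sample estimator with divisor n; the limit and the series are hypothetical.

```python
import random
from statistics import NormalDist


def impute_left_censored(series, censored, limit, mu, sigma, seed=0):
    """Replace censored entries with draws from N(mu, sigma) truncated to (-inf, limit].

    Each draw is independent (serial dependence is ignored in this sketch).
    """
    rng = random.Random(seed)
    dist = NormalDist(mu, sigma)
    upper = dist.cdf(limit)  # probability mass at or below the censoring limit
    out = []
    for value, is_censored in zip(series, censored):
        if is_censored:
            # Inverse-CDF sampling: u is uniform on (0, upper]; clamp to keep inv_cdf valid.
            u = min(upper * (1.0 - rng.random()), 1.0 - 1e-12)
            out.append(dist.inv_cdf(u))
        else:
            out.append(value)
    return out


def sample_autocovariance(x, lag):
    """gamma_hat(h) = (1/n) * sum_{t=0}^{n-h-1} (x_t - mean)(x_{t+h} - mean)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


limit = 0.5  # hypothetical censoring limit
observed = [1.2, 0.5, 0.9, 1.4, 0.5, 0.7, 1.1, 0.5]
flags = [v <= limit for v in observed]
filled = impute_left_censored(observed, flags, limit, mu=0.9, sigma=0.4)
print([round(sample_autocovariance(filled, h), 4) for h in range(3)])
```

    Plugging the censoring limit itself into the autocovariance estimator would understate variability below the limit; replacing those entries with draws from the lower tail is the basic idea behind imputation-based approaches.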

  16. Geographic information analysis and web-based geoportals to explore malnutrition in Sub-Saharan Africa: a systematic review of approaches.

    Science.gov (United States)

    Marx, Sabrina; Phalkey, Revati; Aranda-Jan, Clara B; Profe, Jörn; Sauerborn, Rainer; Höfle, Bernhard

    2014-11-20

    geographic data at household and local level is a major limitation for an in-depth assessment of malnutrition and links to potential impact factors. We propose that the combination of malnutrition-related studies with most recent GIScience developments such as crowd-sourced geodata collection, (web-based) interoperable spatial health data infrastructures as well as (dynamic) information fusion approaches are beneficial to deepen the understanding of this complex phenomenon.

  17. Differential network analysis with multiply imputed lipidomic data.

    Directory of Open Access Journals (Sweden)

    Maiju Kujala

    Full Text Available The importance of lipids for cell function and health has been widely recognized; e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, though not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially complex. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in the network structures of lipids under two biological conditions. It also helps identify potential relationships requiring further biological investigation. We provide a recipe for conducting a permutation test on association scores obtained from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical of a wide range of data sets in the life sciences. Left-censored missing values are low-level concentrations that are known to lie somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we use state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during a two-year follow-up.
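
    The permutation test idea can be illustrated in its simplest form. This is a minimal sketch, not the authors' recipe: it uses one complete dataset instead of multiply imputed data, and a Pearson correlation between two lipids stands in for the partial least squares association scores. The test statistic is the absolute difference in correlation between two patient groups, and group labels are shuffled to build its null distribution. Names and data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # treat a constant variable as uncorrelated in this sketch
    return sxy / (sxx * syy) ** 0.5


def association_difference(x, y, labels):
    """|r(group 0) - r(group 1)| for the pair of variables x and y."""
    g0 = [(a, b) for a, b, g in zip(x, y, labels) if g == 0]
    g1 = [(a, b) for a, b, g in zip(x, y, labels) if g == 1]
    r0 = pearson([a for a, _ in g0], [b for _, b in g0])
    r1 = pearson([a for a, _ in g1], [b for _, b in g1])
    return abs(r0 - r1)


def permutation_test(x, y, labels, n_perm=999, seed=0):
    """Observed statistic and permutation p-value for a group difference in association."""
    rng = random.Random(seed)
    observed = association_difference(x, y, labels)
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between group and association
        if association_difference(x, y, shuffled) >= observed:
            as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy data: the two lipids move together in group 0 and oppositely in group 1.
lipid_a = list(range(1, 11)) * 2
lipid_b = list(range(1, 11)) + [-v for v in range(1, 11)]
group = [0] * 10 + [1] * 10
print(permutation_test(lipid_a, lipid_b, group))
```

    Shuffling whole labels keeps the group sizes fixed, so the null distribution reflects only the loss of the group-association link. With multiply imputed data, one option is to compute the statistic on each imputed dataset and combine the results, but how to do that is a design decision this sketch does not address.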

  18. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

    Science.gov (United States)

    Guo, Wei-Li; Huang, De-Shuang

    2017-08-22

    Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset into a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. By assessing the performance of imputation methods against observed ChIP-seq TF binding profiles, we show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches. In addition, motif analysis shows that TFBSImpute performs better than the baselines in capturing binding motifs enriched in observed data, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.

  19. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values, which directly impacts their usefulness for training high-accuracy classifiers in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem-specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
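    For readers comparing the simplest two of the four methods, a minimal NumPy sketch of column-mean imputation and k-nearest-neighbor imputation. The distance (root mean squared difference over features observed in both rows) and the choice of k are illustrative assumptions, not the settings used in the study.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the observed mean of its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def knn_impute(X, k=2):
    """Replace each NaN with the mean of that feature over the k nearest rows,
    measuring distance only on features observed in both rows.
    Entries with no usable donor are left as NaN."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.any():
                d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
                dists.append((d, j))
        dists.sort()  # nearest rows first
        for f in np.where(miss)[0]:
            donors = [X[j, f] for _, j in dists if not np.isnan(X[j, f])][:k]
            if donors:
                out[i, f] = np.mean(donors)
    return out
```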

  20. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Directory of Open Access Journals (Sweden)

    Xiaoyi Gao

    2012-06-01

    Full Text Available Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are currently rather limited compared with those for individuals of European ancestry, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.
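    Imputation accuracy is commonly summarized by genotype concordance and by the squared correlation between true genotypes and imputed allele dosages. A minimal sketch of both, assuming genotypes coded as 0/1/2 copies of one allele and dosages in [0, 2]; these are generic empirical metrics and not necessarily the quality measures MACH itself reports.

```python
import numpy as np


def best_guess(dosage):
    """Round expected allele dosages (0..2) to the nearest genotype call."""
    return np.clip(np.rint(np.asarray(dosage, dtype=float)), 0, 2).astype(int)


def concordance(true_gt, imputed_gt):
    """Fraction of genotypes imputed exactly."""
    return float(np.mean(np.asarray(true_gt) == np.asarray(imputed_gt)))


def dosage_r2(true_gt, dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    r = np.corrcoef(np.asarray(true_gt, dtype=float), np.asarray(dosage, dtype=float))[0, 1]
    return float(r ** 2)
```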

  1. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Ding, X

    2015-01-01

    This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows...... was improved slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy...... to increase reference population rather than increasing marker density...

  2. Introductory comments on the USGS geographic applications program

    Science.gov (United States)

    Gerlach, A. C.

    1970-01-01

    The third phase of applying remote sensing technologies and their potential to the operations of the U.S. Geological Survey is introduced. Remote sensing data are combined with multidisciplinary spatial data from traditional sources, with geographic theory, and with techniques of environmental modeling. These combined inputs are subject to four sequential activities: (1) thematic mapping of land use and environmental factors; (2) the dynamics of change detection; (3) environmental surveillance to identify sudden changes and general trends; and (4) preparation of statistical models and analytical reports. Geography program functions, products, clients, and goals are presented in graphical form, along with aircraft photo missions, geography test sites, and FY-70.

  3. Multiple imputation of rainfall missing data in the Iberian Mediterranean context

    Science.gov (United States)

    Miró, Juan Javier; Caselles, Vicente; Estrela, María José

    2017-11-01

    Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years, progressively more advanced than traditional approaches. The present study validated 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., filling missing data in multiple incomplete series at the same time in a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (eastern Iberian Peninsula), which is characterized by high spatial irregularity and difficult rainfall estimation. A classification of precipitation according to its genetic origin was applied as pre-processing, and quantile-mapping adjustment as a post-processing technique. The results showed in general better performance for the non-linear and hybrid methods; among the non-linear approaches, the non-linear PCA (NLPCA) method considerably outperformed the Self-Organizing Maps (SOM) method. Among the linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far behind NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.
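    Quantile mapping appears above as the post-processing step. A minimal sketch of empirical quantile mapping, assuming one sample of imputed (model) values and one sample of observed values for the same station and period; the abstract does not describe the authors' exact implementation (for example, how dry days or ties are handled).

```python
import numpy as np


def quantile_map(values, model_ref, obs_ref):
    """Empirical quantile mapping.

    Each value is located on the empirical CDF of `model_ref` (e.g. the imputed
    series) and replaced by the same quantile of `obs_ref` (the observed series).
    Values outside the range of `model_ref` map to the min/max of `obs_ref`.
    """
    model_sorted = np.sort(np.asarray(model_ref, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_ref, dtype=float))
    # Plotting positions 0..1 for the sorted model sample; ties (e.g. many
    # zero-rainfall days) would need extra handling in a real application.
    positions = np.linspace(0.0, 1.0, len(model_sorted))
    probs = np.interp(np.asarray(values, dtype=float), model_sorted, positions)
    return np.quantile(obs_sorted, probs)
```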

  4. Understanding Africa: A Geographic Approach

    Science.gov (United States)

    2009-01-01

    ...down primarily through word of mouth and unifying practices, then further through dances, paintings, sculpture, music festivals, and archeological... [Table fragment: 2007 figures for petroleum products, the U.K., and the U.S.; tourism data from 2000, except Botswana...; sources: ...Intelligence Agency 2008; Tourism: Nevin 2003; U.S. and U.K. Tourism: World Tourism Organization 2001] Despite the previous examples, Africa cannot be

  5. Identification of environmental parameters and risk mapping of visceral leishmaniasis in Ethiopia by using geographical information systems and a statistical approach

    Directory of Open Access Journals (Sweden)

    Teshome Tsegaw

    2013-05-01

    Full Text Available Visceral leishmaniasis (VL), a vector-borne disease strongly influenced by environmental factors, has (re-)emerged in Ethiopia during the last two decades and is currently of increasing public health concern. Based on VL incidence in each locality (kebele) documented from federal or regional health bureaus and/or hospital records in the country, geographical information systems (GIS), coupled with binary and multivariate logistic regression methods, were employed to develop a risk map for Ethiopia with respect to VL based on soil type, altitude, rainfall, slope and temperature. The risk model was subsequently validated in selected sites. This environmental VL risk model provided an overall prediction accuracy of 86%, with mean land surface temperature and soil type found to be the best predictors of VL. The total population at risk was estimated at 3.2 million based on the 2007 national population census. The approach presented here should facilitate the identification of priority areas for intervention and the monitoring of trends, as well as providing input for further epidemiological and applied research with regard to this disease in Ethiopia.

  6. A hybrid segmentation approach for geographic atrophy in fundus auto-fluorescence images for diagnosis of age-related macular degeneration.

    Science.gov (United States)

    Lee, Noah; Laine, Andrew F; Smith, R Theodore

    2007-01-01

    Fundus auto-fluorescence (FAF) images with hypo-fluorescence indicate geographic atrophy (GA) of the retinal pigment epithelium (RPE) in age-related macular degeneration (AMD). Manual quantification of GA is time-consuming and prone to inter- and intra-observer variability. Automatic quantification is important for determining disease progression and facilitating clinical diagnosis of AMD. In this paper we describe a hybrid segmentation method for GA quantification that distinguishes hypo-fluorescent GA regions from other interfering retinal vessel structures. First, we employ background illumination correction exploiting a non-linear adaptive smoothing operator. Then, we use the level set framework to perform segmentation of hypo-fluorescent areas. Finally, we present an energy function combining morphological scale-space analysis with a geometric model-based approach to refine the segmentation by removing false-positive hypo-fluorescent areas caused by interfering retinal structures. The clinically apparent areas of hypo-fluorescence were drawn by an expert grader and compared pixel by pixel with our segmentation results. The mean sensitivity and specificity of the ROC analysis were 0.89 and 0.98, respectively.
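    The pixel-by-pixel comparison against the expert grading reduces to counting agreements between two binary masks. A minimal sketch, assuming both masks are same-shaped arrays in which True (or 1) marks a hypo-fluorescent GA pixel; this is not the authors' evaluation code.

```python
import numpy as np


def pixel_sensitivity_specificity(pred_mask, truth_mask):
    """Pixel-wise sensitivity and specificity of a segmentation mask against
    a reference mask. Assumes the reference contains both classes
    (otherwise a denominator is zero)."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(truth_mask, dtype=bool)
    tp = np.sum(pred & truth)    # GA pixels found
    fn = np.sum(~pred & truth)   # GA pixels missed
    tn = np.sum(~pred & ~truth)  # background correctly left out
    fp = np.sum(pred & ~truth)   # background wrongly labelled GA
    return float(tp / (tp + fn)), float(tn / (tn + fp))
```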

  7. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    KAUST Repository

    Kim, Ji-Sung; Gao, Xin; Rzhetsky, Andrey

    2018-01-01

    are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race

  8. Imputation methods for filling missing data in urban air pollution data for Malaysia

    Directory of Open Access Journals (Sweden)

    Nur Afiqah Zakaria

    2018-06-01

    Full Text Available Air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing data. Missing observations usually occur because of machine failure, routine maintenance and human error. In this study, hourly monitoring data of CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods (Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour). Missing data were simulated in the air pollutant observations at four rates: 5%, 10%, 15% and 20%. Performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. Based on these performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in missing values in air pollutant data.
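    The evaluation design above (blank out a known fraction of observations, impute them, score the imputed values against the truth) can be sketched as follows. This assumes the Coefficient of Determination is computed as the squared Pearson correlation and the Index of Agreement is Willmott's d; the helper names are hypothetical, and no specific imputation method is implemented here.

```python
import numpy as np


def simulate_missing(x, rate, rng):
    """Blank out a random `rate` fraction of entries (set to NaN) for evaluation."""
    x = np.asarray(x, dtype=float)
    n_missing = int(round(rate * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    masked = x.copy()
    masked.flat[idx] = np.nan
    return masked, idx


def imputation_metrics(observed, imputed):
    """MAE, RMSE, R^2 (squared Pearson correlation) and Willmott's index of agreement."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(imputed, dtype=float)
    err = p - o
    o_bar = o.mean()
    ia = 1.0 - np.sum(err ** 2) / np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(np.corrcoef(o, p)[0, 1] ** 2),
        "IA": float(ia),
    }
```

    In use, each method would fill the NaNs produced by `simulate_missing`, and `imputation_metrics` would be applied to the true and imputed values at the returned indices, once per missing-data rate.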

  9. Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation.

    Science.gov (United States)

    Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen

    2014-01-01

    Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well, while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently conducted GenIMS study.
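    Multiple imputation estimators are usually pooled across the m completed data sets with Rubin's rules. The sketch below shows those standard combining rules only; the abstract states that the authors provide their own consistent variance estimator, which may differ from this textbook formula.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool m point estimates and their within-imputation variances.

    Returns the pooled estimate (mean of the estimates) and the total variance
    W + (1 + 1/m) * B, where W is the mean within-imputation variance and
    B is the between-imputation variance of the estimates.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()
    w = u.mean()
    b = q.var(ddof=1)
    return float(q_bar), float(w + (1.0 + 1.0 / m) * b)
```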

  10. Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

    Directory of Open Access Journals (Sweden)

    Abbas Mikhchi

    2016-01-01

    Full Text Available Abstract Background Genotype imputation is an important process for predicting unknown genotypes; it uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variation at low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker. Methods In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared when imputing from a lower-density (5K) SNP panel to a high-density (10K) SNP panel, using three boosting methods: TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were applied to simulated data to impute the untyped SNPs in parent-offspring trios. Four datasets were simulated: G1 (100 trios with 5K SNPs), G2 (100 trios with 10K SNPs), G3 (500 trios with 5K SNPs), and G4 (500 trios with 10K SNPs). In all four datasets, parents were genotyped completely, and offspring were genotyped with a lower-density panel. Results Comparison of the three methods showed that LB outperformed AB and TB in imputation accuracy. Computation time differed between methods, with AB being the fastest algorithm. Higher SNP density increased imputation accuracy. A larger number of trios (i.e., 500) improved the performance of LB and TB. Conclusions All three methods performed well in terms of imputation accuracy, and the denser chip is recommended for imputation of parent-offspring trios.

  11. Simple nuclear norm based algorithms for imputing missing data and forecasting in time series

    OpenAIRE

    Butcher, Holly Louise; Gillard, Jonathan William

    2017-01-01

    There has been much recent progress on the use of the nuclear norm for the so-called matrix completion problem (the problem of imputing missing values of a matrix). In this paper we investigate the use of the nuclear norm for modelling time series, with particular attention to imputing missing data and forecasting. We introduce a simple alternating projections type algorithm based on the nuclear norm for these tasks, and consider a number of practical examples.
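    As a point of reference for nuclear-norm matrix completion, a minimal sketch of the well-known singular-value soft-thresholding iteration (in the style of Soft-Impute). This is a generic technique and not the authors' alternating projections algorithm. NaN marks a missing entry, and the regularization weight `lam` is an illustrative assumption.

```python
import numpy as np


def soft_impute(M, lam=0.1, n_iter=2000, tol=1e-9):
    """Complete a matrix with NaNs by nuclear-norm-regularized low-rank fitting.

    Each step soft-thresholds the singular values of the current filled
    matrix by `lam` (the proximal operator of lam * nuclear norm) and then
    restores the observed entries.
    """
    M = np.asarray(M, dtype=float)
    observed = ~np.isnan(M)
    filled = np.where(observed, M, np.nanmean(M, axis=0))  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U * np.maximum(s - lam, 0.0)) @ Vt
        new = np.where(observed, M, low_rank)
        if np.linalg.norm(new - filled) < tol:
            return new
        filled = new
    return filled
```

    Smaller `lam` stays closer to the observed data but converges more slowly; larger values give lower-rank, more strongly shrunk completions.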

  12. The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting.

    Science.gov (United States)

    Thomas, A M; Cook, L J; Dean, J M; Olson, L M

    2014-01-01

    To compare results from high-probability matched sets versus imputed matched sets across differing levels of linkage information. A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high-probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status. In high information settings, neither high-probability nor imputed matched sets differed significantly from the true matches in occupant age, MVC county, or MVC hour (p > 0.999). In low information settings, high-probability matched sets differed significantly in occupant age and MVC county (p < …), whereas imputed matched sets did not (p > 0.493). High information settings saw no significant differences in inference of simulated log hospital charges and hospitalization status between the two methods. High-probability and imputed matched sets were significantly different from the outcomes in low information settings; however, imputed matched sets were more robust. The level of information available to a linkage is an important consideration. High-probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.

  13. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Directory of Open Access Journals (Sweden)

    Xiaobo Yan

    2015-01-01

    Full Text Available This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics, and healthcare. However, missing values are very common in the IoT for a variety of reasons, which leaves the experimental data incomplete. As a result, some work related to IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data itself and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on a Gaussian mixture model (MGI). Experimental results showed that the three models substantially improve the accuracy, reliability, and stability of missing value imputation.

  14. Body size and geographic range do not explain long term variation in fish populations: a Bayesian phylogenetic approach to testing assembly processes in stream fish assemblages.

    Directory of Open Access Journals (Sweden)

    Stephen J Jacquemin

    Full Text Available We combine evolutionary biology and community ecology to test whether two species traits, body size and geographic range, explain long-term variation in local-scale freshwater stream fish assemblages. Body size and geographic range are expected to influence several aspects of fish ecology, via relationships with niche breadth, dispersal, and abundance. These traits are expected to scale inversely with niche breadth or current abundance, and to scale directly with dispersal potential. However, their utility in explaining long-term temporal patterns in local-scale abundance is not known. Comparative methods employing an existing molecular phylogeny were used to incorporate evolutionary relatedness in a test for covariation of body size and geographic range with long-term (1983-2010) local-scale population variation of fishes in the West Fork White River (Indiana, USA). The Bayesian model incorporating phylogenetic uncertainty and correlated predictors indicated that neither body size nor geographic range explained significant variation in population fluctuations over a 28-year period. Phylogenetic signal data indicated that body size and geographic range were less similar among taxa than expected if trait evolution followed a purely random walk. We interpret this as evidence that local-scale population variation may be influenced less by species-level traits such as body size or geographic range, and may instead be influenced more strongly by a taxon's local-scale habitat and biotic assemblages.

  15. Multi-approaches analysis reveals local adaptation in the emmer wheat (Triticum dicoccoides) at macro- but not micro-geographical scale.

    Science.gov (United States)

    Volis, Sergei; Ormanbekova, Danara; Yermekbayev, Kanat; Song, Minshu; Shulgina, Irina

    2015-01-01

    Detecting local adaptation and its spatial scale is one of the most important questions in evolutionary biology. However, recognizing the effect of local selection can be challenging when there is considerable environmental variation across the whole species range. We analyzed patterns of local adaptation in emmer wheat, Triticum dicoccoides, at two spatial scales, small (inter-population distance less than one km) and large (inter-population distance more than 50 km), using several approaches. Plants originating from four distinct habitats at two geographic scales (cold edge, arid edge and two topographically dissimilar core locations) were reciprocally transplanted, and their success over time was measured as 1) lifetime fitness in the year of planting, and 2) population growth four years after planting. In addition, we analyzed molecular (SSR) and quantitative trait variation and calculated the QST/FST ratio. No home advantage was detected at the small spatial scale. At the large spatial scale, home advantage was detected for the core population and the cold edge population in the year of introduction by measuring lifetime plant performance. However, superior performance of the arid edge population in its own environment was evident only after several generations, by measuring experimental population growth rate through SSR genotyping, which allowed counting the number of plants and seeds per introduced genotype per site. These results highlight the importance of multi-generation surveys of population growth rate in testing for local adaptation. Despite predominant self-fertilization of T. dicoccoides and the associated high degree of structuring of genetic variation, the results of the QST-FST comparison were in general agreement with the pattern of local adaptation at the two spatial scales detected by reciprocal transplanting.

  16. Nearest neighbor imputation using spatial-temporal correlations in wireless sensor networks.

    Science.gov (United States)

    Li, YuanYuan; Parker, Lynne E

    2014-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data retransmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN are generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we use a k-d tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using the traditional mean and variance of each dimension for k-d tree construction, and Euclidean distance for k-d tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental
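    The abstract describes nearest-neighbor search with distances weighted according to measured percentages of missing data. A minimal sketch under stated assumptions: rows are time steps, columns are sensor nodes, NaN marks a missing reading, each sensor's hypothetical weight is one minus its missing fraction, and the search is brute force rather than the k-d tree the authors use.

```python
import numpy as np


def missing_weights(data):
    """Hypothetical per-sensor weights: sensors that are missing more often count less."""
    return 1.0 - np.mean(np.isnan(data), axis=0)


def nn_impute(data):
    """Fill each missing reading with the value recorded by the same sensor at the
    most similar other time step, using a weighted Euclidean distance over the
    sensors observed at both time steps (brute-force search)."""
    data = np.asarray(data, dtype=float)
    w = missing_weights(data)
    out = data.copy()
    for t, s in zip(*np.where(np.isnan(data))):
        best_d, best_val = np.inf, np.nan
        for r in range(data.shape[0]):
            if r == t or np.isnan(data[r, s]):
                continue
            shared = ~np.isnan(data[t]) & ~np.isnan(data[r])
            if not shared.any():
                continue
            diff2 = (data[t, shared] - data[r, shared]) ** 2
            d = np.sqrt(np.sum(w[shared] * diff2) / np.sum(w[shared]))
            if d < best_d:
                best_d, best_val = d, data[r, s]
        out[t, s] = best_val
    return out
```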

  17. Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

    DEFF Research Database (Denmark)

    Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

    2013-01-01

    This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence...... coefficient was lower when the minor allele frequency was lower. The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation, but considering computing time and memory usage, FImpute is an alternative method....

  18. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    Science.gov (United States)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods for completing the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, the simple arithmetic average, normal ratio (NR), and NR weighted with correlations are the simple ones, whereas a multilayer perceptron neural network and a multiple imputation strategy using Markov chain Monte Carlo based on expectation-maximization (EM-MCMC) are the computationally intensive ones. In addition, we propose a modification of the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Based on detailed graphical and quantitative analysis, although the computational methods, particularly EM-MCMC, are computationally inefficient, they appear favorable for imputing meteorological time series across different missingness periods, considering both measures and both series studied. To conclude, using the EM-MCMC algorithm to impute missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for evaluating missing data imputation performance, particularly with computational methods, since it gives more precise results in meteorological time series.
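    Of the simple methods listed, the normal ratio (NR) method estimates a missing value at a target station from n neighboring stations as P_x = (1/n) Σ_i (N_x / N_i) P_i, where N_x and N_i are the long-term normals (e.g., mean monthly totals) of the target and neighbor stations. A minimal sketch of NR alongside the simple arithmetic average; the correlation-weighted NR variant and EM-MCMC are not shown.

```python
import numpy as np


def arithmetic_average(neighbor_values):
    """Simple arithmetic average of the neighboring stations' values."""
    return float(np.mean(neighbor_values))


def normal_ratio(neighbor_values, neighbor_normals, target_normal):
    """Normal ratio estimate: (1/n) * sum_i (N_x / N_i) * P_i."""
    p = np.asarray(neighbor_values, dtype=float)
    n = np.asarray(neighbor_normals, dtype=float)
    return float(np.mean(target_normal / n * p))
```

    Scaling each neighbor by the ratio of normals lets a drier or wetter neighbor contribute in proportion to the target station's own climatology, which the plain average ignores.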

  19. Imputation-based analysis of association studies: candidate regions and quantitative traits.

    Directory of Open Access Journals (Sweden)

    Bertrand Servin

    2007-07-01

    Full Text Available We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data) in a candidate region of interest with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

  20. Multiple imputation for estimating the risk of developing dementia and its impact on survival.

    Science.gov (United States)

    Yu, Binbing; Saczynski, Jane S; Launer, Lenore

    2010-10-01

    Dementia, Alzheimer's disease in particular, is one of the major causes of disability and decreased quality of life among the elderly and a leading obstacle to successful aging. Given the profound impact on public health, much research has focused on the age-specific risk of developing dementia and the impact on survival. Early work has discussed various methods of estimating age-specific incidence of dementia, among which the illness-death model is popular for modeling disease progression. In this article we use multiple imputation to fit multi-state models for survival data with interval censoring and left truncation. This approach allows semi-Markov models in which survival after dementia depends on onset age. Such models can be used to estimate the cumulative risk of developing dementia in the presence of the competing risk of dementia-free death. Simulations are carried out to examine the performance of the proposed method. Data from the Honolulu Asia Aging Study are analyzed to estimate the age-specific and cumulative risks of dementia and to examine the effect of major risk factors on dementia onset and death.

  1. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan

    2009-11-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain substantial power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by constructing various score tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation, followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  2. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan; Chen, Yi-Hau; Luo, Sheng; Carroll, Raymond J.

    2009-01-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain substantial power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene independence and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by constructing various score tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.
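
    Hardy-Weinberg equilibrium, one of the assumptions the retrospective likelihood can exploit, implies genotype frequencies p², 2pq and q² for a biallelic SNP whose allele frequency is p. The sketch below shows that implication together with the classical one-degree-of-freedom chi-square goodness-of-fit check. It is background only, not the retrospective-likelihood method reviewed in the article, and the function name and genotype counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (statistic, p_value) using 1 degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)   # HWE proportions
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


print(hwe_chi_square(25, 50, 25))   # counts exactly in HWE -> (0.0, 1.0)
print(hwe_chi_square(30, 40, 30))   # heterozygote deficit
```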

  3. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

    Science.gov (United States)

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and handling incomplete data in effectiveness studies is increasingly emphasized. However, most publications focus on randomized clinical trials (RCTs). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis testing robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique offers with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by data missing not at random (MNAR) in a worst reasonable-case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's versions of the HAQ could significantly improve the predictive value of routine outcome monitoring based on the OQ-45. Since dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful for improving the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.

  4. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

    Directory of Open Access Journals (Sweden)

    Stuart Heather

    2006-12-01

    Full Text Available Abstract Background Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). Methods 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The sum of the responses is calculated, and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Additionally, missing at random and missing not at random simulations were completed. Six imputation methods were then considered: (1) multiple imputation, (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percentage of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression methods produced Kappas in the 'substantial agreement' range
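
    Several of the simpler methods compared above are easy to state in code. The sketch below implements individual mean imputation (each missing item is filled with the mean of that respondent's answered items), the total-score cutoff of 40, and Cohen's kappa for agreement between two classifications. The helper names and the example responses are hypothetical, and multiple imputation and single regression are not shown.

```python
from typing import List, Optional, Sequence


def individual_mean_impute(responses: Sequence[Optional[int]]) -> List[float]:
    """Fill each missing item (None) with the mean of the respondent's answered items."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("respondent answered no items")
    item_mean = sum(answered) / len(answered)
    return [item_mean if r is None else float(r) for r in responses]


def classify(total: float, cutoff: float = 40.0) -> bool:
    """Flag depressive symptoms when the total SDS score exceeds the cutoff."""
    return total > cutoff


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same respondents."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty classifications of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)     # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single class")
    return (observed - expected) / (1 - expected)


true_responses = [2, 3, 1, 4] * 5                # 20 items, total score 50
with_gaps: List[Optional[int]] = list(true_responses)
with_gaps[0] = None                              # simulate two missing items
with_gaps[7] = None
filled = individual_mean_impute(with_gaps)
print(sum(true_responses), round(sum(filled), 2), classify(sum(filled)))
```

    In a simulation like the one described, `cohens_kappa` would compare the classifications from the complete and imputed totals across all respondents.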

  5. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Full Text Available Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants to human phenotypes, as well as parent-of-origin effects of disease risk alleles in >1,000 individuals at minimal cost.

  6. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    Science.gov (United States)

    Resche-Rigon, Matthieu; White, Ian R

    2018-06-01

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended to other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.

  7. Geographical information systems

    DEFF Research Database (Denmark)

    Möller, Bernd

    2004-01-01

    The chapter gives an introduction to Geographical Information Systems (GIS) with particular focus on their application within environmental management.

  8. Geographic Media Literacy

    Science.gov (United States)

    Lukinbeal, Chris

    2014-01-01

    While the use of media permeates geographic research and pedagogic practice, the underlying literacies that link geography and media remain uncharted. This article argues that geographic media literacy incorporates visual literacy, information technology literacy, information literacy, and media literacy. Geographic media literacy is the ability…

  9. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

    Science.gov (United States)

    Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P

    2013-05-01

    A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias, a problem that has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type I error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not show such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
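
    The union and intersection strategies compared in the study differ only in which genotyped SNPs are passed to the imputation software. The minimal sketch below computes both input sets and the fraction of all genotyped SNPs that the intersection retains. The array names and SNP identifiers are made up, and the imputation run itself (e.g. against a HapMap reference) is not shown.

```python
from typing import Dict, Iterable, Set


def imputation_input_snps(arrays: Dict[str, Iterable[str]],
                          strategy: str = "intersection") -> Set[str]:
    """Return the genotyped SNPs to use as imputation input.

    'union': SNPs typed on at least one array (associated with bias above).
    'intersection': SNPs typed on every array (reported to remove that bias).
    """
    sets = [set(snps) for snps in arrays.values()]
    if not sets:
        raise ValueError("no arrays supplied")
    if strategy == "intersection":
        return set.intersection(*sets)
    if strategy == "union":
        return set.union(*sets)
    raise ValueError(f"unknown strategy: {strategy}")


arrays = {  # hypothetical arrays and SNP identifiers
    "array_A": ["rs1", "rs2", "rs3", "rs4"],
    "array_B": ["rs2", "rs3", "rs5"],
}
shared = imputation_input_snps(arrays, "intersection")
everything = imputation_input_snps(arrays, "union")
print(sorted(shared), len(shared) / len(everything))
```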

  10. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2015-12-01

    Full Text Available Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained show that the proposed method outperforms the MICE algorithm.

  11. On multivariate imputation and forecasting of decadal wind speed missing data.

    Science.gov (United States)

    Wesonga, Ronald

    2015-01-01

    This paper demonstrates the application of multiple imputation by chained equations and time series forecasting to wind speed data. The study was motivated by the high prevalence of missing historical wind speed data. Findings based on the fully conditional specification under multiple imputation by chained equations provided reliable imputations of the missing wind speed data. Further, the forecasting model showed the smoothing parameter, alpha (0.014), to be close to zero, confirming that recent past observations are more suitable for forecasting wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance, with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimation confirms the dynamic tendencies of wind speed at the airport under study.
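
    The smoothing parameter alpha quoted above is the weight used in simple exponential smoothing: each new level is alpha times the latest observation plus (1 - alpha) times the previous level, so (apart from the initial value) an observation k steps in the past carries weight alpha(1 - alpha)^k. The sketch below implements that recursion. Whether the study used exactly this variant is an assumption, and the example series is invented.

```python
from typing import List, Sequence


def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> List[float]:
    """Return smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The level is initialised at the first observation; the last level is the
    one-step-ahead forecast.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series is empty")
    level = float(series[0])
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
        levels.append(level)
    return levels


wind = [5.2, 6.1, 4.8, 7.3, 6.0]        # hypothetical wind speeds, m/s
forecast = simple_exponential_smoothing(wind, alpha=0.014)[-1]
print(round(forecast, 3))
```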

  12. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Directory of Open Access Journals (Sweden)

    Ward Judson A

    2013-01-01

    Full Text Available Abstract Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for an R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM, respectively, over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

  13. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    Science.gov (United States)

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

    Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each package includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics software packages and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.

  14. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497
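
    The core of any Procrustes analysis is finding the scaling, rotation and translation that best superimpose one configuration of points on another. As a simplified sketch, the code below solves the ordinary two-dimensional case in closed form (rotations only, no reflections), for example mapping a study sample's coordinates onto a reference PCA map. The projection Procrustes step in LASER 2.0, which starts from higher-dimensional principal components, is not reproduced here, and the function name and points are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def procrustes_2d(source: Sequence[Point],
                  target: Sequence[Point]) -> Callable[[Point], Point]:
    """Fit target ~ scale * R(theta) * source + shift by least squares.

    Rotation only (no reflection); both sequences list matching points.
    Returns a function that applies the fitted similarity transform.
    """
    n = len(source)
    if n < 2 or n != len(target):
        raise ValueError("need at least two matched point pairs")
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n
    a = b = norm = 0.0
    for (x1, x2), (y1, y2) in zip(source, target):
        x1, x2, y1, y2 = x1 - sx, x2 - sy, y1 - tx, y2 - ty
        a += x1 * y1 + x2 * y2          # sum of dot products
        b += x1 * y2 - x2 * y1          # sum of 2-D cross products
        norm += x1 * x1 + x2 * x2
    if norm == 0.0:
        raise ValueError("source points are all identical")
    theta = math.atan2(b, a)            # optimal rotation angle
    scale = math.hypot(a, b) / norm     # optimal isotropic scale
    c, s = math.cos(theta), math.sin(theta)

    def transform(p: Point) -> Point:
        x1, x2 = p[0] - sx, p[1] - sy
        return (scale * (c * x1 - s * x2) + tx, scale * (s * x1 + c * x2) + ty)

    return transform


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rotated = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0), (1.0, 6.0)]  # 90 deg, scale 2, shifted
fit = procrustes_2d(square, rotated)
print([tuple(round(v, 6) for v in fit(p)) for p in square])
```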

  15. Analyzing the changing gender wage gap based on multiply imputed right censored wages

    OpenAIRE

    Gartner, Hermann; Rässler, Susanne

    2005-01-01

    "In order to analyze the gender wage gap with the German IAB-employment register we have to solve the problem of censored wages at the upper limit of the social security system. We treat this problem as a missing data problem. We regard the missingness mechanism as not missing at random (NMAR, according to Little and Rubin, 1987, 2002) as well as missing by design. The censored wages are multiply imputed by draws of a random variable from a truncated distribution. The multiple imputation is b...

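    Imputing censored wages by drawing from a truncated distribution can be sketched with the inverse-CDF method: draw a uniform value from the part of the distribution above the censoring limit and map it back through the quantile function. The code assumes, for illustration only, that log wages are normal with a mean and standard deviation already estimated elsewhere; the parameter values, the ceiling and the function name are hypothetical, not taken from the IAB data.

```python
import math
import random
from statistics import NormalDist


def impute_censored_log_wage(mu: float, sigma: float, log_limit: float,
                             rng: random.Random) -> float:
    """Draw one log wage from N(mu, sigma^2) truncated to [log_limit, inf)."""
    dist = NormalDist(mu, sigma)
    lo = dist.cdf(log_limit)                  # probability mass below the limit
    p = lo + (1.0 - lo) * rng.random()        # uniform on [lo, 1)
    p = min(max(p, 1e-12), 1.0 - 1e-12)       # inv_cdf needs 0 < p < 1
    return max(dist.inv_cdf(p), log_limit)    # guard against rounding below the limit


rng = random.Random(2005)
log_ceiling = math.log(5000.0)                # hypothetical censoring limit
m = 5                                         # number of imputations
draws = [impute_censored_log_wage(8.3, 0.4, log_ceiling, rng) for _ in range(m)]
print([round(math.exp(d)) for d in draws])    # imputed wages above the ceiling
```

    Repeating the draw m times for every censored record yields m completed data sets whose analyses can then be pooled.
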
  16. UniFIeD Univariate Frequency-based Imputation for Time Series Data

    OpenAIRE

    Friese, Martina; Stork, Jörg; Ramos Guerra, Ricardo; Bartz-Beielstein, Thomas; Thaker, Soham; Flasch, Oliver; Zaefferer, Martin

    2013-01-01

    This paper introduces UniFIeD, a new data preprocessing method for time series. UniFIeD can cope with large intervals of missing data. A scalable test function generator, which allows the simulation of time series with different gap sizes, is also presented. An experimental study demonstrates that (i) UniFIeD performs significantly better than simple imputation methods and (ii) UniFIeD is able to handle situations where advanced imputation methods fail. The results are indep...

  17. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
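
    For a single parameter, Rubin's rules average the m estimates and combine the average within-imputation variance with the between-imputation variance inflated by (1 + 1/m). The sketch below pools one coefficient that way and returns the usual Rubin degrees of freedom. Testing a categorical covariate with more than two levels, the subject of the study, needs multi-parameter pooling that is not shown, and the input numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool one parameter across m imputations with Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)                   # pooled point estimate
    u_bar = mean(variances)                   # within-imputation variance
    b = variance(estimates)                   # between-imputation variance (m - 1 divisor)
    t = u_bar + (1.0 + 1.0 / m) * b           # total variance
    if b == 0.0:
        df = float("inf")                     # imputations agree exactly
    else:
        df = (m - 1) * (1.0 + u_bar / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, t, df


# Hypothetical log-odds ratios and squared standard errors from m = 3 imputations.
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```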

  18. Geographic information system for Long Island: An epidemiologic systems approach to identify environmental breast cancer risks on Long Island. Phase 1

    Energy Technology Data Exchange (ETDEWEB)

    Barancik, J.I.; Kramer, C.F.; Thode, H.C. Jr.

    1995-12-01

    BNL is developing and implementing the project "Geographic Information System (GIS) for Long Island" to address the potential relationship of environmental and occupational exposures to breast cancer etiology on Long Island. The project is divided into two major phases: the four-month feasibility project (Phase 1) and the major development and implementation project (Phase 2). This report summarizes the work completed in the four-month Phase 1 project, "Feasibility of a Geographic Information System for Long Island." It provides the baseline information needed to further define and prioritize the scope of work for subsequent tasks. Phase 2 will build upon this foundation to develop an operational GIS for the Long Island Breast Cancer Study Project (LIBCSP).

  19. Sex work in geographic perspective: a multi-disciplinary approach to mapping and understanding female sex work venues in Southwest China.

    Science.gov (United States)

    Lorway, Robert; Khan, Shamshad; Chevrier, Claudyne; Huynh, Anthony; Zhang, Juying; Ma, Xiao; Blanchard, James; Yu, Nancy

    2017-05-01

    This paper examines the findings from an extensive geographic mapping study of female sex work venues located in the southwestern Chinese city of Zigong, in Sichuan province. Drawing upon the findings from quantitative research, secondary historical sources and field notes composed during participant observation, we provide a nuanced portrait of how the operation of sex work can be conceptualised in spatial terms, where 'space' is regarded as something socially constructed and historically contingent. The sex work geographies we analyse hold important implications for prevention work conducted in the region. When the sexual practices between sex workers and their clients are viewed against a wider geographic and historical backdrop, focus shifts from the properties and intentionalities of individuals towards the kinds of spaces where sex work operates, the organisation of which is underpinned by economic forces that have given rise to the rapid proliferation of small urban spaces in contemporary China.

  20. Geographic information system for Long Island: An epidemiologic systems approach to identify environmental breast cancer risks on Long Island. Phase 1

    International Nuclear Information System (INIS)

    Barancik, J.I.; Kramer, C.F.; Thode, H.C. Jr.

    1995-01-01

    BNL is developing and implementing the project "Geographic Information System (GIS) for Long Island" to address the potential relationship of environmental and occupational exposures to breast cancer etiology on Long Island. The project is divided into two major phases: the four-month feasibility project (Phase 1) and the major development and implementation project (Phase 2). This report summarizes the work completed in the four-month Phase 1 project, "Feasibility of a Geographic Information System for Long Island." It provides the baseline information needed to further define and prioritize the scope of work for subsequent tasks. Phase 2 will build upon this foundation to develop an operational GIS for the Long Island Breast Cancer Study Project (LIBCSP).

  1. A Geographical Information System Based Approach for Integrated Strategies of Tick Surveillance and Control in the Peri-Urban Natural Reserve of Monte Pellegrino (Palermo, Southern Italy)

    OpenAIRE

    Alessandra Torina; Valeria Blanda; Marcellocalogero Blanda; Michelangelo Auteri; Francesco La Russa; Salvatore Scimeca; Rosalia D’Agostino; Rosaria Disclafani; Sara Villari; Vittoria Currò; Santo Caracappa

    2018-01-01

    Ticks (Acari: Ixodidae) are bloodsucking arthropods involved in pathogen transmission in animals and humans. Tick activity depends on various ecological factors such as vegetation, hosts, and temperature. The aim of this study was to analyse the spatial/temporal distribution of ticks in six sites within a peri-urban area of Palermo (Natural Reserve of Monte Pellegrino) and correlate it with field data using Geographical Information System (GIS) data. A total of 3092 ticks were gathered via dr...

  2. Geographic approaches to quantifying the risk environment: a focus on syringe exchange program site access and drug-related law enforcement activities

    Science.gov (United States)

    Cooper, Hannah LF; Bossak, Brian; Tempalski, Barbara; Des Jarlais, Don C.; Friedman, Samuel R.

    2009-01-01

    The concept of the “risk environment” – defined as the “space … [where] factors exogenous to the individual interact to increase the chances of HIV transmission” – draws together the disciplines of public health and geography. Researchers have increasingly turned to geographic methods to quantify dimensions of the risk environment that are both structural and spatial (e.g., local poverty rates). The scientific power of the intersection between public health and geography, however, has yet to be fully mined. In particular, research on the risk environment has rarely applied geographic methods to create neighbourhood-based measures of syringe exchange programs (SEPs) or of drug-related law enforcement activities, despite the fact that these interventions are widely conceptualized as structural and spatial in nature and are two of the most well-established dimensions of the risk environment. To strengthen research on the risk environment, this paper presents a way of using geographic methods to create neighbourhood-based measures of (1) access to SEP sites and (2) exposure to drug-related arrests, and then applies these methods to one setting (New York City). NYC-based results identified substantial cross-neighbourhood variation in SEP site access and in exposure to drug-related arrest rates (even within the subset of neighbourhoods nominally experiencing the same drug-related police strategy). These geographic measures – grounded as they are in conceptualizations of SEPs and drug-related law enforcement strategies – can help develop new arenas of inquiry regarding the impact of these two dimensions of the risk environment on injectors’ health, including exploring whether and how neighbourhood-level access to SEP sites and exposure to drug-related arrests shape a range of outcomes among local injectors. PMID:18963907
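
    One common way to turn point locations into a neighbourhood-level access measure is the great-circle distance from each neighbourhood's centroid to its nearest SEP site. The sketch below uses the haversine formula for that. The paper's own access measures may be defined differently, and the coordinates and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0088   # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_site_km(centroid: LatLon, sites: Sequence[LatLon]) -> float:
    """Distance from a neighbourhood centroid to the closest site."""
    if not sites:
        raise ValueError("no sites supplied")
    return min(haversine_km(centroid, s) for s in sites)


sites = [(40.80, -73.94), (40.71, -73.99)]       # hypothetical SEP locations
print(round(nearest_site_km((40.75, -73.97), sites), 2))
```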

  3. A BAYESIAN SPATIAL AND TEMPORAL MODELING APPROACH TO MAPPING GEOGRAPHIC VARIATION IN MORTALITY RATES FOR SUBNATIONAL AREAS WITH R-INLA.

    Science.gov (United States)

    Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret

    2018-01-01

    Hierarchical Bayes models have been used in disease mapping to examine small-scale geographic variation. State-level geographic variation in less common causes of mortality has been reported; however, county-level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed under the statistical reliability criteria of the Division of Vital Statistics, National Center for Health Statistics (NCHS), precluding an examination of spatio-temporal variation in less common causes of mortality, such as suicide rates (SRs), at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare causes of mortality to enable examination of spatio-temporal variation at smaller geographic scales such as counties. This method allows examination of spatio-temporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatio-temporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year- and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time-varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county-level SRs. Model-based estimates of SRs were mapped to explore geographic variation.

  4. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.

    Science.gov (United States)

    Seaman, Shaun R; Hughes, Rachael A

    2018-06-01

    Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust to misspecification of the restricted general location model than joint model multiple imputation using that model when there is substantial missingness in the outcome variable.

  5. Airports Geographic Information System -

    Data.gov (United States)

    Department of Transportation — The Airports Geographic Information System maintains the airport and aeronautical data required to meet the demands of the Next Generation National Airspace System....

  6. Applying an efficient K-nearest neighbor search to forest attribute imputation

    Science.gov (United States)

    Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

    2006-01-01

    This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby decreasing the time needed to discover the NN subset. Results of five trials show gains...
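    As background, the sketch below shows plain kNN attribute imputation with a brute-force search: every target-to-reference distance is computed, which is exactly the cost an efficient NN search tries to cut. It does not reproduce the paper's search algorithm. The function name and the unweighted mean of neighbour attributes are assumptions; distance-weighted means are also common.

```python
import numpy as np

def knn_impute(ref_X, ref_Y, target_X, k=3):
    """Impute attributes at each target as the mean over its k nearest references.

    ref_X:    (n_ref, p) auxiliary variables (e.g., spectral bands) at reference plots
    ref_Y:    (n_ref, q) forest attributes measured at the reference plots
    target_X: (n_tgt, p) auxiliary variables where attributes are unknown
    """
    ref_X = np.asarray(ref_X, dtype=float)
    ref_Y = np.asarray(ref_Y, dtype=float)
    target_X = np.asarray(target_X, dtype=float)
    # Brute force: one Euclidean distance per (target, reference) pair.
    d = np.linalg.norm(target_X[:, None, :] - ref_X[None, :, :], axis=2)
    nn_idx = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest references
    return ref_Y[nn_idx].mean(axis=1)       # (n_tgt, q) unweighted neighbour mean
```

    An efficient search returns the same `nn_idx` while skipping most of the rows of `d`; the imputation step itself is unchanged.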

  7. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
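    To make the difference between the count models named above concrete, the sketch below writes out their probability mass functions. It assumes the common mean/dispersion parameterisation of the negative binomial (variance mu + mu^2/theta); the paper's parameterisation and covariate links are not shown here. The key contrast: zero-inflated NB adds extra zeros on top of NB zeros, while zero-altered (hurdle) NB models all zeros separately and uses a zero-truncated NB for positive counts.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion theta > 0."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: a structural zero with probability pi, else an NB draw (which may be 0)."""
    base = (1 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base

def zanb_pmf(y, mu, theta, p_zero):
    """Zero-altered (hurdle) NB: P(0) = p_zero; positive counts follow a zero-truncated NB."""
    if y == 0:
        return p_zero
    return (1 - p_zero) * nb_pmf(y, mu, theta) / (1 - nb_pmf(0, mu, theta))
```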

  8. Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

    Science.gov (United States)

    Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang

    2012-01-01

    The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...

  9. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis.

    Science.gov (United States)

    Kmetic, Andrew; Joseph, Lawrence; Berger, Claudie; Tenenhouse, Alan

    2002-07-01

    Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. We reviewed multiple imputation, a widely applicable and easy-to-implement Bayesian methodology to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians, designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for invitees who declined further participation, in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status for nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models are considered. Our results suggest that selection bias in the study is of concern, but only slightly, among the very elderly (age 80+ years), in both women and men. Epidemiologists should consider using multiple imputation more often than is current practice.
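    The abstract does not say how the analyses of the imputed datasets are combined. Multiple imputation results are conventionally pooled with Rubin's rules, sketched below under the assumption that each of the m completed datasets yields a prevalence estimate and its squared standard error. The function name is hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates (e.g., prevalence) from each imputed dataset
    variances: the corresponding squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total
```

    The between-imputation term is what carries the extra uncertainty due to the missing osteoporosis status; a single imputation would drop it and overstate precision.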

  10. Multiple imputation strategies for zero-inflated cost data in economic evaluations : which method works best?

    NARCIS (Netherlands)

    MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

    2016-01-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing

  11. Missing value imputation in DNA microarrays based on conjugate gradient method.

    Science.gov (United States)

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles requires a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. The proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute), and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE across data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
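    NRMSE is the comparison metric here (and, as NRMS error, in the CMVE record below), but definitions vary between papers. A common choice in the microarray imputation literature divides the root mean squared error over the imputed entries by the standard deviation of the true values; the sketch assumes that definition and a hypothetical function name.

```python
import math
from statistics import pstdev

def nrmse(true_vals, imputed_vals):
    """RMSE of imputed vs. true values, normalized by the standard deviation of the true values."""
    n = len(true_vals)
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, imputed_vals)) / n
    return math.sqrt(mse) / pstdev(true_vals)
```

    Normalizing by the spread of the true values makes scores comparable across data sets with different expression scales; a value of 0 means perfect imputation.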

  12. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    Science.gov (United States)

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…
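    The core idea of Monte Carlo estimation is to approximate an expectation by averaging a quantity over many random draws and to report its standard error. The sketch below does this outside a spreadsheet for a hypothetical inventory question: expected unmet demand when capacity is fixed and demand is normally distributed. The setting, function name, and parameter values are illustrative assumptions, not taken from the article.

```python
import random

def expected_unmet_demand(capacity, mu, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of E[max(D - capacity, 0)] for demand D ~ Normal(mu, sigma)."""
    rng = random.Random(seed)                     # seeded for reproducibility
    draws = [max(rng.gauss(mu, sigma) - capacity, 0.0) for _ in range(n)]
    est = sum(draws) / n                          # sample mean approximates the expectation
    var = sum((d - est) ** 2 for d in draws) / (n - 1)
    se = (var / n) ** 0.5                         # Monte Carlo standard error
    return est, se
```

    When capacity equals mean demand, the exact answer is sigma divided by the square root of 2π (about 5.98 for sigma = 15), which gives a simple check on the simulation.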

  13. Geographic information system planning and monitoring best ...

    African Journals Online (AJOL)

    Poor urbanization policies and inefficient planning and monitoring technologies are evident. The consequences include some of the worst types of environmental hazards. Best urbanization practices require integrated planning approaches that result in environmental conservation. Geographic Information Systems (GIS) provide ...

  14. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    Science.gov (United States)

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute), and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which contained 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods, for both time series and non-time series data, with the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE

  15. Factors associated with low birth weight in Nepal using multiple imputation

    Directory of Open Access Journals (Sweden)

    Usha Singh

    2017-02-01

    Full Text Available Abstract Background Survey data on birth weight from low-income countries usually pose a persistent problem. Studies of birth weight have acknowledged missing birth weight data, but these cases are not included in the analysis. Furthermore, missing data on the determinants of birth weight are not addressed. This study therefore aims to identify determinants associated with low birth weight (LBW) using multiple imputation to handle missing data on birth weight and its determinants. Methods The child dataset from the Nepal Demographic and Health Survey (NDHS, 2011) was used in this study. A total of 5,240 children were born between 2006 and 2011, of whom 87% had at least one measured variable missing and 21% had no recorded birth weight. All analyses were carried out in R version 3.1.3. The transform-then-impute method was applied to check for interaction between explanatory variables and imputed missing data. The survey package was applied to each imputed dataset to account for the survey design and sampling method. Survey logistic regression was applied to identify the determinants associated with LBW. Results The prevalence of LBW was 15.4% after imputation. Women with the highest autonomy over their own health compared to those whose health decisions involved their husband or others (adjusted odds ratio (OR) 1.87, 95% confidence interval (95% CI) = 1.31, 2.67), and husband and woman together (adjusted OR 1.57, 95% CI = 1.05, 2.35), were less likely to give birth to LBW infants. Mothers using highly polluting cooking fuels (adjusted OR 1.49, 95% CI = 1.03, 2.22) were more likely to give birth to LBW infants than mothers using non-polluting cooking fuels. Conclusion The findings of this study suggest that estimating the prevalence of LBW only from the sample with measured birth weight, ignoring missing data, results in underestimation.

  16. Registration of terrestrial mobile laser data on 2D or 3D geographic database by use of a non-rigid ICP approach.

    Science.gov (United States)

    Monnier, F.; Vallet, B.; Paparoditis, N.; Papelard, J.-P.; David, N.

    2013-10-01

    This article presents a generic and efficient method to register terrestrial mobile data with imperfect location on a geographic database with better overall accuracy but fewer details. The registration method proposed in this paper is based on a semi-rigid point-to-plane ICP ("Iterative Closest Point"). The main applications of such registration are to improve existing geographic databases, particularly in terms of accuracy, level of detail, and diversity of represented objects. Other applications include fine geometric modelling, fine façade texturing, and extraction of objects such as trees, poles, road signs and markings, facilities, vehicles, etc. The geopositioning system of mobile mapping systems is affected by GPS masks that are only partially corrected by an Inertial Navigation System (INS), which can cause significant drift. As this drift varies non-linearly but slowly in time, it is modelled by a translation defined as a piecewise linear function of time whose variation over time is minimized (rigidity term). At each iteration of the ICP, the drift is estimated so as to minimise the distance between laser points and planar model primitives (data attachment term). The method has been tested on real data (a scan of the city of Paris of 3.6 million laser points registered on a 3D model of approximately 71,400 triangles).

  17. Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

    Science.gov (United States)

    Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

    2017-07-27

    Imputation is a computational method based on the principle of haplotype sharing that allows enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and the density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels that have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel while being computationally less intensive. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.

  18. Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

    Directory of Open Access Journals (Sweden)

    Stanley Xu

    2014-05-01

    Full Text Available In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset using an incident diabetes population that has no missing values in A1c, fasting and random plasma glucose (FPG and RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We then imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG (1) estimated a continuous A1c within ± 1.88% of the true A1c 68.3% of the time and (2) estimated a categorical A1c within ± one category of the true A1c about 50% of the time. Including RPG in imputation slightly improved the precision but did not improve the accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not categorical A1c. Moreover, imputation of up to 33% of missing A1c values did not change the accuracy and precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When using A1c values as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.

  19. Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

    DEFF Research Database (Denmark)

    Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

    2014-01-01

    BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, ITT analysis requires that missing outcome data be imputed. Different imputation techniques may give different results, and some may lead to bias...... of handling missing data in a 60-week placebo-controlled anti-obesity drug trial on topiramate. METHODS: We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: last observation carried forward (LOCF), baseline carried forward (BOCF), and multiple imputation (MI)...
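    The two single-imputation rules named above can be sketched for one participant's sequence of body-weight measurements. This assumes `None` marks a missed visit and that the first element is the observed baseline; the function names are hypothetical.

```python
def locf(values):
    """Last observation carried forward: replace each None with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)   # stays None only if nothing has been observed yet
    return filled

def bocf(values):
    """Baseline (observation) carried forward: replace each None with the baseline value."""
    baseline = values[0]
    return [baseline if v is None else v for v in values]
```

    For a participant who lost weight and then dropped out, LOCF keeps the last (lower) weight, while BOCF assumes a return to baseline, i.e., no treatment benefit. This difference is one reason the methods can give different trial results.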

  20. iVAR: a program for imputing missing data in multivariate time series using vector autoregressive models.

    Science.gov (United States)

    Liu, Siwei; Molenaar, Peter C M

    2014-12-01

    This article introduces iVAR, an R program for imputing missing data in multivariate time series on the basis of vector autoregressive (VAR) models. We conducted a simulation study to compare iVAR with three methods for handling missing data: listwise deletion, imputation with sample means and variances, and multiple imputation ignoring time dependency. The results showed that iVAR produces better estimates for the cross-lagged coefficients than do the other three methods. We demonstrate the use of iVAR with an empirical example of time series electrodermal activity data and discuss the advantages and limitations of the program.
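    The abstract does not describe iVAR's estimation procedure, so the sketch below uses a simpler stand-in to show the basic idea of VAR-based imputation. It fits a VAR(1), y_t = c + A y_{t-1} + e_t, by ordinary least squares on fully observed consecutive pairs, then fills each missing component with its one-step prediction. The function name is hypothetical. Unlike multiple imputation, this fills each gap with a single conditional mean and so understates uncertainty. It also assumes the first time point is observed.

```python
import numpy as np

def var1_impute(Y):
    """Fill NaNs in a (T, p) multivariate series with one-step VAR(1) predictions.

    Returns (filled series, coef), where coef[0] is the intercept c and coef[1:] is A transposed.
    """
    Y = np.array(Y, dtype=float)              # copy so the caller's array is not modified
    ok = ~np.isnan(Y).any(axis=1)             # rows with no missing component
    pairs = ok[1:] & ok[:-1]                  # both y_{t-1} and y_t fully observed
    X = np.column_stack([np.ones(pairs.sum()), Y[:-1][pairs]])
    coef, *_ = np.linalg.lstsq(X, Y[1:][pairs], rcond=None)   # shape (1 + p, p)
    for t in range(1, len(Y)):
        missing = np.isnan(Y[t])
        if missing.any():
            # y_{t-1} may itself have been imputed in an earlier iteration.
            pred = coef[0] + Y[t - 1] @ coef[1:]
            Y[t] = np.where(missing, pred, Y[t])   # keep observed components
    return Y, coef
```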

  1. Application of a geographical information system approach for risk analysis of fascioliasis in southern Espírito Santo state, Brazil.

    Science.gov (United States)

    Martins, Isabella Vilhena Freire; de Avelar, Barbara Rauta; Pereira, Maria Julia Salim; da Fonseca, Adevair Henrique

    2012-09-01

    A model based on geographical information systems for mapping the risk of fascioliasis was developed for the southern part of Espírito Santo state, Brazil. The determinants investigated were precipitation, temperature, elevation, slope, soil type and land use. Weightings and grades were assigned to determinants and their categories according to their relevance with respect to fascioliasis. Theme maps depicting the spatial distribution of risk areas indicate that over 50% of southern Espírito Santo is either at high or at very high risk for fascioliasis. These areas were found to be characterized by comparatively high temperature but relatively low slope, low precipitation and low elevation corresponding to periodically flooded grasslands or soils that promote water retention.

  2. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    Science.gov (United States)

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly set to missing, and the MI methods were compared on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.
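    Predictive mean matching, one of the four MI approaches listed, can be illustrated with a stripped-down single-imputation sketch. It regresses y on one predictor using the observed cases, predicts for every case, and gives each missing case the observed value of the donor whose prediction is closest. The function name and single-predictor setup are assumptions. A full MI version would also draw the regression coefficients from their posterior and sample among several close donors, once per imputed dataset.

```python
import numpy as np

def pmm_impute(x, y):
    """Single predictive-mean-matching fill for a numeric y containing NaNs."""
    x = np.asarray(x, dtype=float)
    y = np.array(y, dtype=float)                 # copy; NaN marks missing
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])    # intercept + predictor
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta                              # predicted means for all cases
    donors = np.flatnonzero(obs)
    for i in np.flatnonzero(~obs):
        j = donors[np.argmin(np.abs(pred[donors] - pred[i]))]
        y[i] = y[j]                              # imputed value is a real observed value
    return y
```

    Because imputations are copied from observed donors, PMM never produces values outside the observed range. This is the main practical difference from the normal regression model, which imputes predicted values plus noise.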

  3. Geographical National Condition and Complex System

    Directory of Open Access Journals (Sweden)

    WANG Jiayao

    2016-01-01

    Full Text Available The significance of studying the complex system of geographical national conditions lies in rationally expressing the complex relationships of the “resources-environment-ecology-economy-society” system. Addressing the problems faced by the statistical analysis of geographical national conditions, including the disunity of research contents, the inconsistency of range, and the uncertainty of goals, the present paper discusses the topic from the perspectives of concept, theory, and method, and designs solutions based on complex system theory and coordination degree analysis methods. By analyzing the concepts of geographical national conditions, geographical national conditions survey, and geographical national conditions statistical analysis, as well as the relationships among them, the statistical contents and the analytical range of geographical national conditions are clarified and defined. This investigation also clarifies the goals of the statistical analysis by analyzing the basic characteristics of geographical national conditions and the complex system, and the consistency between coordination degree analysis and statistical analyses. It outlines their goals and proposes and describes a concept for the complex system of geographical national conditions. Complex system theory provides new theoretical guidance for the statistical analysis of geographical national conditions. The degree of coordination offers new approaches to undertaking the analysis, based on the measurement method and decision-making analysis scheme underlying the complex system of geographical national conditions. It analyzes the overall trend via the degree of coordination of the complex system at a macro level, and it determines the direction of remediation at a micro level based on the degree of coordination among the various subsystems and of single systems. These results establish

  4. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

    DEFF Research Database (Denmark)

    Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

    2011-01-01

    The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data...... of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test...... with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected...

  5. Semiautomatic imputation of activity travel diaries : use of global positioning system traces, prompted recall, and context-sensitive learning algorithms

    NARCIS (Netherlands)

    Moiseeva, A.; Jessurun, A.J.; Timmermans, H.J.P.; Stopher, P.

    2016-01-01

    Anastasia Moiseeva, Joran Jessurun and Harry Timmermans (2010), ‘Semiautomatic Imputation of Activity Travel Diaries: Use of Global Positioning System Traces, Prompted Recall, and Context-Sensitive Learning Algorithms’, Transportation Research Record: Journal of the Transportation Research Board,

  6. A Geographical Information System Based Approach for Integrated Strategies of Tick Surveillance and Control in the Peri-Urban Natural Reserve of Monte Pellegrino (Palermo, Southern Italy).

    Science.gov (United States)

    Torina, Alessandra; Blanda, Valeria; Blanda, Marcellocalogero; Auteri, Michelangelo; La Russa, Francesco; Scimeca, Salvatore; D'Agostino, Rosalia; Disclafani, Rosaria; Villari, Sara; Currò, Vittoria; Caracappa, Santo

    2018-02-27

    Ticks (Acari: Ixodidae) are bloodsucking arthropods involved in pathogen transmission in animals and humans. Tick activity depends on various ecological factors such as vegetation, hosts, and temperature. The aim of this study was to analyse the spatial/temporal distribution of ticks in six sites within a peri-urban area of Palermo (Natural Reserve of Monte Pellegrino) and correlate it with field data using Geographical Information System (GIS) data. A total of 3092 ticks were gathered by the dragging method from June 2012 to May 2014. The species collected were: Ixodes ventalloi (46.09%), Hyalomma lusitanicum (19.99%), Rhipicephalus sanguineus (17.34%), Rhipicephalus pusillus (16.11%), Haemaphysalis sulcata (0.36%), Dermacentor marginatus (0.10%), and Rhipicephalus turanicus (0.03%). GIS analysis revealed environmental characteristics of each site, and abundance of each tick species was analysed in relation to time (monthly trend) and space (site-specific abundance). A relevant presence of I. ventalloi in site 2 and H. lusitanicum in site 5 was observed, suggesting the possible exposure of animals and humans to tick-borne pathogens. Our study shows the importance of surveillance of ticks in peri-urban areas and the useful implementation of GIS analysis in vector ecology; studies on temporal and spatial distribution of ticks correlated to GIS-based ecological analysis represent an integrated strategy for decision support in public health.

  7. A Geographical Information System Based Approach for Integrated Strategies of Tick Surveillance and Control in the Peri-Urban Natural Reserve of Monte Pellegrino (Palermo, Southern Italy)

    Directory of Open Access Journals (Sweden)

    Alessandra Torina

    2018-02-01

    Full Text Available Ticks (Acari: Ixodidae) are bloodsucking arthropods involved in pathogen transmission in animals and humans. Tick activity depends on various ecological factors such as vegetation, hosts, and temperature. The aim of this study was to analyse the spatial/temporal distribution of ticks in six sites within a peri-urban area of Palermo (Natural Reserve of Monte Pellegrino) and correlate it with field data using Geographical Information System (GIS) data. A total of 3092 ticks were gathered by the dragging method from June 2012 to May 2014. The species collected were: Ixodes ventalloi (46.09%), Hyalomma lusitanicum (19.99%), Rhipicephalus sanguineus (17.34%), Rhipicephalus pusillus (16.11%), Haemaphysalis sulcata (0.36%), Dermacentor marginatus (0.10%), and Rhipicephalus turanicus (0.03%). GIS analysis revealed environmental characteristics of each site, and abundance of each tick species was analysed in relation to time (monthly trend) and space (site-specific abundance). A relevant presence of I. ventalloi in site 2 and H. lusitanicum in site 5 was observed, suggesting the possible exposure of animals and humans to tick-borne pathogens. Our study shows the importance of surveillance of ticks in peri-urban areas and the useful implementation of GIS analysis in vector ecology; studies on temporal and spatial distribution of ticks correlated to GIS-based ecological analysis represent an integrated strategy for decision support in public health.

  8. Childhood leukemia near nuclear plants in the United Kingdom: The evolution of a systematic approach to studying rare disease in small geographic areas

    International Nuclear Information System (INIS)

    Beral, V.

    1990-01-01

    A cluster of childhood leukemia in a village near a nuclear plant in northern England prompted further studies of cancer in the vicinity of other nuclear plants in the United Kingdom. These studies demonstrated that the risk of childhood leukemia was increased near certain other nuclear plants. Although the reasons for the increase are still unclear, the scientific debate stimulated by these findings has clarified some of the special methodological problems encountered when studying rare diseases in small areas. Firstly, unless a specific hypothesis is defined in advance, the relevance of a single geographic cluster of disease can rarely be interpreted. Even when a prior hypothesis exists, the small number of cases that generally occur in a small area makes the findings highly sensitive to reporting, diagnostic, or classification errors. The statistical power of such investigations is also usually low, and only marked increases in risk can be detected. Furthermore, conventional statistical tests may be inappropriate if the underlying spatial distribution of the disease is not random, and little is known about the background distribution of disease in small areas. Investigations of specific hypotheses about defined sources of environmental contamination, especially if they can be replicated, are more likely to result in conclusive findings than are in-depth studies of individual clusters.

  9. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    Directory of Open Access Journals (Sweden)

    Danai Jattawa

    2016-04-01

    Full Text Available The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540), and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms), and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above- and below-average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
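    The accuracy measure described above, the ratio of correctly imputed SNP markers to all imputed markers, is a simple concordance rate. The sketch assumes genotypes are coded as 0/1/2 allele counts and compared position by position; the function name is hypothetical. The same calculation applies per chromosome by restricting the inputs to that chromosome's markers.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of imputed SNP genotypes that match the true (genotyped) value."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have equal length")
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)
```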

  10. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Science.gov (United States)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods for generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently inadequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may be used even with small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with those obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  11. Using mi impute chained to fit ANCOVA models in randomized trials with censored dependent and independent variables

    DEFF Research Database (Denmark)

    Andersen, Andreas; Rieckmann, Andreas

    2016-01-01

    In this article, we illustrate how to use mi impute chained with intreg to fit an analysis of covariance (ANCOVA) model of censored and nondetectable immunological concentrations measured in a randomized pretest–posttest design.

  12. Imputing historical statistics, soils information, and other land-use data to crop area

    Science.gov (United States)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., map grid cells that each represent, at 60° latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area is studied for a province in Argentina.

  13. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Science.gov (United States)

    Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

    2014-01-01

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.
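
    Concordance at 2-digit versus 4-digit resolution can be illustrated by truncating allele names before comparing them. In the sketch below, fields=1 approximates 2-digit resolution and fields=2 approximates 4-digit resolution in the colon-delimited nomenclature. The allele calls are hypothetical, the function names are illustrative, and calls are compared one allele at a time, which simplifies a genotype-level concordance.

```python
def truncate_allele(allele, fields):
    """Keep the first `fields` colon-separated fields, e.g. A*02:01 -> A*02."""
    gene, sep, rest = allele.partition("*")
    if not sep:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    return gene + "*" + ":".join(rest.split(":")[:fields])

def concordance(typed, imputed, fields):
    """Fraction of alleles whose imputed call matches the typed call."""
    if len(typed) != len(imputed) or not typed:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        truncate_allele(t, fields) == truncate_allele(i, fields)
        for t, i in zip(typed, imputed)
    )
    return hits / len(typed)
```

    For typed calls ["A*02:01", "A*24:02", "B*15:01"] and imputed calls ["A*02:06", "A*24:02", "B*15:01"], concordance is 1.0 at 2-digit resolution and 2/3 at 4-digit resolution, mirroring the drop from 2-digit to 4-digit accuracy reported above.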

  14. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Directory of Open Access Journals (Sweden)

    Kwangwoo Kim

    Full Text Available Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  15. Design of a bovine low-density SNP array optimized for imputation.

    Directory of Open Access Journals (Sweden)

    Didier Boichard

    Full Text Available The Illumina BovineLD BeadChip was designed to support imputation to higher-density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome, except at the ends of the chromosomes, where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. Minor allele frequencies were similar across taurine beef and dairy breeds, as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle.
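
    The two design criteria named above, uniform spacing and high minor allele frequency (MAF), can be combined in a simple way: split each chromosome into fixed-width windows and keep the highest-MAF SNP in each. The sketch below assumes a hypothetical input of (SNP id, chromosome, position, MAF) tuples. It is not Illumina's actual selection procedure and omits the denser chromosome ends and the Y-chromosome and mitochondrial loci.

```python
def select_snps(snps, window_bp):
    """Keep the highest-MAF SNP in each fixed-width window per chromosome.

    snps: iterable of (snp_id, chromosome, position_bp, maf).
    Returns the selected SNPs sorted by chromosome and position.
    """
    best = {}
    for snp in snps:
        _, chrom, pos, maf = snp
        key = (chrom, pos // window_bp)        # window index on this chromosome
        if key not in best or maf > best[key][3]:
            best[key] = snp                    # higher MAF wins the window
    return sorted(best.values(), key=lambda s: (s[1], s[2]))
```

    Shrinking window_bp raises the panel size, so in practice the window width would be tuned toward a target SNP count such as the 6,909 reported above.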

  16. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Science.gov (United States)

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644
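
    Inverse Distance Weighting (IDW), one of the baseline methods compared above (not MICE itself), estimates a missing station reading as a distance-weighted average of neighbouring stations. The sketch below uses hypothetical planar coordinates and a power parameter of 2. The relative RMSE helper assumes normalization by the mean observed value, which may differ from the normalization used in the paper.

```python
import math

def idw_estimate(target, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y).

    neighbours: list of ((x, y), value) from stations with valid readings.
    """
    num = den = 0.0
    for location, value in neighbours:
        d = math.dist(target, location)
        if d == 0.0:
            return value                  # co-located station: use its reading
        w = 1.0 / d ** power              # closer stations weigh more
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no neighbouring readings")
    return num / den

def rmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)
```

    For example, with stations reading 100 and 200 W/m² at distances 1 and 2 from the target, the weights are 1 and 0.25 and the IDW estimate is 120.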

  17. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    Science.gov (United States)

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns, or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
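
    In the plain multivariate normal setting, the E-step of an EM imputation algorithm fills each missing block with its conditional mean, and the conditional covariance enters the M-step update of the covariance matrix. The standard identities are shown below for a single row. The paper's transposable model, with separate row and column covariances and penalties on their inverses, is not reproduced here.

```latex
% Row x ~ N(mu, Sigma), partitioned into observed (o) and missing (m) entries
\hat{x}_m = \mathbb{E}[x_m \mid x_o]
          = \mu_m + \Sigma_{mo}\,\Sigma_{oo}^{-1}\,(x_o - \mu_o),
\qquad
\operatorname{Cov}(x_m \mid x_o) = \Sigma_{mm} - \Sigma_{mo}\,\Sigma_{oo}^{-1}\,\Sigma_{om}.
```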

  18. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2014-10-01

    Full Text Available Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.

  19. Missing data imputation of solar radiation data under different atmospheric conditions.

    Science.gov (United States)

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-10-29

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.

  20. A novel approach to parasite population genetics: experimental infection reveals geographic differentiation, recombination and host-mediated population structure in Pasteuria ramosa, a bacterial parasite of Daphnia.

    Science.gov (United States)

    Andras, J P; Ebert, D

    2013-02-01

    The population structure of parasites is central to the ecology and evolution of host-parasite systems. Here, we investigate the population genetics of Pasteuria ramosa, a bacterial parasite of Daphnia. We used natural P. ramosa spore banks from the sediments of two geographically well-separated ponds to experimentally infect a panel of Daphnia magna host clones whose resistance phenotypes were previously known. In this way, we were able to assess the population structure of P. ramosa based on geography, host resistance phenotype and host genotype. Overall, genetic diversity of P. ramosa was high, and nearly all infected D. magna hosted more than one parasite haplotype. On the basis of the observation of recombinant haplotypes and relatively low levels of linkage disequilibrium, we conclude that P. ramosa engages in substantial recombination. Isolates were strongly differentiated by pond, indicating that gene flow is spatially restricted. Pasteuria ramosa isolates within one pond were segregated completely based on the resistance phenotype of the host, a result that, to our knowledge, has not been previously reported for a nonhuman parasite. To assess the comparability of experimental infections with natural P. ramosa isolates, we examined the population structure of naturally infected D. magna native to one of the two source ponds. We found that experimental and natural infections of the same host resistance phenotype from the same source pond were indistinguishable, indicating that experimental infections provide a means to representatively sample the diversity of P. ramosa while reducing the sampling bias often associated with studies of parasite epidemics. These results expand our knowledge of this model parasite, provide important context for the large existing body of research on this system and will guide the design of future studies of this host-parasite system. © 2012 Blackwell Publishing Ltd.

  1. Social-geographic approaches to application of economic-mathematical modeling in predicting the place of Ukrainian farming economies in food market commoditization

    Directory of Open Access Journals (Sweden)

    Valeriy Rudenko

    2017-11-01

    Full Text Available Social-geographic analysis of farming using economic-mathematical modeling made it possible to predict the role of farming economies in food market commoditization. An equation of potential demand was proposed. Actual consumption was compared with recommended consumption rates for meat and meat products, milk and milk products, eggs, fish and fish products, bread and cereal products, potatoes, vegetables, fruits and berries, etc. A cartographic model of the potential capacity of the Ukrainian domestic food market (within goods-money relations) was developed. Low purchasing power, especially among the rural population, places a high percentage of foodstuffs outside goods-money relations. In rural areas, people (including farmers) produce and consume a significant portion of foodstuffs that never enters goods-money relations, or receive such foodstuffs from relatives. We believe that this share of products should also be taken into account when assessing the capacity of the domestic food market. The assessment also requires consideration of the urban and rural population of Ukrainian regions; the production of certain types of agricultural products; and the need for each type of product as prescribed by minimal and rational consumption rates. When using economic-mathematical modeling to predict the place of farming economies in food market commoditization, it is reasonable to apply the parameters of time series of the number of farming economies and the land areas they use, taking into account population dynamics and the level of the population's self-provision with agricultural products. Application of predictive linear models shows that the share of production manufactured by farming economies will be most significant before 2020 in the markets for potatoes and vegetables (reaching 15%). Despite the predicted double increase in animal production, its share

  2. Feedback mechanisms between water availability and water use in a semi-arid river basin: a geographically explicit multi-agent simulation approach

    NARCIS (Netherlands)

    van Oel, P.R.; Krol, Martinus S.; Hoekstra, Arjen Ysbert; Taddei, Renzo R.

    2010-01-01

    Understanding the processes responsible for the distribution of water availability over space and time is of great importance to spatial planning in a semi-arid river basin. In this study the usefulness of a multi-agent simulation (MAS) approach for representing these processes is discussed. A MAS

  3. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

    Directory of Open Access Journals (Sweden)

    Katya L Masconi

    Full Text Available Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods and to assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest with stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performance, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model, and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and there it was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing them results in predictive utility for undiagnosed diabetes similar to that of multiple imputation.
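
    The C-statistic used above to compare discrimination is the share of (case, non-case) pairs in which the case receives the higher predicted risk. A minimal pairwise sketch is shown below, with tied risks counted as half-concordant. The function name and inputs are illustrative, and an O(n²) loop is only practical for small samples.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied risks count as half-concordant.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(controls))
```

    Because a simple intercept adjustment shifts every predicted log-odds by the same amount, it preserves the ranking of risks and therefore leaves this statistic unchanged. Differences between imputation methods show up in the C-statistic only through the risks the refitted or re-applied models produce.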

  4. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    Science.gov (United States)

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all three traits jointly. Imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs associated with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies. Thus, the optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.

  5. Missing Value Imputation Improves Mortality Risk Prediction Following Cardiac Surgery: An Investigation of an Australian Patient Cohort.

    Science.gov (United States)

    Karim, Md Nazmul; Reid, Christopher M; Tran, Lavinia; Cochrane, Andrew; Billah, Baki

    2017-03-01

    The aim of this study was to evaluate the impact of missing values on the prediction performance of the model predicting 30-day mortality following cardiac surgery as an example. Information from 83,309 eligible patients, who underwent cardiac surgery, recorded in the Australia and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) database registry between 2001 and 2014, was used. An existing 30-day mortality risk prediction model developed from ANZSCTS database was re-estimated using the complete cases (CC) analysis and using multiple imputation (MI) analysis. Agreement between the risks generated by the CC and MI analysis approaches was assessed by the Bland-Altman method. Performances of the two models were compared. One or more missing predictor variables were present in 15.8% of the patients in the dataset. The Bland-Altman plot demonstrated significant disagreement between the risk scores (prisk of mortality. Compared to CC analysis, MI analysis resulted in an average of 8.5% decrease in standard error, a measure of uncertainty. The MI model provided better prediction of mortality risk (observed: 2.69%; MI: 2.63% versus CC: 2.37%, Pvalues improved the 30-day mortality risk prediction following cardiac surgery. Copyright © 2016 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.
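
    The Bland-Altman method mentioned above summarizes agreement between two sets of paired scores (here, per-patient risks from the CC and MI models) by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. The minimal sketch below uses the conventional 1.96 multiplier. Inputs and the function name are illustrative.

```python
from statistics import mean, stdev

def bland_altman(scores_a, scores_b):
    """Return (bias, lower, upper) 95% limits of agreement for paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)      # mean difference between the two methods
    sd = stdev(diffs)       # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

    A Bland-Altman plot then places each patient's difference against the mean of the two scores, with horizontal lines at the bias and the two limits.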

  6. Characterising an intense PM pollution episode in March 2015 in France from multi-site approach and near real time data: Climatology, variabilities, geographical origins and model evaluation

    Science.gov (United States)

    Petit, J.-E.; Amodeo, T.; Meleux, F.; Bessagnet, B.; Menut, L.; Grenier, D.; Pellan, Y.; Ockler, A.; Rocq, B.; Gros, V.; Sciare, J.; Favez, O.

    2017-04-01

    During March 2015, a severe, large-scale particulate matter (PM) pollution episode occurred in France. Near-real-time measurements of the major chemical components at four urban background sites across the country (Paris, Creil, Metz, and Lyon) allowed investigation of spatiotemporal variability during this episode. A climatological approach showed that all sites experienced a clear and unusual rainfall deficit, a pattern also found on a longer timescale, highlighting the role of synoptic conditions over Western Europe. The episode was characterized by a strong predominance of secondary pollution, particularly ammonium nitrate, which accounted for more than 50% of submicron aerosols at all sites during the most intense period of the episode. Pollution advection is illustrated by similar variability in Paris and Creil (about 100 km apart), as well as by trajectory analyses applied to nitrate and sulphate. Local sources, especially wood burning, were nevertheless found to contribute to local/regional sub-episodes, notably in Metz. Finally, concentrations simulated with the chemistry-transport model CHIMERE were compared with observed ones. Results highlighted different patterns depending on the chemical component and the measuring site, reinforcing the need for similar exercises on other pollution episodes and sites.

  7. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    Directory of Open Access Journals (Sweden)

    Nawar Shara

    Full Text Available Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
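
    Two of the simpler methods listed above can be sketched for one participant's series of exam values, with None marking a missing exam. The exact rules used in the study are not given here. As assumptions, "mean of serial measures" uses the mean of that participant's observed exams, and "adjacent value" prefers the previous exam and falls back to the next one. Function names are hypothetical.

```python
from statistics import mean

def impute_serial_mean(values):
    """Fill missing exams with the mean of the participant's observed exams."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)          # nothing to impute from
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_adjacent(values):
    """Fill missing exams with the previous observed exam, else the next one."""
    out = list(values)
    last = None
    for i, v in enumerate(out):      # forward pass: carry previous value
        if v is None:
            out[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):  # backward pass: fill leading gaps
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out
```

    Both methods use only a participant's own observed values, so neither can reflect values that are missing because the participant's renal function declined. That limitation motivates the comparison with models for data not missing at random.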

  8. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Science.gov (United States)

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and application of an imputation method in a proteomics context.
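
    The point that each imputation method suits a particular missingness mechanism can be illustrated with two deliberately simple imputers for a vector of intensities. Mean imputation is more defensible when values are missing at random. Filling with the smallest observed value is a crude choice for left-censored values that fall below the detection limit (missing not at random). Both function names are illustrative, and neither is a method recommended in the paper.

```python
from statistics import mean

def _observed(values):
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    return observed

def impute_mean(values):
    """Fill missing intensities with the mean of observed ones (random missingness)."""
    fill = mean(_observed(values))
    return [fill if v is None else v for v in values]

def impute_minimum(values):
    """Fill missing intensities with the smallest observed one (left-censoring)."""
    fill = min(_observed(values))
    return [fill if v is None else v for v in values]
```

    For [10.0, None, 12.0], the two imputers give 11.0 and 10.0 for the gap. If the peptide was missing because its abundance was below detection, the mean overstates it.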

  9. Imputation of Baseline LDL Cholesterol Concentration in Patients with Familial Hypercholesterolemia on Statins or Ezetimibe.

    Science.gov (United States)

    Ruel, Isabelle; Aljenedil, Sumayah; Sadri, Iman; de Varennes, Émilie; Hegele, Robert A; Couture, Patrick; Bergeron, Jean; Wanneh, Eric; Baass, Alexis; Dufour, Robert; Gaudet, Daniel; Brisson, Diane; Brunham, Liam R; Francis, Gordon A; Cermakova, Lubomira; Brophy, James M; Ryomoto, Arnold; Mancini, G B John; Genest, Jacques

    2018-02-01

    Familial hypercholesterolemia (FH) is the most frequent genetic disorder seen clinically and is characterized by increased LDL cholesterol (LDL-C) (>95th percentile), family history of increased LDL-C, premature atherosclerotic cardiovascular disease (ASCVD) in the patient or in first-degree relatives, presence of tendinous xanthomas or premature corneal arcus, or presence of a pathogenic mutation in the LDLR, PCSK9, or APOB genes. A diagnosis of FH has important clinical implications with respect to lifelong risk of ASCVD and requirement for intensive pharmacological therapy. The concentration of baseline LDL-C (untreated) is essential for the diagnosis of FH but is often not available because the individual is already on statin therapy. To validate a new algorithm to impute baseline LDL-C, we examined 1297 patients. The baseline LDL-C was compared with the imputed baseline obtained within 18 months of the initiation of therapy. We compared the percent reduction in LDL-C on treatment from baseline with the published percent reductions. After eliminating individuals with missing data, nonstandard doses of statins, or medications other than statins or ezetimibe, we provide data on 951 patients. The mean ± SE baseline LDL-C was 243.0 (2.2) mg/dL [6.28 (0.06) mmol/L], and the mean ± SE imputed baseline LDL-C was 244.2 (2.6) mg/dL [6.31 (0.07) mmol/L] (P = 0.48). There was no difference in response according to the patient's sex or in percent reduction between observed and expected for individual doses or types of statin or ezetimibe. We provide a validated estimation of baseline LDL-C for patients with FH that may help clinicians in making a diagnosis. © 2017 American Association for Clinical Chemistry.
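
    The abstract does not spell out the imputation algorithm. One simple idea consistent with comparing on-treatment values against published percent reductions is to divide the on-treatment LDL-C by (1 − expected fractional reduction) for the drug and dose. The sketch below assumes that form, with a hypothetical reduction value supplied by the caller. The conversion uses the standard cholesterol factor of 38.67 mg/dL per mmol/L, which reproduces the paired units reported above (243.0 mg/dL ≈ 6.28 mmol/L).

```python
MG_DL_PER_MMOL_L = 38.67  # standard conversion factor for cholesterol

def impute_baseline_ldl(treated_ldl_mg_dl, expected_reduction):
    """Back-calculate untreated LDL-C from an on-treatment value.

    expected_reduction: assumed fractional LDL-C lowering for the drug and
    dose (e.g. 0.5 for 50%), taken from published data by the caller.
    """
    if not 0.0 <= expected_reduction < 1.0:
        raise ValueError("expected_reduction must be in [0, 1)")
    return treated_ldl_mg_dl / (1.0 - expected_reduction)

def mg_dl_to_mmol_l(ldl_mg_dl):
    """Convert an LDL-C concentration from mg/dL to mmol/L."""
    return ldl_mg_dl / MG_DL_PER_MMOL_L
```

    Under this assumption, an on-treatment LDL-C of 120 mg/dL on a regimen expected to lower LDL-C by 50% would imply an untreated value of 240 mg/dL.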

  10. Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling

    Directory of Open Access Journals (Sweden)

    Petr Novák

    2012-03-01

    Full Text Available This study addresses variance computation techniques for estimates of population characteristics based on survey sampling and imputation. We use the superpopulation regression model, which means that the target variable values for each statistical unit are treated as random realizations of a linear regression model with weighted variance. We focus on regression models with one auxiliary variable and no intercept, which have many applications and a straightforward interpretation in business statistics. Furthermore, we deal with cases where the estimates are not independent and thus the covariance must be computed. We also consider chained regression models with auxiliary variables treated as random variables instead of constants.
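
    For the no-intercept model with one auxiliary variable, y_i = βx_i + ε_i with Var(ε_i) = σ²x_i^γ, weighted least squares gives β̂ = Σ(x_i y_i / x_i^γ) / Σ(x_i² / x_i^γ). Setting γ = 1 reduces this to the classical ratio estimator Σy/Σx. The sketch below fits β̂ on respondents and imputes β̂x for non-respondents. The choice of γ and the function names are illustrative assumptions, auxiliary values must be positive, and the variance computations the paper focuses on are not shown.

```python
def fit_through_origin(xs, ys, gamma=1.0):
    """WLS slope for y = beta * x with Var(error) proportional to x**gamma."""
    num = sum(x * y / x ** gamma for x, y in zip(xs, ys))
    den = sum(x * x / x ** gamma for x in xs)
    return num / den

def impute_missing(xs, ys, gamma=1.0):
    """Replace missing y (None) by beta_hat * x, fitting on respondents only."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if not pairs:
        raise ValueError("no respondents to fit the model")
    beta = fit_through_origin([x for x, _ in pairs], [y for _, y in pairs], gamma)
    return [beta * x if y is None else y for x, y in zip(xs, ys)]
```

    With respondents (x, y) = (1, 2), (2, 4), (3, 6), the fitted slope is 2, so a non-respondent with x = 5 is imputed as 10.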

  11. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    OpenAIRE

    Yan, Xiaobo; Xiong, Weiqing; Hu, Liang; Wang, Feng; Zhao, Kuo

    2015-01-01

    This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics and healthcare. However, missing values are very common in the IoT for a variety of reasons, leaving experimental data incomplete. As a result, some work that relies on IoT data cannot be carried out normally. And it leads to the red...

  12. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

    DEFF Research Database (Denmark)

    Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

    2009-01-01

    Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging single-nucleotide polymorphism (SNP) sets. To maximize information gleaned from......, and rs3212891; CDK2 rs2069391, rs2069414, and rs17528736; and CCNE1 rs3218036. These results exemplify the utility of imputation in candidate gene studies and lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls....

  13. Non-imputability, criminal dangerousness and curative safety measures: myths and realities

    Directory of Open Access Journals (Sweden)

    Frank Harbottle Quirós

    2017-04-01

    Full Text Available Curative safety measures are imposed in criminal proceedings on non-imputable persons, provided that a prognosis affirmatively concludes that they are criminally dangerous. Although this statement seems elementary, several myths about these legal institutions persist in judicial practice, and their versions may vary, to a greater or lesser extent, between countries. In this context, the article sets out ten myths drawn from the experience of Costa Rica and offers explanations that seek to weaken or refute them, inviting the reader to reflect on them.

  14. Access to health services: a geographical approach to public health

    Directory of Open Access Journals (Sweden)

    Carmen Vieira de Sousa Unglert

    1987-10-01

    Full Text Available Population access to health services is a basic requirement for efficient health care. The geographical location of services is one of the factors that affect this accessibility. The aim was to study the location of health services. The basic proposal is a methodology that takes into account the relationships between geographical, demographic, and social variables. Emphasis is placed on community participation in the process. The suitability of this methodology was assessed against the regional characteristics of Santo Amaro, a suburb of the city of São Paulo, Brazil. The geographical approach used in this work opens up a broad perspective for new lines of research, planning, and administration arising from the interaction between human geography and public health, within a shared field for which the name Geography of Public Health is suggested.

  15. Remote sensing research in geographic education: An alternative view

    Science.gov (United States)

    Wilson, H.; Cary, T. K.; Goward, S. N.

    1981-01-01

    It is noted that within many geography departments remote sensing is viewed as a mere technique a student should learn in order to carry out true geographic research. This view inhibits both students and faculty from investigation of remotely sensed data as a new source of geographic knowledge that may alter our understanding of the Earth. The tendency is for geographers to accept these new data and analysis techniques from engineers and mathematicians without questioning the accompanying premises. This black-box approach hinders geographic applications of the new remotely sensed data and limits the geographer's contribution to further development of remote sensing observation systems. It is suggested that geographers contribute to the development of remote sensing through pursuit of basic research. This research can be encouraged, particularly among students, by demonstrating the links between geographic theory and remotely sensed observations, encouraging a healthy skepticism concerning the current understanding of these data.

  16. Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

    Directory of Open Access Journals (Sweden)

    Vinay K Gupta

    2016-06-01

    Full Text Available Objective: The aim of the study is to assess the trend in mean BMI z-score among private-school students from their anthropometric records when outcome values were missing. Methodology: The anthropometric measurements of students in classes 1 to 12 were taken from the records of two private schools in Delhi, India, from 2005 to 2010. These records form unbalanced longitudinal data; that is, not all students had measurements recorded in each year. The trend in mean BMI z-score was estimated through a growth curve model. Before that, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis, excluding missing values, was also performed to compare its results with those obtained from the analysis of the multiply imputed data. Results: The mean BMI z-score among school students decreased significantly over time in the imputed data (β= -0.2030, se=0.0889, p=0.0232) after adjusting for age, gender, class, and school. The complete case analysis also showed a decrease in mean BMI z-score, though it was not statistically significant (β= -0.2861, se=0.0987, p=0.065). Conclusions: The estimates obtained from the multiple imputation analysis were better than those from the complete case analysis in terms of lower standard errors. We showed that anthropometric measurements from school records can be used to monitor the weight status of children and adolescents, and that multiple imputation using a growth curve model can be useful when analyzing such data.

  17. Teaching Geographic Field Methods Using Paleoecology

    Science.gov (United States)

    Walsh, Megan K.

    2014-01-01

    Field-based undergraduate geography courses provide numerous pedagogical benefits including an opportunity for students to acquire employable skills in an applied context. This article presents one unique approach to teaching geographic field methods using paleoecological research. The goals of this course are to teach students key geographic…

  18. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Science.gov (United States)

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  19. Multiple imputation to account for measurement error in marginal structural models

    Science.gov (United States)

    Edwards, Jessie K.; Cole, Stephen R.; Westreich, Daniel; Crane, Heidi; Eron, Joseph J.; Mathews, W. Christopher; Moore, Richard; Boswell, Stephen L.; Lesko, Catherine R.; Mugavero, Michael J.

    2015-01-01

    Background Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and non-differential measurement error in a marginal structural model. Methods We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. Results In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality [hazard ratio (HR): 1.2 (95% CI: 0.6, 2.3)]. The HR for current smoking and therapy (0.4 (95% CI: 0.2, 0.7)) was similar to the HR for no smoking and therapy (0.4; 95% CI: 0.2, 0.6). Conclusions Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies. PMID:26214338

  20. Multiple Imputation to Account for Measurement Error in Marginal Structural Models.

    Science.gov (United States)

    Edwards, Jessie K; Cole, Stephen R; Westreich, Daniel; Crane, Heidi; Eron, Joseph J; Mathews, W Christopher; Moore, Richard; Boswell, Stephen L; Lesko, Catherine R; Mugavero, Michael J

    2015-09-01

    Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and nondifferential measurement error in a marginal structural model. We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3,686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2 [95% confidence interval [CI] = 0.6, 2.3]). The HR for current smoking and therapy [0.4 (95% CI = 0.2, 0.7)] was similar to the HR for no smoking and therapy (0.4; 95% CI = 0.2, 0.6). Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies.
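    Several of the records above (items 16, 19 and 20) rely on multiple imputation, in which each of m imputed data sets is analysed separately and the results are then combined. As a hedged illustration, not taken from any of these studies, the sketch below pools a single coefficient using Rubin's rules, assuming the analyst already has one point estimate and one standard error per imputed data set. The function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one scalar coefficient across m imputed data sets (Rubin's rules).

    Hypothetical helper: assumes one estimate and one standard error
    per imputed data set, all for the same coefficient.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and one SE per estimate")
    q_bar = mean(estimates)                      # pooled point estimate
    w = mean([se ** 2 for se in std_errors])     # average within-imputation variance
    b = variance(estimates)                      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                      # total variance
    return q_bar, math.sqrt(t)                   # pooled estimate and pooled SE
```

    For example, with illustrative inputs only, `pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` gives a pooled estimate of 2.0 and a pooled standard error of sqrt(7/3), about 1.53. The spread of the estimates across imputations inflates the average within-imputation variance, which is how the pooled standard error reflects uncertainty about the missing values.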

  1. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Full Text Available Abstract Background Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into “gene level” effects. Methods Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  2. Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution.

    Science.gov (United States)

    Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

    2018-06-01

    Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  3. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    Directory of Open Access Journals (Sweden)

    Lotz Meredith J

    2008-01-01

    Full Text Available Abstract Background Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values using the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA

  4. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

    Science.gov (United States)

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-10

    Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values using the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions under which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures x time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity.

  5. Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Peter K Joshi

    Full Text Available The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%), in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38% for SNPs with a minor allele frequency in the range 1-3%.

  6. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Science.gov (United States)

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents random forest, a non-parametric imputation technique from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  8. Volunteered Geographic Information in Wikipedia

    Science.gov (United States)

    Hardy, Darren

    2010-01-01

    Volunteered geographic information (VGI) refers to the geographic subset of online user-generated content. Through Geobrowsers and online mapping services, which use geovisualization and Web technologies to share and produce VGI, a global digital commons of geographic information has emerged. A notable example is Wikipedia, an online collaborative…

  9. Understanding Amphibian Declines Through Geographic Approaches

    Science.gov (United States)

    Gallant, Alisa

    2006-01-01

    Growing concern over worldwide amphibian declines warrants serious examination. Amphibians are important to the proper functioning of ecosystems and provide many direct benefits to humans in the form of pest and disease control, pharmaceutical compounds, and even food. Amphibians have permeable skin and rely on both aquatic and terrestrial ecosystems during different seasons and stages of their lives. Their association with these ecosystems renders them likely to serve as sensitive indicators of environmental change. While much research on amphibian declines has centered on mysterious causes, or on causes that directly affect humans (global warming, chemical pollution, ultraviolet-B radiation), most declines are the result of habitat loss and habitat alteration. Improving our ability to characterize, model, and monitor the interactions between environmental variables and amphibian habitats is key to addressing amphibian conservation. In 2000, the U.S. Geological Survey (USGS) initiated the Amphibian Research and Monitoring Initiative (ARMI) to address issues surrounding amphibian declines.

  10. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    LENUS (Irish Health Repository)

    Hardouin, Jean-Benoit

    2011-07-14

    Abstract Background Increasingly, clinical scales consisting of patients' responses to a set of items (Patient Reported Outcomes, PRO) are validated with models based on Item Response Theory, and more specifically with a Rasch model. Missing data are frequent in the validation sample. The aim of this paper is to compare sixteen methods for handling the missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods A simulation study was performed covering several cases, notably whether the missing values were informative and the rate of missing data. Results Several imputation methods bias the psychometric indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case for the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions Several imputation methods should be avoided, in particular PMS imputation. More generally, it is important to use an imputation method that considers both the ability of the patient (measured, for example, by his/her score) and the difficulty of the item (measured, for example, by its rate of favourable responses). Another recommendation is to always consider adding a random process to the imputation method, because such a process reduces the bias. Finally, analysis without imputation of the missing data (available case analysis) is an interesting alternative to simple imputation in this context.

  11. Geographic Ontologies, Gazetteers and Multilingualism

    Directory of Open Access Journals (Sweden)

    Robert Laurini

    2015-01-01

    Full Text Available Different languages imply different visions of space, so terminologies differ across geographic ontologies. In addition to their geometric shapes, geographic features have names, which sometimes differ between languages. Gazetteers, as dictionaries of place names (toponyms), maintain the relations between place names and locations. The scope of geographic information retrieval is to search for geographic information not against a database but against the whole Internet; since the Internet stores information in different languages, it is of paramount importance not to remain restricted to a single language. In this paper, our first step is to clarify the links between geographic objects as computer representations of geographic features, ontologies, and gazetteers designed in various languages. Then, we propose some inference rules for matching not only types but also relations in geographic ontologies with the assistance of gazetteers.

  12. The Adoption of "Thinking through Geography" Strategies and Their Impact on Teaching Geographical Reasoning in Dutch Secondary Schools

    Science.gov (United States)

    Hooghuis, Fer; van der Schee, Joop; van der Velde, Martin; Imants, Jeroen; Volman, Monique

    2014-01-01

    The development of geographical reasoning is essential in geographical education. Strategies developed by the English "Thinking Through Geography" group (TTG) offer a promising approach to promote geographical reasoning. In the last decade, the TTG approach has become a regular element in geographical education in several countries.…

  13. The adoption of Thinking Through Geography strategies and their impact on teaching geographical reasoning in Dutch secondary schools

    NARCIS (Netherlands)

    Hooghuis, F.; van der Schee, J.; van der Velde, M.; Imants, J.; Volman, M.

    2014-01-01

    The development of geographical reasoning is essential in geographical education. Strategies developed by the English Thinking Through Geography group (TTG) offer a promising approach to promote geographical reasoning. In the last decade, the TTG approach has become a regular element in geographical

  14. The adoption of Thinking Through Geography Strategies and their impact on teaching geographical reasoning in Dutch secondary schools

    NARCIS (Netherlands)

    Hooghuis, F.; van der Schee, J.A.; van der Velde, M.; Imants, J.; Volman, M.

    2014-01-01

    The development of geographical reasoning is essential in geographical education. Strategies developed by the English Thinking Through Geography group (TTG) offer a promising approach to promote geographical reasoning. In the last decade, the TTG approach has become a regular element in geographical

  15. The adoption of Thinking Through Geography Strategies and their impact on teaching geographical reasoning in Dutch secondary schools

    NARCIS (Netherlands)

    Hooghuis, Fer; van der Schee, Joop; van der Velde, Martin; Imants, Jeroen; Volman, Monique

    The development of geographical reasoning is essential in geographical education. Strategies developed by the English Thinking Through Geography group (TTG) offer a promising approach to promote geographical reasoning. In the last decade, the TTG approach has become a regular element in geographical

  16. A Multi-Faceted Approach to Analyse the Effects of Environmental Variables on Geographic Range and Genetic Structure of a Perennial Psammophilous Geophyte: The Case of the Sea Daffodil Pancratium maritimum L. in the Mediterranean Basin.

    Directory of Open Access Journals (Sweden)

    Olga De Castro

    Full Text Available The Mediterranean coastline is a dynamic and complex system which owes its complexity to its past and present vicissitudes, e.g. complex tectonic history, climatic fluctuations, and prolonged coexistence with human activities. A plant species that is widespread in this habitat is the sea daffodil, Pancratium maritimum (Amaryllidaceae), which is a perennial clonal geophyte of the coastal sands of the Mediterranean and neighbouring areas, well adapted to the stressful conditions of sand dune environments. In this study, an integrated approach was used, combining genetic and environmental data with a niche modelling approach, aimed to investigate: (1) the effect of climate change on the geographic range of this species at different times {past (last inter-glacial, LIG; and last glacial maximum, LGM), present (CURR), near-future (FUT)} and (2) the possible influence of environmental variables on the genetic structure of this species in the current period. The genetic results show that 48 sea daffodil populations (867 specimens) display a good genetic diversity in which the marginal populations (i.e. Atlantic Sea populations) present lower values. Recent genetic signature of bottleneck was detected in few populations (8%). The molecular variation was higher within the populations (77%) and two genetic pools were well represented. Comparing the different climatic simulations in time, the global range of this plant increased, and a further extension is foreseen in the near future thanks to projections on the climate of currently more temperate areas, where our model suggested a forecast for a climate more similar to the Mediterranean coast. A significant positive correlation was observed between the genetic distance and the Precipitation of Coldest Quarter variable in the current period. Our analyses support the hypothesis that geomorphology of the Mediterranean coasts, sea currents, and climate have played significant roles in shaping the current genetic structure of

  17. A Multi-Faceted Approach to Analyse the Effects of Environmental Variables on Geographic Range and Genetic Structure of a Perennial Psammophilous Geophyte: The Case of the Sea Daffodil Pancratium maritimum L. in the Mediterranean Basin.

    Science.gov (United States)

    De Castro, Olga; Di Maio, Antonietta; Di Febbraro, Mirko; Imparato, Gennaro; Innangi, Michele; Véla, Errol; Menale, Bruno

    2016-01-01

    The Mediterranean coastline is a dynamic and complex system which owes its complexity to its past and present vicissitudes, e.g. complex tectonic history, climatic fluctuations, and prolonged coexistence with human activities. A plant species that is widespread in this habitat is the sea daffodil, Pancratium maritimum (Amaryllidaceae), which is a perennial clonal geophyte of the coastal sands of the Mediterranean and neighbouring areas, well adapted to the stressful conditions of sand dune environments. In this study, an integrated approach was used, combining genetic and environmental data with a niche modelling approach, aimed at investigating: (1) the effect of climate change on the geographic range of this species at different times {past (last inter-glacial, LIG; and last glacial maximum, LGM), present (CURR), near-future (FUT)} and (2) the possible influence of environmental variables on the genetic structure of this species in the current period. The genetic results show that 48 sea daffodil populations (867 specimens) display good genetic diversity, with the marginal populations (i.e. Atlantic Sea populations) presenting lower values. A recent genetic signature of a bottleneck was detected in a few populations (8%). Molecular variation was higher within populations (77%), and two genetic pools were well represented. Comparing the different climatic simulations over time, the global range of this plant increased, and a further extension is foreseen in the near future, based on climate projections for currently more temperate areas, where our model forecasts a climate more similar to that of the Mediterranean coast. A significant positive correlation was observed between genetic distance and the Precipitation of Coldest Quarter variable in the current period. Our analyses support the hypothesis that the geomorphology of the Mediterranean coasts, sea currents, and climate have played significant roles in shaping the current genetic structure of the sea
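    The abstract reports a significant positive correlation between genetic distance and a climatic variable but does not name the test used. For pairwise-distance data, a Mantel permutation test is a common choice; the sketch below is a generic, stdlib-only illustration under that assumption. The function names and the toy distance matrices are hypothetical, and both inputs are assumed to be symmetric distance matrices over the same populations in the same order.

```python
import random
import statistics


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular part of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: is dist_a positively correlated with dist_b?"""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    r_obs = pearson(flat_a, upper_triangle(dist_b))
    hits = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of dist_b together so it stays a distance matrix.
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= r_obs:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value
```

    Permuting the population labels of one matrix, rather than shuffling individual pairwise entries, respects the non-independence of distances that share a population, which is the reason a plain correlation test is not used on distance matrices.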

  18. Application of geographic information system (GIS) in industrial land ...

    African Journals Online (AJOL)

    DEPHILIHS

    Land capability index mapping using Geographic Information System (GIS) principles was used for this study. The study was undertaken using Arc View ... Geographic Information Systems (GIS) is one of the best approaches for this type of ..... western segments and to a small extent the east. Some of the available lands are ...

  19. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    Science.gov (United States)

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  20. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 May the Office of National Drug Control Policy impute conduct of one person to another? 1404.630 Section 1404.630 Food and Drugs OFFICE OF NATIONAL DRUG CONTROL POLICY GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1404.630...

  1. The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

    NARCIS (Netherlands)

    Minica, C.C.; Dolan, C.V.; Willemsen, G.; Vink, J.M.; Boomsma, D.I.

    2013-01-01

    When phenotypic but not genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost statistical power when included in such studies. Here, using simulations, we compared the performance of

  2. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Directory of Open Access Journals (Sweden)

    Hardt Jochen

    2012-12-01

    Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200, using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Missingness mechanisms were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had either low (r = .10) or moderate (r = .50) correlations with the X's and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, including too many variables biases regression coefficients downward and decreases precision. When the correlations are low, including auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation is needed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1:3.
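    The abstract does not say how regression results from the multiply imputed datasets are combined. The standard approach is Rubin's rules: average the m point estimates, then add the mean within-imputation variance to (1 + 1/m) times the between-imputation variance. A minimal sketch for a single coefficient; the function name is hypothetical and nothing here is taken from the paper itself.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total
```

    For example, pooling the coefficient of X1 across five imputed datasets means passing its five estimates and their five squared standard errors; the pooled standard error is the square root of the returned total variance.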

  3. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    DEFF Research Database (Denmark)

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low de...

  4. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 4 2010-07-01 May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation and...

  5. Age at menopause: imputing age at menopause for women with a hysterectomy with application to risk of postmenopausal breast cancer

    Science.gov (United States)

    Rosner, Bernard; Colditz, Graham A.

    2011-01-01

    Purpose Age at menopause, a major marker in reproductive life, may bias results when evaluating breast cancer risk after menopause. Methods We follow 38,948 premenopausal women in 1980 and identify 2,586 who reported hysterectomy without bilateral oophorectomy, and 31,626 who reported natural menopause during 22 years of follow-up. We evaluate risk factors for natural menopause, impute age at natural menopause for women reporting hysterectomy without bilateral oophorectomy, and estimate the hazard of reaching natural menopause in the next 2 years. We apply this imputed age at menopause both to increase sample size and to evaluate the relation between postmenopausal exposures and risk of breast cancer. Results Age, cigarette smoking, age at menarche, pregnancy history, body mass index, history of benign breast disease, and history of breast cancer were each significantly related to age at natural menopause; duration of oral contraceptive use and family history of breast cancer were not. The imputation increased sample size substantially, and although some risk factors after menopause were weaker in the expanded model (height and alcohol use), the estimate for hormone therapy use is less biased. Conclusions Imputing age at menopause increases sample size, broadens generalizability by making results applicable to women with hysterectomy, and reduces bias. PMID:21441037

  6. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    NARCIS (Netherlands)

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I.E. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  7. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    NARCIS (Netherlands)

    van Leeuwen, E.M.; Karssen, L.C.; Deelen, J.; Isaacs, A.; Medina-Gomez, C.; Mbarek, H.; Kanterakis, A.; Trompet, S.; Postmus, I.; Verweij, N.; van Enckevort, D.; Huffman, J.E.; White, C.C.; Feitosa, M.F.; Bartz, T.M.; Manichaikul, A.; Joshi, P.K.; Peloso, G.M.; Deelen, P.; Dijk, F.; Willemsen, G.; de Geus, E.J.C.; Milaneschi, Y.; Penninx, B.W.J.H.; Francioli, L.C.; Menelaou, A.; Pulit, S.L.; Rivadeneira, F.; Hofman, A.; Oostra, B.A.; Franco, O.H.; Mateo Leach, I.; Beekman, M.; de Craen, A.J.; Uh, H.W.; Trochet, H.; Hocking, L.J.; Porteous, D.J.; Sattar, N.; Packard, C.J.; Buckley, B.M.; Brody, J.A.; Bis, J.C.; Rotter, J.I.; Mychaleckyj, J.C.; Campbell, H.; Duan, Q.; Lange, L.A.; Wilson, J.F.; Hayward, C.; Polasek, O.; Vitart, V.; Rudan, I.; Wright, A.F.; Rich, S.S.; Psaty, B.M.; Borecki, I.B.; Kearney, P.M.; Stott, D.J.; Cupples, L.A.; Jukema, J.W.; van der Harst, P.; Sijbrands, E.J.; Hottenga, J.J.; Uitterlinden, A.G.; Swertz, M.A.; van Ommen, G.J.B; Bakker, P.I.W.; Slagboom, P.E.; Boomsma, D.I.; Wijmenga, C.; van Duijn, C.M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (∼35,000 samples) with the population-specific reference panel created

  8. 31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 1 2010-07-01 May the Department of the Treasury impute conduct of one person to another? 19.630 Section 19.630 Money and Finance: Treasury Office of the Secretary of the Treasury GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles...

  9. Using geographic information systems

    International Nuclear Information System (INIS)

    Winsor, R.W.

    1997-01-01

    A true Geographic Information System (GIS) is a computer mapping system with spatial analysis ability and cartographic accuracy that will offer many different projections. GIS has evolved to become an everyday tool for a wide range of users including oil companies, worldwide. Other systems are designed to allow oil and gas companies to keep their upstream data in the same format. Among these are the Public Petroleum Data Model developed by Gulf Canada, Digitech and Applied Terravision Systems of Calgary, the system developed and marketed by the Petrotechnical Open Software Corporation in the United States, and the Mercury projects by IBM. These have been developed in an effort to define an industry standard. The advantages and disadvantages of open and closed systems were discussed. Factors to consider when choosing a GIS system such as overall performance, area of use and query complexity, were reviewed. 3 figs

  10. Impute DC link (IDCL) cell based power converters and control thereof

    Science.gov (United States)

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with similarly rated back-to-back inverter systems.

  11. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background Missing genotype imputation and haplotype reconstruction are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples in the HapMap dataset. The parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  12. GEOGRAPHIC NAMES INFORMATION SYSTEM (GNIS) ...

    Science.gov (United States)

    The Geographic Names Information System (GNIS), developed by the U.S. Geological Survey in cooperation with the U.S. Board on Geographic Names (BGN), contains information about physical and cultural geographic features in the United States and associated areas, both current and historical, but not including roads and highways. The database also contains geographic names in Antarctica. The database holds the Federally recognized name of each feature and defines the location of the feature by state, county, USGS topographic map, and geographic coordinates. Other feature attributes include names or spellings other than the official name, feature designations, feature class, historical and descriptive information, and for some categories of features the geometric boundaries. The database assigns a unique feature identifier, a random number, that is a key for accessing, integrating, or reconciling GNIS data with other data sets. The GNIS is our Nation's official repository of domestic geographic feature names information.

  13. An imputation/copula-based stochastic individual tree growth model for mixed species Acadian forests: a case study using the Nova Scotia permanent sample plot network

    Directory of Open Access Journals (Sweden)

    John A. Kershaw Jr

    2017-09-01

    Full Text Available Background A novel approach to modelling individual tree growth dynamics is proposed. The approach combines multiple imputation and copula sampling to produce a stochastic individual tree growth and yield projection system. Methods The Nova Scotia, Canada permanent sample plot network is used as a case study to develop and test the modelling approach. Predictions from this model are compared to predictions from the Acadian variant of the Forest Vegetation Simulator, a widely used statistical individual tree growth and yield model. Results Diameter and height growth rates were predicted with error rates consistent with those produced using statistical models. Mortality and ingrowth error rates were higher than those observed for diameter and height, but also were within the bounds produced by traditional approaches for predicting these rates. Ingrowth species composition was very poorly predicted. The model was capable of reproducing a wide range of stand dynamic trajectories and in some cases reproduced trajectories that the statistical model was incapable of reproducing. Conclusions The model has potential to be used as a benchmarking tool for evaluating statistical and process models and may provide a mechanism to separate signal from noise and improve our ability to analyze and learn from large regional datasets that often have underlying flaws in sample design.
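    The abstract does not specify the copula family or how it is coupled with multiple imputation. Purely as a generic illustration of copula sampling, this sketch draws correlated pairs (for example, diameter and height increments) from a Gaussian copula whose marginals are the observed empirical distributions. The Gaussian family, the correlation value `rho`, the function names, and the example data are all assumptions, not the authors' model.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF with linear interpolation; u must lie in [0, 1]."""
    pos = u * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def gaussian_copula_sample(marginal_x, marginal_y, rho, n, seed=0):
    """Draw n correlated (x, y) pairs whose marginals follow the observed data."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    xs, ys = sorted(marginal_x), sorted(marginal_y)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the second normal draw with the first (2x2 Cholesky factor).
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map the normals to uniforms, then through each empirical quantile function.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((empirical_quantile(xs, u1), empirical_quantile(ys, u2)))
    return pairs
```

    Separating the dependence structure (the copula) from the marginals is what lets a simulator reproduce both the observed spread of each growth variable and their joint behaviour without assuming a parametric joint distribution.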

  14. Coloring geographical threshold graphs

    Energy Technology Data Exchange (ETDEWEB)

    Bradonjic, Milan [Los Alamos National Laboratory]; Percus, Allon [Los Alamos National Laboratory]; Muller, Tobias [EINDHOVEN UNIV. OF TECH]

    2008-01-01

    We propose a coloring algorithm for sparse random graphs generated by the geographical threshold graph (GTG) model, a generalization of random geometric graphs (RGG). In a GTG, nodes are distributed in a Euclidean space, and edges are assigned according to a threshold function involving the distance between nodes as well as randomly chosen node weights. The motivation for analyzing this model is that many real networks (e.g., wireless networks, the Internet, etc.) need to be studied by using a 'richer' stochastic model (which in this case includes both a distance between nodes and weights on the nodes). Here, we analyze the GTG coloring algorithm together with the graph's clique number, showing formally that, in spite of the differences in structure between GTG and RGG, the asymptotic behavior of the chromatic number is identical: $\chi \ln\ln n / \ln n = 1 + o(1)$. Finally, we consider the leading corrections to this expression, again using the coloring algorithm and clique number to provide bounds on the chromatic number. We show that the gap between the lower and upper bounds is within $C \ln n / (\ln\ln n)^2$, and specify the constant $C$.
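    The abstract describes the GTG threshold only loosely and does not give the authors' coloring algorithm. The sketch below is therefore illustrative only: it assumes points uniform in the unit square, exponentially distributed weights, and an edge rule of the form (w_i + w_j) / d_ij^alpha >= theta, then colors the graph with a plain largest-degree-first greedy heuristic (Welsh-Powell ordering), not the paper's algorithm. All names and parameter values are hypothetical.

```python
import random


def geographical_threshold_graph(n, theta, alpha=2.0, seed=0):
    """Random GTG in the unit square: join i and j when (w_i + w_j) / d_ij**alpha >= theta."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    weights = [rng.expovariate(1.0) for _ in range(n)]  # assumed weight distribution
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist = (dx * dx + dy * dy) ** 0.5
            # Coincident points have probability zero; skip them to avoid dividing by zero.
            if dist > 0 and (weights[i] + weights[j]) / dist ** alpha >= theta:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def greedy_coloring(adjacency):
    """Visit vertices by decreasing degree; give each the smallest color unused by colored neighbors."""
    colors = {}
    for v in sorted(adjacency, key=lambda u: len(adjacency[u]), reverse=True):
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors
```

    With alpha = 2 and theta around 500, 300 nodes give a sparse graph with a mean degree of a few edges. Any greedy coloring uses at most (maximum degree + 1) colors; the paper's contribution is a much sharper, asymptotically tight bound for the GTG setting.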

  15. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern

    Directory of Open Access Journals (Sweden)

    Torres Munguía, Juan Armando

    2014-06-01

    Full Text Available This paper examines sample proportion estimates in the presence of univariate missing categorical data. A database on smoking habits (the 2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with 5% and 15% missingness under each of the MCAR, MAR and MNAR mechanisms. The performance of six methods for addressing missingness was then evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression, and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed were the hot-deck and polytomous regression approaches.
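    Two of the simpler methods compared in the study fit in a few lines. The abstract does not describe the study's hot-deck variant (for example, how donor classes were formed), so the class-based random donor draw below is an assumption, as are the function names and the toy smoking-status data.

```python
import random
from collections import Counter


def mode_impute(values):
    """Replace each None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def hot_deck_impute(values, classes, seed=0):
    """Random hot-deck: fill each None with a random observed donor from the same class.

    Assumes every class containing a missing value has at least one observed donor.
    """
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [rng.choice(donors[c]) if v is None else v for v, c in zip(values, classes)]


status = ["smoker", None, "never", "never", None, "former"]
group = ["a", "a", "b", "b", "b", "a"]
print(mode_impute(status))               # every gap becomes "never"
print(hot_deck_impute(status, group))    # gaps drawn from same-group donors
```

    Mode imputation piles every missing value onto one category, which distorts the estimated proportions; drawing donors at random preserves the observed category mix within each class, which fits the study's finding that hot-deck performed well for proportion estimates.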

  16. Determinants of Dentists' Geographic Distribution.

    Science.gov (United States)

    Beazoglou, Tryfon J.; And Others

    1992-01-01

    A model for explaining the geographic distribution of dentists' practice locations is presented and applied to particular market areas in Connecticut. Results show geographic distribution is significantly related to a few key variables, including demography, disposable income, and housing prices. Implications for helping students make practice…

  17. Geographic diversification in banking

    NARCIS (Netherlands)

    Fang, Yiwei; van Lelyveld, Iman

    2014-01-01

    In the aftermath of the 2007-2009 crisis, banks claiming positive diversification benefits are being met with skepticism. Nevertheless, diversification might be important and sizable for some large internationally active banking groups. We use a universally applicable correlation matrix approach to

  18. Geographic information systems: introduction.

    Science.gov (United States)

    Calistri, Paolo; Conte, Annamaria; Freier, Jerome E; Ward, Michael P

    2007-01-01

    The recent exponential growth of the science and technology of geographic information systems (GIS) has made a tremendous contribution to epidemiological analysis and has led to the development of powerful new tools for the surveillance of animal diseases. GIS, spatial analysis and remote sensing provide valuable methods to collect and manage information for epidemiological surveys. Spatial patterns and trends of disease can be correlated with climatic and environmental information, thus contributing to a better understanding of the links between disease processes and explanatory spatial variables. Until recently, these tools were underexploited in the field of veterinary public health, due to the prohibitive cost of hardware and the complexity of GIS software that required a high level of expertise. The revolutionary developments in computer performance of the last decade have not only reduced the costs of equipment but have made available easy-to-use Web-based software, which in turn has made GIS more widely accessible to veterinary services at all levels. At the same time, the increased awareness of the possibilities offered by these tools has created new opportunities for decision-makers to enhance their planning, analysis and monitoring capabilities. These technologies offer a new way of sharing and accessing spatial and non-spatial data across groups and institutions. The series of papers included in this compilation aims to: (1) define the state of the art in the use of GIS in veterinary activities; (2) identify priority needs in the development of new GIS tools at the international level for the surveillance of animal diseases and zoonoses; and (3) define practical proposals for their implementation. The topics addressed are presented in the following order in this book: importance of GIS for the monitoring of animal diseases and zoonoses; GIS application in surveillance activities; spatial analysis in veterinary epidemiology; data collection and remote

  19. Avoid Filling Swiss Cheese with Whipped Cream; Imputation Techniques and Evaluation Procedures for Cross-Country Time Series

    OpenAIRE

    Michael Weber; Michaela Denk

    2011-01-01

    International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries with not yet well-established statistical systems may be incomplete, the bridging of data gaps is a crucial challenge. This paper investigates data structures and missing data patterns in the cross-sectional time series framework, reviews missing value imputation techniques used for micro data in official statistics, and discusses the...

  20. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    Science.gov (United States)

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full-rank linear model analysis, the model equation $E[Y] = X\beta$ leads to the definition of the primary parameter $\beta = (X'X)^{-1}X'E[Y]$ and to the definition of linear secondary parameters of the form $\theta = L\beta = L(X'X)^{-1}X'E[Y]$, including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on $E[Y]$, which raises the questions: What is $E[Y]$ when some elements of the incomplete random vector $Y$ are not observed and MNAR, or when such a $Y$ is "completed" via imputation? We develop a rigorous, readily interpretable definition of $E[Y]$ in this context that leads directly to definitions of $\hat\beta$, $\mathrm{Bias}(\hat\beta) = E[\hat\beta] - \beta$, $\mathrm{Bias}(\hat\theta) = E[\hat\theta] - L\beta$, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases, and the effects of testing the corresponding corrupted hypotheses, via a hypothetical but very realistic longitudinal analgesic clinical trial.
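    The two linear imputation rules named in the abstract are simple to state in code. A minimal sketch for one subject's visit sequence, where `None` marks a post-dropout missing value; the function names and the pain-score example are hypothetical.

```python
def locf(visits):
    """Last Observation Carried Forward: fill each None with the latest observed value."""
    filled, last = [], None
    for v in visits:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(visits):
    """Baseline Observation Carried Forward: fill each None with the baseline (first) value."""
    baseline = visits[0]
    if baseline is None:
        raise ValueError("BOCF needs an observed baseline")
    return [baseline if v is None else v for v in visits]


pain = [7.0, 5.0, 4.0, None, None]  # subject drops out after visit 3
print(locf(pain))  # [7.0, 5.0, 4.0, 4.0, 4.0]
print(bocf(pain))  # [7.0, 5.0, 4.0, 7.0, 7.0]
```

    For a subject who improves and then drops out, LOCF carries the improvement forward while BOCF erases it. Which completed $Y$ better reflects $E[Y]$ depends on the MNAR dropout mechanism, which is the question the paper formalizes.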

  1. Assessment of the Consequences of Replacing the Unified Tax on Imputed Income with the Patent Taxation System

    Directory of Open Access Journals (Sweden)

    Galina A. Manokhina

    2012-11-01

    Full Text Available The article addresses the main questions concerning the possible consequences of replacing the currently operating single tax on imputed income with a patent taxation system. It sets out the main advantages and drawbacks of the new taxation system, including the view that introducing the patent system as an auxiliary system would be more effective than replacing one special tax regime with another.

  2. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.
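    The claim that complete-case regression stays unbiased when only the response is MAR (missingness depending on the covariates alone) is easy to check by simulation. A minimal sketch; the data-generating model (intercept 1, slope 2), the logistic missingness rule, the sample size, and the function name are illustrative assumptions, not taken from the paper.

```python
import math
import random


def simulate_complete_case_fit(n=20000, seed=1):
    """Fit y = a + b*x by OLS on the rows whose y survived x-dependent (MAR) deletion."""
    rng = random.Random(seed)
    xs_cc, ys_cc = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)       # true intercept 1, true slope 2
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * x))  # depends on x only, so y is MAR
        if rng.random() >= p_missing:                 # keep the row with probability 1 - p_missing
            xs_cc.append(x)
            ys_cc.append(y)
    mx = sum(xs_cc) / len(xs_cc)
    my = sum(ys_cc) / len(ys_cc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs_cc, ys_cc))
    sxx = sum((x - mx) ** 2 for x in xs_cc)
    slope = sxy / sxx
    return my - slope * mx, slope, len(xs_cc)
```

    Because deletion depends only on x, the conditional distribution of y given x is the same among the retained rows, so the complete-case estimates land near 1 and 2 even though roughly half the responses are discarded and the retained x values are skewed.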

  3. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning.

    Directory of Open Access Journals (Sweden)

    Ji-Sung Kim

    2018-04-01

    Full Text Available Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10⁻⁹). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  4. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    Science.gov (United States)

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model that first estimates missing values and then applies variable selection to forecast the reservoir's water level. The study collected data from the Taiwan Shimen Reservoir, together with daily atmospheric data, from 2008 to 2015. The two datasets were concatenated into a single integrated research dataset according to the ordering of the data. The proposed forecasting model has three main foci. First, this study uses five imputation methods to directly delete the missing value. Second, the key variables were identified via factor analysis, and the unimportant variables were then deleted sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level, and its forecasting error was compared with that of the listed methods. The experimental results indicate that the Random Forest forecasting model, when applied with variable selection and with full variables, has better forecasting performance than the listed models. In addition, the experiment shows that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.

  5. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Jun-He Yang

    2017-01-01

    Full Text Available Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model that first estimates missing values and then applies variable selection to forecast the reservoir’s water level. The study collected data from the Taiwan Shimen Reservoir, together with daily atmospheric data, from 2008 to 2015. The two datasets were concatenated into a single integrated research dataset according to the ordering of the data. The proposed forecasting model has three main foci. First, this study uses five imputation methods to directly delete the missing value. Second, the key variables were identified via factor analysis, and the unimportant variables were then deleted sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir’s water level, and its forecasting error was compared with that of the listed methods. The experimental results indicate that the Random Forest forecasting model, when applied with variable selection and with full variables, has better forecasting performance than the listed models. In addition, the experiment shows that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.

  6. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    KAUST Repository

    Kim, Ji-Sung

    2018-04-26

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10⁻⁹). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  7. NEPR Geographic Zone Map 2015

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This geographic zone map was created by interpreting satellite and aerial imagery, seafloor topography (bathymetry model), and the new NEPR Benthic Habitat Map...

  8. Ecoscapes: Geographical Patternings of Relations

    Directory of Open Access Journals (Sweden)

    Aimar Ventsel

    2012-06-01

    Full Text Available Book review of the publication Ecoscapes: Geographical Patternings of Relations. Edited by Gary Backhaus and John Murungi. Lanham, Boulder, New York, Toronto, Oxford, Lexington Books, 2006, xxxiii+241 pp.

  9. Ecoscapes: Geographical Patternings of Relations

    Directory of Open Access Journals (Sweden)

    Aimar Ventsel

    2013-01-01

    Full Text Available Book review of the publication Ecoscapes: Geographical Patternings of Relations. Edited by Gary Backhaus and John Murungi. Lanham, Boulder, New York, Toronto, Oxford, Lexington Books, 2006, xxxiii+241 pp.

  10. Geographically isolated wetlands: Rethinking a misnomer

    Science.gov (United States)

    Mushet, David M.; Calhoun, Aram J.K.; Alexander, Laurie C.; Cohen, Matthew J.; DeKeyser, Edward S.; Fowler, Laurie G.; Lane, Charles R.; Lang, Megan W.; Rains, Mark C.; Walls, Susan

    2015-01-01

    We explore the category “geographically isolated wetlands” (GIWs; i.e., wetlands completely surrounded by uplands at the local scale) as used in the wetland sciences. As currently used, the GIW category (1) hampers scientific efforts by obscuring important hydrological and ecological differences among multiple wetland functional types, (2) aggregates wetlands in a manner not reflective of regulatory and management information needs, (3) implies wetlands so described are in some way “isolated,” an often incorrect implication, (4) is inconsistent with more broadly used and accepted concepts of “geographic isolation,” and (5) has injected unnecessary confusion into scientific investigations and discussions. Instead, we suggest other wetland classification systems offer more informative alternatives. For example, hydrogeomorphic (HGM) classes based on well-established scientific definitions account for wetland functional diversity thereby facilitating explorations into questions of connectivity without an a priori designation of “isolation.” Additionally, an HGM-type approach could be used in combination with terms reflective of current regulatory or policymaking needs. For those rare cases in which the condition of being surrounded by uplands is the relevant distinguishing characteristic, use of terminology that does not unnecessarily imply isolation (e.g., “upland embedded wetlands”) would help alleviate much confusion caused by the “geographically isolated wetlands” misnomer.

  11. Campus-Based Geographic Learning: A Field Oriented Teaching Scenario

    Science.gov (United States)

    Jennings, Steven A.; Huber, Thomas P.

    2003-01-01

    The use of field classes and the need for university master planning are presented as a way to enhance learning. This field-oriented, goal-oriented approach to learning is proposed as a general model for university-level geographic education. This approach is presented for physical geography classes, but could also be applied to other subdivisions…

  12. Evolution of research in health geographics through the International Journal of Health Geographics (2002-2015).

    Science.gov (United States)

    Pérez, Sandra; Laperrière, Vincent; Borderon, Marion; Padilla, Cindy; Maignant, Gilles; Oliveau, Sébastien

    2016-01-20

    Health geographics is a fast-developing research area. The subjects addressed in the scientific literature are highly varied, ranging from vector-borne diseases to access to healthcare, with a recent revival of themes such as the implication of health in the Smart City, or a predominantly individual-centered approach. Going well beyond standard meta-analyses, the present study deliberately questions space in its foundations through the work of various authors in the International Journal of Health Geographics, a highly influential journal in the field. The aim is to identify space as the common denominator of this specialized literature, along with its relation to spatial analysis, without attempting to be exhaustive. Of the 660 articles published in the journal since its launch, 359 were selected based on the presence of the word "Space" in the title, the abstract or the text over the journal's 13 years of existence. From that database, a lexical analysis (tag cloud) reveals the perception of space in the literature and shows how approaches are evolving, underlining that the scope of health geographics is far from narrowing.

  13. Multiple sclerosis: a geographical hypothesis.

    Science.gov (United States)

    Carlyle, I P

    1997-12-01

    Multiple sclerosis remains a rare neurological disease of unknown aetiology, with a unique distribution, both geographically and historically. Rare in equatorial regions, it becomes increasingly common in higher latitudes; historically, it was first clinically recognized in the early nineteenth century. A hypothesis, based on geographical reasoning, is here proposed: that the disease is the result of a specific vitamin deficiency. Different individuals suffer the deficiency in separate and often unique ways. Evidence to support the hypothesis exists in cultural considerations, in the global distribution of the disease, and in its historical prevalence.

  14. Using imputed genotype data in the joint score tests for genetic association and gene-environment interactions in case-control studies.

    Science.gov (United States)

    Song, Minsun; Wheeler, William; Caporaso, Neil E; Landi, Maria Teresa; Chatterjee, Nilanjan

    2018-03-01

    Genome-wide association studies (GWAS) now routinely impute untyped single nucleotide polymorphisms (SNPs) using powerful statistical imputation algorithms trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce valid score tests for genetic association. In this paper, we investigate how best to handle imputed SNPs in modern, complex tests of genetic association that incorporate gene-environment interactions. We focus on case-control association studies in which inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large GWAS are performed through consortium efforts in which it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and to a lung cancer GWAS dataset illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For imputed SNPs, as for typed SNPs, the retrospective methods can yield considerable efficiency gains when modeling gene-environment interactions under the assumption of gene-environment independence. The methods are available for public use through the CGEN R software package. © 2017 WILEY PERIODICALS, INC.
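
    The abstract notes that using predicted allele counts (dosages) for imputed SNPs yields a valid score test. As a minimal illustration under stated assumptions, the sketch below computes the dosage as P(AB) + 2·P(BB) from hypothetical imputed genotype probabilities and applies the standard one-degree-of-freedom score (trend) test for a single-SNP logistic model with an intercept only. It does not cover the gene-environment interaction or retrospective tests studied in the paper, and the function names are hypothetical, not part of the CGEN package.

```python
import math
from typing import Sequence, Tuple


def expected_dosage(p_aa: float, p_ab: float, p_bb: float) -> float:
    """Expected count of allele B given imputed genotype probabilities."""
    return p_ab + 2.0 * p_bb


def dosage_score_test(dosages: Sequence[float], status: Sequence[int]) -> Tuple[float, float]:
    """Score test of H0: beta = 0 in logit P(case) = alpha + beta * dosage.

    Under H0 the intercept-only fit gives P(case) = ybar, so the score for beta
    is U = sum(d_i * (y_i - ybar)); its variance, adjusted for the intercept,
    is ybar * (1 - ybar) * sum((d_i - dbar) ** 2).
    Returns the 1-df chi-square statistic U**2 / V and its p-value.
    """
    n = len(dosages)
    ybar = sum(status) / n
    dbar = sum(dosages) / n
    u = sum(d * (y - ybar) for d, y in zip(dosages, status))
    v = ybar * (1.0 - ybar) * sum((d - dbar) ** 2 for d in dosages)
    if v == 0:
        raise ValueError("no variation in dosage or in case status")
    stat = u * u / v
    # For 1 degree of freedom, P(chi2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical (P(AA), P(AB), P(BB)) triples and case/control status.
    probs = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.2, 0.8),
             (0.8, 0.2, 0.0), (0.1, 0.8, 0.1), (0.0, 0.1, 0.9)]
    cases = [0, 0, 1, 0, 1, 1]
    print(dosage_score_test([expected_dosage(*p) for p in probs], cases))
```

    Using the expected count rather than the most likely genotype keeps the imputation uncertainty in the test instead of discarding it through hard calls.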

  15. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Science.gov (United States)

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped with the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracy for the ten traits. By adding imputed genotypes, the relative gains in genomic prediction accuracy for estimated breeding values (EBV) ranged from 12.60% to 31.27%, while the relative gains for de-regressed EBV were smaller (0.87% to 18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals available for training. Genomic prediction accuracy (GPA) for the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals using the SNP effects estimated in the previous set of 3,797 Brangus animals; these accuracies were slightly lower than the GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  16. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    Science.gov (United States)

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, for whom lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) using a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements at ages 3-40 years permitted the testing of GS methods over time. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Three diverse GS models were then evaluated on their predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31-0.55) were observed, sufficient to deliver an improved selection response over TS. PA also varied substantially over time, in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal PA, which was better than that of the generalized ridge regression heteroscedastic effect model for the traits evaluated.
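
    The imputation comparison above (k-nearest neighbor versus imputing with the mean) can be sketched for a genotype matrix coded 0/1/2. This is a generic illustration, not the authors' pipeline: the 0/1/2 coding, the mean-absolute-difference distance over jointly observed SNPs, and the default k = 2 are assumptions, and the singular value decomposition method is omitted.

```python
from typing import List, Optional

# Rows are trees, columns are SNPs coded 0/1/2; None marks a missing call.
Genotypes = List[List[Optional[float]]]


def mean_impute(g: Genotypes) -> List[List[float]]:
    """Replace each missing call with the observed mean of that SNP."""
    means = []
    for j in range(len(g[0])):
        observed = [row[j] for row in g if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in g]


def _distance(a, b) -> float:
    """Mean absolute genotype difference over SNPs observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(g: Genotypes, k: int = 2) -> List[List[float]]:
    """Fill each missing call with the mean of that SNP over the k closest rows
    that observe it; fall back to the SNP mean when no such row exists."""
    out = mean_impute(g)  # fresh copy, also provides the fallback values
    for i, row in enumerate(g):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(g):
                if r == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d != float("inf"):
                    candidates.append((d, other[j]))
            candidates.sort()
            donors = candidates[:k]
            if donors:
                out[i][j] = sum(v for _, v in donors) / len(donors)
    return out


if __name__ == "__main__":
    g = [[0, 0, 0, None], [0, 0, 0, 0], [0, 0, 1, 0], [2, 2, 2, 2], [2, 2, 1, 2]]
    print(mean_impute(g)[0], knn_impute(g)[0])
```

    For the toy matrix in the usage block, mean imputation fills the first tree's missing call with 1.0 (the SNP's overall mean), while kNN with k = 2 fills it with 0.0, copying the two trees whose other genotypes match most closely. This is the kind of family structure a mean imputer ignores.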

  17. Changes at the National Geographic Society

    Science.gov (United States)

    Schwille, Kathleen

    2016-01-01

    For more than 125 years, National Geographic has explored the planet, unlocking its secrets and sharing them with the world. For almost thirty of those years, National Geographic has been committed to K-12 educators and geographic education through its Network of Alliances. As National Geographic begins a new chapter, they remain committed to the…

  18. Geographical differences in food allergy.

    Science.gov (United States)

    Bartra, Joan; García-Moral, Alba; Enrique, Ernesto

    2016-06-01

    Food allergy is a health problem worldwide; it can lead to life-threatening reactions and impair quality of life. Epidemiological data from the past decades are very heterogeneous because of the use of different diagnostic procedures, and most studies have been performed only in specific geographical areas. The aim of this article is to review the available data on the geographical distribution of food allergies at the food-source and molecular levels and to link food allergy patterns to the influence of aeroallergens in each area. Systematic reviews, meta-analyses, studies performed within the EuroPrevall Project and EAACI position papers regarding food allergy were analysed. The prevalence of food allergy sensitization differs between geographical areas, probably as a consequence of differences among populations, their habits and the influence of cross-reactivity with aeroallergens and other allergen sources. Geographical differences in food allergy are clearly evident at the allergenic molecular level, which seems to be directly influenced by the aeroallergens of each region and associated with specific clinical patterns.

  19. Educational Geographers and Applied Geography.

    Science.gov (United States)

    Frazier, John W.

    1979-01-01

    Describes the development of applied geography programs and restructuring of curricula with an emphasis on new technique and methodology courses, though retaining the liberal arts role. Educational geographers can help the programs to succeed through curriculum analysis, auditing, advising students, and liaison with other geography sources. (CK)

  20. THE VOLUNTEERED GEOGRAPHIC INFORMATION PHENOMENON

    Directory of Open Access Journals (Sweden)

    Flavio Lupia

    2014-12-01

    Full Text Available The contribution addresses the phenomenon of Volunteered Geographic Information, explaining how these new and burgeoning sources of information offer multidisciplinary scientists an unprecedented opportunity to conduct research on a variety of topics at multiple spatial and temporal scales. In particular, the contribution refers to two recently launched COST Actions on the subject, which are particularly relevant to the growth of the European scientific community.

  1. A Conceptual Framework for Virtual Geographic Environments Knowledge Engineering

    Science.gov (United States)

    You, Lan; Lin, Hui

    2016-06-01

    VGE geographic knowledge refers to the abstract and repeatable geo-information related to the geoscience problems, geographical phenomena and geographical laws supported by VGE. It includes expert experience, evolution rules, simulation processes and prediction results in VGE. This paper proposes a conceptual framework for VGE knowledge engineering in order to effectively manage and use geographic knowledge in VGE. Our approach relies on previously well-established theories of knowledge engineering and VGE. The main contributions of this report are the following: (1) clear definitions of the concepts of VGE knowledge and VGE knowledge engineering; (2) the features that distinguish VGE knowledge from common knowledge; (3) a geographic knowledge evolution process that helps users rapidly acquire knowledge in VGE; and (4) a conceptual framework for VGE knowledge engineering that provides the supporting methodological system for building an intelligent VGE. This conceptual framework systematically describes the related VGE knowledge theories and key technologies. It will promote the rapid transformation from geodata to geographic knowledge and further reduce the gap between the data explosion and the absence of knowledge.

  2. Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

    DEFF Research Database (Denmark)

    Pryce, J E; Johnston, J; Hayes, B J

    2014-01-01

    detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from...... reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were...... information exploited. The UK animals were also included in the North American data set (n = 1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele...

  3. Plants and geographical names in Croatia.

    Science.gov (United States)

    Cargonja, Hrvoje; Daković, Branko; Alegro, Antun

    2008-09-01

    The main purpose of this paper is to present some general observations, regularities and insights into the complex relationship between plants and people as expressed through symbolic systems such as geographical names in Croatia. The basic data sources for this research were maps from an atlas of Croatia at a scale of 1:100,000. Five groups of maps or areas were selected to represent the main Croatian phytogeographic regions. From each map, toponyms in which a Croatian plant name was recognized (phytotoponyms) were selected. The results showed that, of all plant names recognized in geographical names, trees are the most represented, with birch and oak the most frequent among them. Furthermore, an attempt was made to explain the presence of the most represented plant species in the phytotoponyms in light of general phytogeographical and sociocultural differences and similarities among the compared areas. The findings confirm the expectation that genera of the climazonal vegetation of a particular area are the most represented among the phytotoponyms. Nevertheless, there are ample examples where the presence of a plant name in names of the human environment can only be ascribed to ethno-linguistic and socio-cultural motives. Despite the reductionist character of the applied methodology, this research also points out some advantages of this approach for ethnobotanical and ethnolinguistic studies of larger areas of the human environment.

  4. Leveraging community health worker system to map a mountainous rural district in low resource setting: a low-cost approach to expand use of geographic information systems for public health.

    Science.gov (United States)

    Munyaneza, Fabien; Hirschhorn, Lisa R; Amoroso, Cheryl L; Nyirazinyoye, Laetitia; Birru, Ermyas; Mugunga, Jean Claude; Murekatete, Rachel M; Ntaganira, Joseph

    2014-12-06

    Geographic Information Systems (GIS) have become an important tool in monitoring and improving health services, particularly at local levels. However, GIS data are often unavailable in rural settings and village-level mapping is resource-intensive. This study describes the use of community health workers' (CHW) supervisors to map villages in a mountainous rural district of Northern Rwanda and the subsequent use of these data to map village-level variability in safe water availability. We developed a low-literacy, skills-focused training in the local language (Kinyarwanda) to train 86 CHW Supervisors and 25 nurses in charge of community health at the health center (HC) and health post (HP) levels to collect the geographic coordinates of the villages using Global Positioning Systems (GPS). Data were validated through meetings with key stakeholders at the sub-district and district levels and joined using ArcMap 10 Geo-processing tools. Costs were calculated using program budgets and activity records, and compared with the estimated costs of mapping using a separate, trained GIS team. To demonstrate the usefulness of this work, we mapped drinking water sources (DWS) from data collected by CHW supervisors from the chief of the village. DWSs were categorized as safe versus unsafe using World Health Organization definitions. Following training, each CHW Supervisor spent five days collecting data on the villages in their coverage area. Over 12 months, the CHW supervisors mapped the district's 573 villages using 12 shared GPS devices. Sector maps were produced and distributed to local officials. The cost of mapping using CHW supervisors was $29,692, about half the estimated cost of mapping using a trained and dedicated GIS team ($60,112). Local mapping made it possible to rapidly identify village-level disparities in DWS, with lower access in populations living near lakes and wetlands (p villages even in mountainous rural areas. These data

  5. Geographical influence of heat stress on milk production of Holstein ...

    African Journals Online (AJOL)

    To model the influence of heat stress on milk production of Holstein dairy herds on pasture in South Africa, the maximum entropy (Maxent) modelling technique was used in a novel approach to model and map optimal milk-producing areas. Geographical locations of farms with top milk-producing Holstein herds on pasture ...

  6. Crowdsourcing sensor tasks to a socio-geographic network

    NARCIS (Netherlands)

    Lasnia, Damian; Broering, Arne; Jirka, Simon; Remke, Albert; Pianho, M.; Santos, M.Y.; Pundt, H.

    2010-01-01

    This work describes an approach of a socio-geographic network for crowdsourcing sensor tasks to a human sensor web. Users can register as human sensors at the system by defining their skills and impact area. Based on that information, submitted sensor tasks are forwarded to the most suitable human

  7. Fungi identify the geographic origin of dust samples.

    Directory of Open Access Journals (Sweden)

    Neal S Grantham

    Full Text Available There is a long history of archaeologists and forensic scientists using pollen found in a dust sample to identify its geographic origin or history. Such palynological approaches have important limitations as they require time-consuming identification of pollen grains, a priori knowledge of plant species distributions, and a sufficient diversity of pollen types to permit spatial or temporal identification. We demonstrate an alternative approach based on DNA sequencing analyses of the fungal diversity found in dust samples. Using nearly 1,000 dust samples collected from across the continental U.S., our analyses identify up to 40,000 fungal taxa from these samples, many of which exhibit a high degree of geographic endemism. We develop a statistical learning algorithm via discriminant analysis that exploits this geographic endemicity in the fungal diversity to correctly identify samples to within a few hundred kilometers of their geographic origin with high probability. In addition, our statistical approach provides a measure of certainty for each prediction, in contrast with current palynology methods that are almost always based on expert opinion and devoid of statistical inference. Fungal taxa found in dust samples can therefore be used to identify the origin of that dust and, more importantly, we can quantify our degree of certainty that a sample originated in a particular place. This work opens up a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.

  8. Geographic Object-Based Image Analysis: Towards a new paradigm

    NARCIS (Netherlands)

    Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.A.; Queiroz Feitosa, R.; van der Meer, F.D.; van der Werff, H.M.A.; van Coillie, F.; Tiede, A.

    2014-01-01

    The amount of scientific literature on (Geographic) Object-based Image Analysis – GEOBIA has been and still is sharply increasing. These approaches to analysing imagery have antecedents in earlier research on image segmentation and use GIS-like spatial analysis within classification and feature

  9. Spatial variation of vulnerability in geographic areas of North Lebanon

    NARCIS (Netherlands)

    Issa, Sahar; van der Molen, I.; Nader, M.R.; Lovett, Jonathan Cranidge

    2014-01-01

    This paper examines the spatial variation in vulnerability between different geographical areas of the northern coastal region of Lebanon within the context of armed conflict. The study is based on the ‘vulnerability of space’ approach and will be positioned in the academic debate on vulnerability

  10. Towards mapping land use patterns from volunteered geographic information

    NARCIS (Netherlands)

    Jokar Arsanjani, J.; Helbich, M.; Bakillah, M.; Hagenauer, J.; Zipf, A.

    2013-01-01

    A large number of applications have been launched to gather geo-located information from the public. This article introduces an approach toward generating land-use patterns from volunteered geographic information (VGI) without applying remote-sensing techniques and/or engaging official data. Hence,

  11. A review of geographic variation and Geographic Information Systems (GIS) applications in prescription drug use research.

    Science.gov (United States)

    Wangia, Victoria; Shireman, Theresa I

    2013-01-01

    While understanding geography's role in healthcare has been an area of research for over 40 years, the application of geography-based analyses to prescription medication use is limited. The body of literature was reviewed to assess the current state of such studies to demonstrate the scale and scope of projects in order to highlight potential research opportunities. To review systematically how researchers have applied geography-based analyses to medication use data. Empiric, English language research articles were identified through PubMed and bibliographies. Original research articles were independently reviewed as to the medications or classes studied, data sources, measures of medication exposure, geographic units of analysis, geospatial measures, and statistical approaches. From 145 publications matching key search terms, forty publications met the inclusion criteria. Cardiovascular and psychotropic classes accounted for the largest proportion of studies. Prescription drug claims were the primary source, and medication exposure was frequently captured as period prevalence. Medication exposure was documented across a variety of geopolitical units such as countries, provinces, regions, states, and postal codes. Most results were descriptive and formal statistical modeling capitalizing on geospatial techniques was rare. Despite the extensive research on small area variation analysis in healthcare, there are a limited number of studies that have examined geographic variation in medication use. Clearly, there is opportunity to collaborate with geographers and GIS professionals to harness the power of GIS technologies and to strengthen future medication studies by applying more robust geospatial statistical methods. Copyright © 2013 Elsevier Inc. All rights reserved.

  12. Geographic Names Information System (GNIS) Structures

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  13. Geographic Names Information System (GNIS) Historical Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  14. Geographic Names Information System (GNIS) Admin Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  15. Geographic Names Information System (GNIS) Hydrography Lines

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  16. Geographic Names Information System (GNIS) Cultural Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  17. Geographic Names Information System (GNIS) Landform Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  18. Geographic Names Information System (GNIS) Hydrography Points

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  19. Geographic Names Information System (GNIS) Community Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  20. Geographic Names Information System (GNIS) Transportation Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  1. Geographic Names Information System (GNIS) Antarctica Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  2. Education and health and well-being: direct and indirect effects with multiple mediators and interactions with multiple imputed data in Stata.

    Science.gov (United States)

    Sheikh, Mashhood Ahmed; Abelsen, Birgit; Olsen, Jan Abel

    2017-11-01

    Previous methods for assessing mediation assume no multiplicative interactions. The inverse odds weighting (IOW) approach has been presented as a method that can be used even when interactions exist. The substantive aim of this study was to assess the indirect effect of education on health and well-being via four indicators of adult socioeconomic status (SES): income, management position, occupational hierarchy position and subjective social status. 8516 men and women from the Tromsø Study (Norway) were followed for 17 years. Education was measured at age 25-74 years, while SES and health and well-being were measured at age 42-91 years. Natural direct and indirect effects (NIE) were estimated using weighted Poisson regression models with IOW. Stata code is provided that makes it easy to assess mediation in any multiply imputed dataset with multiple mediators and interactions. Low education was associated with lower SES. Consequently, low SES was associated with being unhealthy and having a low level of well-being. The effect (NIE) of education on health and well-being is mediated by income, management position, occupational hierarchy position and subjective social status. This study contributes to the literature on mediation analysis, as well as the literature on the importance of education for health-related quality of life and subjective well-being. The influence of education on health and well-being had different pathways in this Norwegian sample. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  3. Conservation Easements and Management by Family Forest Owners: A Propensity Score Matching Approach with Multi-Imputations of Survey Data

    Science.gov (United States)

    Nianfu Song; Francisco X. Aguilar; Brett J. Butler

    2014-01-01

    Increasingly, private landowners are participating in conservation easement programs, but their effects on land management remain to be addressed. Data from the USDA Forest Service National Woodland Owner Survey for the US Northern Region were used to investigate how conservation easement participation is associated with selected past and future forest management...

  4. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    Science.gov (United States)

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  5. GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

    NARCIS (Netherlands)

    K. Estrada Gil (Karol); A. Abuseiris (Anis); F.G. Grosveld (Frank); A.G. Uitterlinden (André); T.A. Knoch (Tobias); F. Rivadeneira Ramirez (Fernando)

    2009-01-01

    The current fast growth of genome-wide association studies (GWAS), combined with now common, computationally expensive imputation, requires online access for large user groups to high-performance computing resources capable of rapidly and efficiently analyzing millions of genetic

  6. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation

    Directory of Open Access Journals (Sweden)

    CARLOS ALBERTO SILVA

    Full Text Available ABSTRACT Accurate forest inventory is of great economic importance for optimizing the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR-derived metrics were calculated at 53 regular sample plots, and imputation models were used to retrieve the forest attributes at plot and landscape levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The adjusted coefficients of determination (adj. R2) for HD, HM and TD were 0.90, 0.94 and 0.38, and the root mean squared differences (RMSD) were 6.99%, 5.70% and 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy for TD and to evaluate and compare the cost of acquiring and processing LiDAR data against conventional inventory procedures.

  7. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

    Science.gov (United States)

    Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

    2018-01-01

    Accurate forest inventory is of great economic importance for optimizing the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR-derived metrics were calculated at 53 regular sample plots, and imputation models were used to retrieve the forest attributes at plot and landscape levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The adjusted coefficients of determination (adj. R2) for HD, HM and TD were 0.90, 0.94 and 0.38, and the root mean squared differences (RMSD) were 6.99%, 5.70% and 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy for TD and to evaluate and compare the cost of acquiring and processing LiDAR data against conventional inventory procedures.
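    k-NN imputation assigns a target plot or landscape cell the field-measured attributes of the reference plots that are closest to it in LiDAR-metric space. The following is a minimal sketch under stated assumptions: the metrics (for example H99TH and HSD) are already standardised, distance is Euclidean, and neighbours are weighted equally. The abstract does not state the study's distance metric, weighting or value of k, and all values in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def knn_impute(target: Sequence[float],
               ref_metrics: Sequence[Sequence[float]],
               ref_attrs: Sequence[Sequence[float]],
               k: int = 3) -> List[float]:
    """Impute field attributes (e.g. HD, TD) for a target as the mean of its k nearest reference plots."""
    if not 1 <= k <= len(ref_metrics):
        raise ValueError("k must be between 1 and the number of reference plots")
    # Euclidean distance in (assumed pre-standardised) LiDAR-metric space
    dists = sorted((math.dist(target, m), i) for i, m in enumerate(ref_metrics))
    nearest = [i for _, i in dists[:k]]
    n_attr = len(ref_attrs[0])
    # Unweighted mean of each attribute over the k neighbours
    return [sum(ref_attrs[i][j] for i in nearest) / k for j in range(n_attr)]
```

    For example, with reference metrics `[[0, 0], [0.1, 0], [5, 5], [5.1, 5]]` and attributes `[[20, 800], [22, 760], [30, 400], [32, 420]]` (hypothetical HD in metres and TD in trees/ha), `knn_impute([0.05, 0], ..., k=2)` averages the first two plots to `[21.0, 780.0]`.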

  8. 5 CFR 536.303 - Geographic conversion.

    Science.gov (United States)

    2010-01-01

    ... after geographic conversion is the employee's existing payable rate of basic pay in effect immediately before the action. (b) Geographic conversion when a retained rate employee's official worksite is changed... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Geographic conversion. 536.303 Section...

  9. Geographic analysis of shigellosis in Vietnam.

    Science.gov (United States)

    Kim, Deok Ryun; Ali, Mohammad; Thiem, Vu Dinh; Park, Jin-Kyung; von Seidlein, Lorenz; Clemens, John

    2008-12-01

    Geographic and ecological analysis may provide investigators with useful ecological information for the control of shigellosis. This paper describes the spatial distribution of individual Shigella species and the ecological covariates for shigellosis in Nha Trang, Vietnam. Data on shigellosis in neighborhoods were used to identify ecological covariates. A Bayesian hierarchical model was used to obtain the joint posterior distribution of the model parameters and to construct smoothed risk maps for shigellosis. Neighborhoods with a high proportion of worshippers of traditional religion, close proximity to a hospital, or close proximity to the river had increased risk for shigellosis. The ecological covariates associated with Shigella flexneri differed from those for Shigella sonnei. In contrast, the spatial distributions of the two species were similar. The disease maps can help identify high-risk areas of shigellosis that can be targeted for interventions. This approach may be useful for selecting populations for vaccine trials and for analyzing them.
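    Smoothed risk maps stabilise neighbourhood relative risks based on small counts by pulling them toward the overall rate. The study fits a full Bayesian hierarchical model. The sketch below swaps in a simpler technique, a global empirical Bayes (Marshall-type) estimator, to show how that shrinkage works. It assumes that observed case counts and expected counts (population times a reference rate) are already available for each area and that every expected count is positive.

```python
from typing import List, Sequence


def eb_smooth(observed: Sequence[float], expected: Sequence[float]) -> List[float]:
    """Global empirical Bayes shrinkage of area relative risks O/E toward the overall rate."""
    n = len(observed)
    total_o, total_e = sum(observed), sum(expected)
    m = total_o / total_e                      # overall relative risk
    raw = [o / e for o, e in zip(observed, expected)]
    # Expected-count-weighted variance of the raw rates around m
    s2 = sum(e * (r - m) ** 2 for e, r in zip(expected, raw)) / total_e
    e_bar = total_e / n
    a = max(s2 - m / e_bar, 0.0)               # between-area variance, truncated at zero
    smoothed = []
    for r, e in zip(raw, expected):
        denom = a + m / e
        c = a / denom if denom > 0 else 0.0    # shrinkage weight in [0, 1)
        smoothed.append(m + c * (r - m))       # small-E areas are pulled hardest toward m
    return smoothed
```

    Each smoothed value lies between the area's raw ratio and the overall rate, and areas with small expected counts move furthest toward the overall rate. A hierarchical model like the one in the study can also include covariates and spatial structure, which this sketch omits.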

  10. Geographic Knowledge Extraction and Semantic Similarity in OpenStreetMap

    OpenAIRE

    Ballatore, Andrea; Bertolotto, Michela; Wilson, David C.

    2012-01-01

    In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called 'tags') describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often...

  11. [Differentiation of geographic biovariants of smallpox virus by PCR].

    Science.gov (United States)

    Babkin, I V; Babkina, I N

    2010-01-01

    Comparative analysis of the amino acid and nucleotide sequences of ORFs located in extended segments of the terminal variable regions of the variola virus (VARV) genome identified a promising locus for genotyping the virus by geographic origin: ORF O1L of VARV. Primers were designed to amplify a fragment of this ORF by PCR, making it possible to distinguish the South America-Western Africa genotype from other VARV strains. Subsequent RFLP analysis reliably differentiated Asian strains from African strains (except Western African isolates). The method was tested on 16 VARV strains from various geographic regions. The developed approach is simple, fast and reliable.

  12. Natural Scales in Geographical Patterns

    Science.gov (United States)

    Menezes, Telmo; Roth, Camille

    2017-04-01

    Human mobility is known to be distributed across several orders of magnitude of physical distances, which makes it generally difficult to endogenously find or define typical and meaningful scales. Relevant analyses, from movements to geographical partitions, seem to be relative to some ad-hoc scale, or no scale at all. Relying on geotagged data collected from photo-sharing social media, we apply community detection to movement networks constrained by increasing percentiles of the distance distribution. Using a simple parameter-free discontinuity detection algorithm, we discover clear phase transitions in the community partition space. The detection of these phases constitutes the first objective method of characterising endogenous, natural scales of human movement. Our study covers nine regions, ranging from cities to countries of various sizes and a transnational area. For all regions, the number of natural scales is remarkably low (2 or 3). Further, our results hint at scale-related behaviours rather than scale-related users. The partitions of the natural scales allow us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion where the introduction of spatial boundaries is pivotal.

  13. OUTDOOR EDUCATION AND GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    ANDREA GUARAN

    2016-01-01

    Full Text Available This paper reflects on the relationship between the values and methodological principles of Outdoor Education and the perspectives of spatial and geographical education, especially in pre-school and primary school, which cover ages 3 to 10. Outdoor Education is an educational practice already rooted in the philosophical thought of the 16th and 17th centuries, from John Locke to Jean-Jacques Rousseau, and in pedagogical thought, in particular that of Friedrich Fröbel, and it now has a fairly stable tradition in Northern European countries. In Italy, however, there are still few such experiences, and they usually lack a systematic, structural form, being organized outdoors only temporarily and experimentally. The first part of the paper sets out the reasons that justify particular attention to educational paths that favour outdoor activities, providing a definition of outdoor education and highlighting its values. It is also essential to understand that educational programs in open spaces, such as a forest or simply the schoolyard, offer the opportunity to learn about geographical situations. The question that therefore arises is how best to use the stimuli that a spatial location provides for acquiring knowledge, skills and abilities about space and geography.

  14. Geographic profiling and animal foraging.

    Science.gov (United States)

    Le Comber, Steven C; Nicholls, Barry; Rossmo, D Kim; Racey, Paul A

    2006-05-21

    Geographic profiling was originally developed as a statistical tool for use in criminal cases, particularly those involving serial killers and rapists. It is designed to help police forces prioritize lists of suspects by using the location of crime scenes to identify the areas in which the criminal is most likely to live. Two important concepts are the buffer zone (criminals are less likely to commit crimes in the immediate vicinity of their home) and distance decay (criminals commit fewer crimes as the distance from their home increases). In this study, we show how the techniques of geographic profiling may be applied to animal data, using as an example foraging patterns in two sympatric colonies of pipistrelle bats, Pipistrellus pipistrellus and P. pygmaeus, in the northeast of Scotland. We show that if model variables are fitted to known roost locations, these variables may be used as numerical descriptors of foraging patterns. We go on to show that these variables can be used to differentiate patterns of foraging in these two species.
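    The buffer zone and distance decay described above are combined in Rossmo's criminal geographic targeting formula. This commonly cited formulation scores each candidate anchor point (a home, or here a roost) by summing a contribution from every site. Manhattan distance is used. Beyond a buffer radius `b` the contribution decays as `1/d**f`, and inside the buffer it rises toward the edge. The abstract does not give the authors' exact parametrisation, so the values of `b`, `f` and `g` below are illustrative assumptions.

```python
from typing import Sequence, Tuple


def rossmo_score(x: float, y: float, sites: Sequence[Tuple[float, float]],
                 b: float, f: float = 1.2, g: float = 1.2) -> float:
    """Rossmo-style score of candidate anchor point (x, y) given observed sites."""
    total = 0.0
    for sx, sy in sites:
        d = abs(x - sx) + abs(y - sy)                  # Manhattan distance to the site
        if d > b:
            total += 1.0 / d ** f                      # distance decay outside the buffer
        else:
            total += b ** (g - f) / (2 * b - d) ** g   # buffer zone: lower score near the site
    return total


def peak_cell(sites: Sequence[Tuple[float, float]], b: float,
              width: int, height: int, step: float = 1.0) -> Tuple[float, float]:
    """Grid point with the highest score, a crude summary of the full score surface."""
    cells = [(i * step, j * step) for i in range(width) for j in range(height)]
    return max(cells, key=lambda c: rossmo_score(c[0], c[1], sites, b))
```

    With a single site, the score peaks at distance `b` rather than at the site itself. This shows the buffer-zone effect, and the two branches meet continuously at `d == b`. For foraging bats, the `sites` would be foraging locations and the candidate anchors would be possible roosts.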

  15. Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

    Directory of Open Access Journals (Sweden)

    Boeschoten Laura

    2017-12-01

    Full Text Available Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables. Furthermore, the latent class model, by multiply imputing a new variable, enhances the quality of statistics based on the composite data set. The performance of this method is investigated by a simulation study, which shows that whether or not the method can be applied depends on the entropy R2 of the latent class model and the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.

  16. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    Science.gov (United States)

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P < 6.61 × 10⁻⁴), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold higher in the Dutch population, and its effect (β(LDL-C) = 0.135, β(TC) = 0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  17. ROMANIA: GEOGRAPHICAL AND GEOPOLITICAL POSITION

    Directory of Open Access Journals (Sweden)

    Ciprian Beniamin Benea

    2016-12-01

    Full Text Available The paper draws the reader’s attention to the importance of understanding the role education plays in creating a good geopolitical position for a state that has a good geographical position and is well endowed with natural resources. The case of Romania is the main focus of the paper. It presents the peculiar situation of a country (Romania) that is very well located from a geographical point of view but is incapable of exploiting its natural endowments and special location. One reason for this situation is that most people living in present-day Romania belong to a category this paper calls ‘individuals’. Individuals are not aware of their country’s geography and history, let alone its possible future development. They do not know the role their country could play, and, living in an atomized society, they choose emigration as the easiest way to escape a harsh social and economic environment. The opposite attitude is that of the citizen, a person conscious of the country’s potential and dedicated to working hard together with fellow citizens to promote national interests in a peaceful manner. Even though remnants of an ancient city were found close to present-day Romanian territory (proof of a well-endowed environment), moral and psychological factors have contributed crucially since 1990 to pushing Romania from its civilizational path back to an archaic spirit, from an active urban spirit to a rural mentality. In such a situation it is not uncommon for a nation to lose its means of projecting power that could promote the value and importance of a geographical position: transportation. A rural mentality has nothing to do with modern transportation, which is a technical tool with a geopolitical essence for controlling space. It is well known that transportation and geopolitics are closely interrelated. Furthermore, social dissolution in post-communist

  18. Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

    Directory of Open Access Journals (Sweden)

    Jörgen Wallerman

    2013-04-01

    Full Text Available Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error or RMSE 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%) with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.
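    The plot-level accuracies above are relative RMSEs obtained by leave-one-out cross-validation: each plot is held out in turn, imputed from the others, and the error is expressed as a percentage of the observed mean. The sketch below shows that evaluation loop. It is an assumption-laden stand-in: it uses plain Euclidean k-NN instead of the study's k-MSN distance, which is based on canonical correlation, and the feature values are hypothetical.

```python
import math
from typing import Sequence


def loo_relative_rmse(features: Sequence[Sequence[float]],
                      values: Sequence[float], k: int = 1) -> float:
    """Leave-one-out RMSE of k-NN imputation, as a percentage of the observed mean."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of plots")
    sq_err = 0.0
    for i in range(n):
        # Rank all other plots by Euclidean distance to held-out plot i
        others = sorted((math.dist(features[i], features[j]), j)
                        for j in range(n) if j != i)
        pred = sum(values[j] for _, j in others[:k]) / k
        sq_err += (values[i] - pred) ** 2
    rmse = math.sqrt(sq_err / n)
    return 100.0 * rmse / (sum(values) / n)
```

    For example, with features `[[0], [1], [10], [11]]`, values `[100, 110, 300, 310]` and `k=1`, every held-out plot is off by 10. This gives an RMSE of 10 against a mean of 205, or about 4.88%.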

  19. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Directory of Open Access Journals (Sweden)

    Momoko Horikoshi

    2015-07-01

    Full Text Available Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  20. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Science.gov (United States)

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  1. Exploring the potentials of volunteered geographic Information as a source for spatial data acquisition

    International Nuclear Information System (INIS)

    Ariffin, Izyana; Solemon, Badariah; Anwar, Rina Md; Din, Marina Md; Azmi, Nor Nashrah

    2014-01-01

    The advancement of technologies nowadays enables nonprofessionals, known as volunteers, to participate in producing, sharing and consuming geographic information. Such information, termed volunteered geographic information (VGI), has created a new approach to gathering geographic information. This paper discusses the traditional way of acquiring geographic information and the potential of VGI as an information source in GIS applications. We also review four commonly cited applications that rely on volunteers for their geographic information, based on five criteria: the geometry types available in the applications, the availability of user profiles, the average number of attributes used in the applications, the data type of the information (raster or vector), and the domain the application belongs to. This review serves as a preliminary study in designing a GIS application for asset management that aims to exploit volunteers to produce geographic information related to assets.

  2. Random-growth urban model with geographical fitness

    Science.gov (United States)

    Kii, Masanobu; Akimoto, Keigo; Doi, Kenji

    2012-12-01

    This paper formulates a random-growth urban model with a notion of geographical fitness. Using techniques of complex-network theory, we study our system as a type of preferential-attachment model with fitness, and we analyze its macro behavior to clarify the properties of the city-size distributions it predicts. First, restricting the geographical fitness to take positive values and using a continuum approach, we show that the city-size distributions predicted by our model asymptotically approach Pareto distributions with coefficients greater than unity. Then, allowing the geographical fitness to take negative values, we perform local coefficient analysis to show that the predicted city-size distributions can deviate from Pareto distributions, as is often observed in actual city-size distributions. As a result, the model we propose can generate a generic class of city-size distributions, including but not limited to Pareto distributions. For applications to city-population projections, our simple model requires randomness only when new cities are created, not during their subsequent growth. This property leads to smooth trajectories of city population growth, in contrast to other models using Gibrat’s law. In addition, a discrete form of our dynamical equations can be used to estimate past city populations based on present-day data; this fact allows quantitative assessment of the performance of our model. Further study is needed to determine appropriate formulas for the geographical fitness.
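    As an illustration of the kind of mechanism described above, the Python sketch below grows a system of cities in which randomness enters only when a city is founded, while existing cities grow deterministically in proportion to fitness times population (preferential attachment with fitness). It is not the authors' formulation: the function name `simulate_cities`, the uniform fitness range (0.5, 1.5), the fixed growth per step and the founding probability are all assumptions, and fitness is kept positive as in the first case the abstract discusses.

```python
import random


def simulate_cities(steps, growth_per_step, new_city_prob,
                    initial_size=1.0, seed=0):
    """Toy random-growth model with geographical fitness (illustrative only).

    Returns (sizes, fitness) lists, one entry per city.
    """
    rng = random.Random(seed)
    # Assumption: fitness is drawn uniformly from [0.5, 1.5], so it stays positive.
    sizes = [initial_size]
    fitness = [rng.uniform(0.5, 1.5)]
    for _ in range(steps):
        # Stochastic part: a new city may be founded with a random fitness.
        if rng.random() < new_city_prob:
            sizes.append(initial_size)
            fitness.append(rng.uniform(0.5, 1.5))
        # Deterministic part: this step's growth is shared among all cities
        # in proportion to fitness * current size (preferential attachment).
        weights = [f * s for f, s in zip(fitness, sizes)]
        total = sum(weights)
        sizes = [s + growth_per_step * w / total for s, w in zip(sizes, weights)]
    return sizes, fitness


if __name__ == "__main__":
    sizes, _ = simulate_cities(steps=5000, growth_per_step=1.0, new_city_prob=0.05)
    print(len(sizes), "cities; largest:", round(max(sizes), 1))
```

    Because existing cities grow by expected value rather than by random draws, each city's trajectory is smooth once it exists, which mirrors the property the abstract contrasts with Gibrat-type models.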

  3. Representations built from a true geographic database

    DEFF Research Database (Denmark)

    Bodum, Lars

    2005-01-01

    ...a representation based on geographic and geospatial principles. The system GRIFINOR, developed at 3DGI, Aalborg University, DK, is capable of creating this object-orientation and furthermore does this on top of a true Geographic database. A true Geographic database can be characterized as a database that can cover the whole world in 3D and with a spatial reference given by geographic coordinates. Built on top of this is a customised viewer, based on the Xith (Java) scenegraph. The viewer reads the objects directly from the database and solves the question about Level-Of-Detail on buildings, orientation in relation...

  4. The Oklahoma Geographic Information Retrieval System

    Science.gov (United States)

    Blanchard, W. A.

    1982-01-01

    The Oklahoma Geographic Information Retrieval System (OGIRS) is a highly interactive data entry, storage, manipulation, and display software system for use with geographically referenced data. Although originally developed for a project concerned with coal strip mine reclamation, OGIRS is capable of handling any geographically referenced data for a variety of natural resource management applications. A special effort has been made to integrate remotely sensed data into the information system. The timeliness and synoptic coverage of satellite data are particularly useful attributes for inclusion into the geographic information system.

  5. CONTEMPORARY TRENDS IN GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    M. Wasileva

    2017-01-01

    Full Text Available Geography includes rich, diverse and comprehensive themes that give us an understanding of our changing environment and interconnected world. It includes the study of the physical environment and resources; cultures, economies and societies; people and places; and global development and civic participation. As a subject, geography is particularly valuable because it provides information for exploring contemporary issues from a different perspective. This geographical information affects us all at work and in our daily lives and helps us make informed decisions that shape our future. All these facts have prompted a wide discussion of many topical issues in contemporary geography didactics. The subjects of research are the new geography and economics curriculum and the construction of a modern learning process. The paper briefly presents some of the current trends and key issues of geodidactics. As central notions, we consider and analyze the training/educational goals, the geography curriculum, the target groups and environment of geography training, training methods, and the information sources used in geography education. We maintain that all of the above is reflected in the planning, analysis and assessment of education, and thus in its quality and effectiveness.

  6. Geographical variation in cardiovascular incidence: results from the British Women's Heart and Health Study

    Directory of Open Access Journals (Sweden)

    Ebrahim Shah

    2010-11-01

    Full Text Available Abstract Background Prevalence of cardiovascular disease (CVD) in women shows regional variations not explained by common risk factors. Analysis of CVD incidence will provide insight into whether there is further divergence between regions with increasing age. Methods Seven-year follow-up data on 2685 women aged 59-80 (mean 69) at baseline from 23 towns in the UK were available from the British Women's Heart and Health Study. Time to fatal or non-fatal CVD was analyzed using Cox regression with adjustment for risk factors, using multiple imputation for missing values. Results Compared to South England, CVD incidence is similar in North England (HR 1.05 (95% CI 0.84, 1.31)) and Scotland (0.93 (0.68, 1.27)), but lower in Midlands/Wales (0.85 (0.64, 1.12)). Event severity influenced regional variation, with South England showing lower fatal incident CVD than other regions, but higher non-fatal incident CVD. Kaplan-Meier plots suggested that regional divergence in CVD occurred before baseline (before mean baseline age of 69). Conclusions In women, regional differences in CVD early in adult life do not further diverge in later life. This may be due to regional differences in early detection, survivorship of women entering the study, or event severity. Targeting health care resources for CVD by geographic variation may not be appropriate for older age-groups.

  7. Research Data Management Training for Geographers: First Impressions

    Directory of Open Access Journals (Sweden)

    Kerstin Helbig

    2016-03-01

    Full Text Available Sharing and secondary analysis of data have become increasingly important for research. Especially in geography, the collection of digital data has grown due to technological changes. Responsible handling and proper documentation of research data have therefore become essential for funders, publishers and higher education institutions. To achieve this goal, universities offer support and training in research data management. This article presents the experiences of a pilot workshop in research data management, especially for geographers. A discipline-specific approach to research data management training is recommended. The focus of this approach increases researchers’ interest and allows for more specific guidance. The instructors identified problems and challenges of research data management for geographers. With regard to training, communicating the benefits and reaching the target groups seem to be the biggest challenges. Consequently, better incentive structures as well as communication channels have to be established.

  8. Prospects for Formation and Development of the Geographical (Territorial) Industrial Clusters in West Kazakhstan Region of the Republic of Kazakhstan

    Science.gov (United States)

    Imashev, Eduard Zh.

    2016-01-01

    The purpose of this research is to develop and implement an economic and geographic approach to forming and developing geographic (territorial) industrial clusters in regions of Kazakhstan. The purpose necessitates the accomplishment of the following scientific objectives: to investigate scientific approaches and experience of territorial economic…

  9. Different methods for analysing and imputation missing values in wind speed series; The problem of information quality in wind speed series: methodologies for the analysis and imputation of missing data

    Energy Technology Data Exchange (ETDEWEB)

    Ferreira, A. M.

    2004-07-01

    This study concerns different methods for analysing and imputing missing values in wind speed series. The EM algorithm and a methodology derived from the sequential hot deck were used. Series with imputed missing values are compared with the original, complete series using several criteria, such as the wind potential, and there appears to be a significant goodness of fit between the estimates and the real values. (Author)
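    A minimal sketch of plain sequential hot-deck imputation, assuming missing readings are stored as `None`: records are scanned in time order and each gap takes the most recent observed value (the donor) from the same imputation class. The class key (for example, hour of day) and the function name are hypothetical; the study uses a methodology derived from this idea, alongside the EM algorithm, and its exact rules are not given here.

```python
def sequential_hot_deck(values, classes=None):
    """Sequential hot-deck imputation for a time-ordered series.

    values  -- readings in time order, with None marking a missing value
    classes -- optional imputation-class label per reading (e.g. hour of day);
               donors are only taken from the same class
    Missing values with no earlier donor in their class stay None.
    """
    if classes is None:
        classes = [0] * len(values)
    if len(classes) != len(values):
        raise ValueError("values and classes must have the same length")
    last_donor = {}  # class label -> most recent observed value
    filled = []
    for value, cls in zip(values, classes):
        if value is None:
            filled.append(last_donor.get(cls))
        else:
            last_donor[cls] = value
            filled.append(value)
    return filled


if __name__ == "__main__":
    speeds = [3.1, None, None, 4.0, None]
    print(sequential_hot_deck(speeds))  # [3.1, 3.1, 3.1, 4.0, 4.0]
```

    Passing hour-of-day labels as `classes` makes a missing 14:00 reading borrow the last observed 14:00 reading, which may respect the diurnal cycle of wind speed better than copying the previous hour.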

  10. CYBERNETICS AND GEOGRAPHICAL EDUCATION: CYBERNETICS OF LEARNING AND LEARNING OF CYBERNETICS

    Directory of Open Access Journals (Sweden)

    M. R. Arpentieva

    2017-01-01

    Full Text Available Modern geographical education implies a broad implementation of innovative technologies, allowing students to fully and deeply understand the subject and methods of professional activity, and to act effectively and productively upon this understanding. Therefore, computer and media technologies occupy a significant place in the work of the modern geographer, and geographic education occupies an important place in learning cybernetic disciplines: computer technologies act as an important condition for obtaining a high-quality professional education, as well as an important tool in the professional activity of the modern specialist geographer. The article compares three modern approaches to the study and optimization of training in cybernetics and programming within geographical education: an approach devoted to the study of “learning styles”; the metacognitive approach to learning computer science and programming; and the intersubjective (evergetic, or genuinely cybernetic) approach. It describes their advantages and limitations in the context of geographical education, as well as their internal unity as different forms of studying the productivity and conditions of dialogical interaction between teacher and student in obtaining a high-quality geographical education.

  11. Rule-guided human classification of Volunteered Geographic Information

    Science.gov (United States)

    Ali, Ahmed Loai; Falomir, Zoe; Schmid, Falko; Freksa, Christian

    2017-05-01

    During the last decade, web technologies and location sensing devices have evolved, generating a form of crowdsourcing known as Volunteered Geographic Information (VGI). VGI has acted as a platform for spatial data collection, in particular when a group of public participants is involved in collaborative mapping activities: they work together to collect, share, and use information about geographic features. VGI exploits participants' local knowledge to produce rich data sources. However, the resulting data suffers from problematic classification. In VGI projects, the challenges of data classification are due to the following: (i) data is prone to subjective classification, (ii) remote contributions and flexible contribution mechanisms in most projects, and (iii) the uncertainty of spatial data and non-strict definitions of geographic features. These factors lead to various forms of problematic classification: inconsistent, incomplete, and imprecise data classification. This research addresses classification appropriateness. Whether the classification of an entity is appropriate or inappropriate is related to quantitative and/or qualitative observations. Small differences between observations may not be recognizable, particularly for non-expert participants. Hence, in this paper, the problem is tackled by developing a rule-guided classification approach. This approach exploits the data mining technique of Association Classification (AC) to extract descriptive (qualitative) rules of specific geographic features. The rules are extracted by investigating the qualitative topological relations between target features and their context. Afterwards, the extracted rules are used to develop a recommendation system able to guide participants to the most appropriate classification. The approach proposes two scenarios to guide participants towards enhancing the quality of data classification. An empirical study is conducted to investigate the classification of grass

  12. Socioeconomic and geographic inequalities in adolescent smoking: a multilevel cross-sectional study of 15 year olds in Scotland.

    Science.gov (United States)

    Levin, K A; Dundas, R; Miller, M; McCartney, G

    2014-04-01

    The objective of the study was to present socioeconomic and geographic inequalities in adolescent smoking in Scotland. The international literature suggests there is no obvious pattern in the geography of adolescent smoking, with rural areas having a higher prevalence than urban areas in some countries, and a lower prevalence in others. These differences are most likely due to substantive differences in rurality between countries in terms of their social, built and cultural geography. Previous studies in the UK have shown an association between lower socioeconomic status and smoking. The Scottish Health Behaviour in School-aged Children study surveyed 15 year olds in schools across Scotland between March and June of 2010. We ran multilevel logistic regressions using Markov chain Monte Carlo method and adjusting for age, school type, family affluence, area level deprivation and rurality. We imputed missing rurality and deprivation data using multivariate imputation by chained equations, and re-analysed the data (N = 3577), comparing findings. Among boys, smoking was associated only with area-level deprivation. This relationship appeared to have a quadratic S-shape, with those living in the second most deprived quintile having highest odds of smoking. Among girls, however, odds of smoking increased with deprivation at individual and area-level, with an approximate dose-response relationship for both. Odds of smoking were higher for girls living in remote and rural parts of Scotland than for those living in urban areas. Schools in rural areas were no more or less homogenous than schools in urban areas in terms of smoking prevalence. We discuss possible social and cultural explanations for the high prevalence of boys' and girls' smoking in low SES neighbourhoods and of girls' smoking in rural areas. We consider possible differences in the impact of recent tobacco policy changes, primary socialization, access and availability, retail outlet density and the home

  13. Conceptual Model of Dynamic Geographic Environment

    Directory of Open Access Journals (Sweden)

    Martínez-Rosales Miguel Alejandro

    2014-04-01

    Full Text Available In geographic environments, there are many different types of geographic entities, such as automobiles, trees, persons, buildings, storms, hurricanes, etc. These entities can be classified into two groups: geographic objects and geographic phenomena. By its nature, a geographic environment is dynamic; thus, static modeling of it is not sufficient. Considering the dynamics of the geographic environment, a new type of geographic entity called an event is introduced. The primary target is to model the geographic environment as an event sequence, because in this case the semantic relations are much richer than in the case of static modeling. In this work, the conceptualization of this model is proposed. It is based on the idea of processing each entity separately instead of processing the environment as a whole. After that, the so-called history of each entity and its spatial relations to other entities are defined to describe the whole environment. The main goal is to model, at a conceptual level, systems that make use of spatial and temporal information, so that the model can later serve as the semantic engine for such systems.

  14. 25 CFR 571.10 - Geographical location.

    Science.gov (United States)

    2010-04-01

    ... Title 25 (Indians), Volume 2, 2010-04-01 edition. National Indian Gaming Commission, Department of the Interior; Compliance and Enforcement Provisions; Monitoring and Investigations; Subpoenas and Depositions. § 571.10 Geographical location. The attendance of...

  15. The evolution of cooperation on geographical networks

    Science.gov (United States)

    Li, Yixiao; Wang, Yi; Sheng, Jichuan

    2017-11-01

    We study the evolutionary public goods game on geographical networks, i.e., complex networks that are located on a geographical plane. The geographical features act in two ways: on the one hand, the geographically induced network structure influences the overall evolutionary dynamics; on the other hand, the geographical length of an edge influences the cost when the two players at its ends interact. For the latter effect, we design a new cost function for cooperators, which simply assumes that the longer the distance between two players, the higher the cost the cooperator(s) between them must pay. In this study, network substrates are generated by a previous spatial network model with a cost-benefit parameter controlling the network topology. Our simulations show that the greatest promotion of cooperation is achieved in the intermediate regime of the parameter, in which empirical estimates of various railway networks fall. Further, we investigate how the distribution of edges' geographical costs influences the evolutionary dynamics and consider three patterns of distribution: an approximately equal distribution, a diverse distribution, and a polarized distribution. For normal geographical networks, which are generated using intermediate values of the cost-benefit parameter, a diverse distribution hinders the evolution of cooperation, whereas a polarized distribution lowers the threshold value of the amplification factor for cooperation in the public goods game. These results are helpful for understanding the evolution of cooperation on real-world geographical networks.
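    To make the distance-dependent cost concrete, here is a sketch of one round of payoffs in a networked public goods game, where every node hosts a group made of itself and its neighbours, and pooled contributions are multiplied by the amplification factor `r` and shared equally within the group. The abstract does not give the authors' cost function, so the linear form `1 + alpha * distance` (zero distance for the host) is a hypothetical stand-in, as are the function name and data structures.

```python
import math


def pgg_payoffs(positions, neighbors, strategies, r, alpha):
    """Accumulated payoffs for one round of a networked public goods game.

    positions  -- node -> (x, y) coordinates on the plane
    neighbors  -- node -> list of adjacent nodes (undirected, listed both ways)
    strategies -- node -> 1 for cooperator, 0 for defector
    r          -- amplification factor applied to the common pool
    alpha      -- hypothetical weight converting edge length into extra cost
    """
    payoff = {node: 0.0 for node in positions}
    for host, nbrs in neighbors.items():
        group = [host] + list(nbrs)
        n_coop = sum(strategies[m] for m in group)
        share = r * n_coop / len(group)  # pooled contributions, split equally
        for m in group:
            payoff[m] += share
            if strategies[m]:
                # Hypothetical cost: unit contribution plus a term growing
                # linearly with the member's distance from the group host.
                d = 0.0 if m == host else math.dist(positions[host], positions[m])
                payoff[m] -= 1.0 + alpha * d
    return payoff
```

    With `alpha = 0` this reduces to the usual networked public goods game, so the extra term isolates the effect of edge length on what cooperators pay.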

  16. Hierarchical spatial organization of geographical networks

    International Nuclear Information System (INIS)

    Travencolo, Bruno A N; Costa, Luciano da F

    2008-01-01

    In this work, we propose a hierarchical extension of the polygonality index as the means to characterize geographical planar networks. By considering successive neighborhoods around each node, it is possible to obtain more complete information about the spatial order of the network at progressive spatial scales. The potential of the methodology is illustrated with respect to synthetic and real geographical networks.

  17. Future Prospects for Geographical Education in Slovenia

    Science.gov (United States)

    Resnic Planinc, Tatjana

    2011-01-01

    This paper deals with future prospects for geographical education in Slovenia, with special emphasis on the development and aims of the didactics of geography. The author discusses the past development of geographical curricula and of competencies of geography teachers, and the education of future teachers of the subject in Slovenia. Her ideas are…

  18. Socioeconomic Development Inequalities among Geographic Units ...

    African Journals Online (AJOL)

    Socio-economic development inequality among geographic units is a phenomenon common in both the developed and developing countries. Regional inequality may result in dissension among geographic units of the same state due to the imbalance in socio-economic development. This study examines the inequality ...

  19. Composing Models of Geographic Physical Processes

    Science.gov (United States)

    Hofer, Barbara; Frank, Andrew U.

    Processes are central to geographic information science; yet geographic information systems (GIS) lack capabilities to represent process-related information. A prerequisite to including processes in GIS software is a general method to describe geographic processes independently of application disciplines. This paper presents such a method, namely a process description language. The vocabulary of the process description language is derived formally from mathematical models. Physical processes in geography can be described in two equivalent languages: partial differential equations or partial difference equations, where the latter can be shown graphically and used as a method for application specialists to enter their process models. The vocabulary of the process description language comprises components for describing the general behavior of prototypical geographic physical processes. These process components can be composed into basic models of geographic physical processes, as is shown by means of an example.
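    As a small example of the partial-difference form mentioned above, the sketch below advances the one-dimensional diffusion equation u_t = D u_xx by one explicit time step, u_i <- u_i + k(u_{i+1} - 2u_i + u_{i-1}) with k = D*dt/dx^2. The choice of diffusion as the process, the no-flux boundaries and the function name are illustrative assumptions, not the paper's process description language.

```python
def diffuse_step(u, coeff):
    """Advance u_t = D * u_xx by one explicit finite-difference step.

    u     -- cell values on a uniform 1D grid
    coeff -- D * dt / dx**2; the explicit scheme is stable only for coeff <= 0.5
    Boundaries are no-flux: each end cell is mirrored into a ghost cell,
    so the sum of u is conserved.
    """
    if not 0.0 <= coeff <= 0.5:
        raise ValueError("coeff must lie in [0, 0.5] for stability")
    n = len(u)
    new_u = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]        # ghost cell at the left edge
        right = u[i + 1] if i < n - 1 else u[i]   # ghost cell at the right edge
        new_u.append(u[i] + coeff * (left - 2.0 * u[i] + right))
    return new_u


if __name__ == "__main__":
    print(diffuse_step([0.0, 0.0, 10.0, 0.0, 0.0], 0.25))
    # [0.0, 2.5, 5.0, 2.5, 0.0]
```

    Each update uses only a cell and its two neighbours, which is the local, graph-like structure that makes difference equations easy to draw and to enter as a process model.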

  20. Integrative real-time geographic visualization of energy resources

    International Nuclear Information System (INIS)

    Sorokine, A.; Shankar, M.; Stovall, J.; Bhaduri, B.; King, T.; Fernandez, S.; Datar, N.; Omitaomu, O.

    2009-01-01

    Several models forecast that climatic changes will increase the frequency of disastrous events like droughts, hurricanes, and snow storms. Responding to these events, and also to power outages caused by system errors such as the 2003 North American blackout, requires an interconnect-wide real-time monitoring system for various energy resources. Such a system should be capable of providing situational awareness to its users in government and energy utilities by dynamically visualizing the status of the elements of the energy grid infrastructure and supply chain in geographic contexts. We demonstrate an approach that relies on Google Earth and similar standards-based platforms as client-side geographic viewers with a data-dependent server component. The users of the system can view status information in spatial and temporal contexts. These data can be integrated with a wide range of geographic sources, including all standard Google Earth layers and a large number of energy and environmental data feeds. In addition, we show a real-time spatio-temporal data sharing capability across the users of the system, novel methods for visualizing dynamic network data, and fine-grained access to very large multi-resolution geographic datasets for faster delivery of the data. The system can be extended to integrate contingency analysis results and other grid models to assess recovery and repair scenarios in the case of major disruption. (author)

  1. The Effect of Geographic Units of Analysis on Measuring Geographic Variation in Medical Services Utilization

    Directory of Open Access Journals (Sweden)

    Agnus M. Kim

    2016-07-01

    Full Text Available Objectives: We aimed to evaluate the effect of geographic units of analysis on measuring geographic variation in medical services utilization. For this purpose, we compared geographic variations in the rates of eight major procedures in administrative units (districts and new areal units organized based on the actual health care use of the population in Korea. Methods: To compare geographic variation in geographic units of analysis, we calculated the age–sex standardized rates of eight major procedures (coronary artery bypass graft surgery, percutaneous transluminal coronary angioplasty, surgery after hip fracture, knee-replacement surgery, caesarean section, hysterectomy, computed tomography scan, and magnetic resonance imaging scan from the National Health Insurance database in Korea for the 2013 period. Using the coefficient of variation, the extremal quotient, and the systematic component of variation, we measured geographic variation for these eight procedures in districts and new areal units. Results: Compared with districts, new areal units showed a reduction in geographic variation. Extremal quotients and inter-decile ratios for the eight procedures were lower in new areal units. While the coefficient of variation was lower for most procedures in new areal units, the pattern of change of the systematic component of variation between districts and new areal units differed among procedures. Conclusions: Geographic variation in medical service utilization could vary according to the geographic unit of analysis. To determine how geographic characteristics such as population size and number of geographic units affect geographic variation, further studies are needed.
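    The simpler variation statistics named above can be sketched in a few lines of Python, assuming `rates` holds one age-sex standardized rate per geographic unit: the coefficient of variation (standard deviation over mean; the sample standard deviation is an assumption here), the extremal quotient (highest over lowest rate) and the inter-decile ratio (90th over 10th percentile, interpolated within the observed range). The systematic component of variation needs observed and expected counts per unit and is omitted; the function name is hypothetical.

```python
import statistics


def variation_stats(rates):
    """Simple small-area variation statistics for per-unit standardized rates.

    Requires at least two strictly positive rates.
    """
    if len(rates) < 2 or min(rates) <= 0:
        raise ValueError("need at least two positive rates")
    mean = statistics.mean(rates)
    # Assumption: sample standard deviation; some studies use the population SD.
    cv = statistics.stdev(rates) / mean
    eq = max(rates) / min(rates)
    # method="inclusive" keeps the deciles within the observed range of rates.
    deciles = statistics.quantiles(rates, n=10, method="inclusive")
    idr = deciles[-1] / deciles[0]
    return {"cv": cv, "extremal_quotient": eq, "inter_decile_ratio": idr}
```

    Running the function once on district rates and once on rates for the new areal units gives a direct, like-for-like comparison of the kind the study reports.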

  2. Ecoregions and ecoregionalization: geographical and ecological perspectives

    Science.gov (United States)

    Loveland, Thomas R.; Merchant, James W.

    2005-01-01

    Ecoregions, i.e., areas exhibiting relative homogeneity of ecosystems, are units of analysis that are increasingly important in environmental assessment and management. Ecoregions provide a holistic framework for flexible, comparative analysis of complex environmental problems. Ecoregions mapping has intellectual foundations in both geography and ecology. However, a hallmark of ecoregions mapping is that it is a truly interdisciplinary endeavor that demands the integration of knowledge from a multitude of sciences. Geographers emphasize the role of place, scale, and both natural and social elements when delineating and characterizing regions. Ecologists tend to focus on environmental processes with special attention given to energy flows and nutrient cycling. Integration of disparate knowledge from the many key sciences has been one of the great challenges of ecoregions mapping, and may lie at the heart of the lack of consensus on the “optimal” approach and methods to use in such work. Through a review of the principal existing US ecoregion maps, issues that should be addressed in order to advance the state of the art are identified. Research related to needs, methods, data sources, data delivery, and validation is needed. It is also important that the academic system foster education so that there is an infusion of new expertise in ecoregion mapping and use.

  3. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

    Directory of Open Access Journals (Sweden)

    Rui Martiniano

    2017-07-01

    Full Text Available We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third-millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

  4. Probabilistic Flood Mapping using Volunteered Geographical Information

    Science.gov (United States)

    Rivera, S. J.; Girons Lopez, M.; Seibert, J.; Minsker, B. S.

    2016-12-01

    Flood extent maps are widely used by decision makers and first responders to provide critical information that prevents economic impacts and the loss of human lives. These maps are usually obtained from sensor data and/or hydrologic models, which often have limited coverage in space and time. Recent developments in social media and communication technology have created a wealth of near-real-time, user-generated content during flood events in many urban areas, such as flooded locations, pictures of flooding extent and height, etc. These data could improve decision-making and response operations as events unfold. However, the integration of these data sources has been limited due to the need for methods that can extract and translate the data into useful information for decision-making. This study presents an approach that uses volunteered geographic information (VGI) and non-traditional data sources (i.e., Twitter, Flickr, YouTube, and 911 and 311 calls) to generate/update the flood extent maps in areas where no models and/or gauge data are operational. The approach combines Web-crawling and computer vision techniques to gather information about the location, extent, and water height of the flood from unstructured textual data, images, and videos. These estimates are then used to provide an updated flood extent map for areas surrounding the geo-coordinate of the VGI through the application of a Hydro Growing Region Algorithm (HGRA). HGRA combines hydrologic and image segmentation concepts to estimate a probabilistic flooding extent along the corresponding creeks. Results obtained for a case study in Austin, TX (i.e., 2015 Memorial Day flood) were comparable to those obtained by a calibrated hydrologic model and had good spatial correlation with flooding extents estimated by the Federal Emergency Management Agency (FEMA).
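    The HGRA itself is not specified in the abstract; as a rough illustration of the region-growing idea only, the sketch below starts from the grid cell nearest a VGI report, sets a flat water surface at ground elevation plus the reported depth, and floods every 4-connected cell lying below that surface. The elevation grid, the depth value and the function name are assumptions, and unlike the study's approach this deterministic version produces no probabilities.

```python
from collections import deque


def grow_flood_region(dem, seed, water_depth):
    """Flood every 4-connected cell lying below the water surface at `seed`.

    dem         -- 2D list of ground elevations (rows x cols)
    seed        -- (row, col) of the cell nearest the VGI report
    water_depth -- reported water depth at the seed, same units as dem
    Returns a 2D list of booleans marking flooded cells.
    """
    rows, cols = len(dem), len(dem[0])
    r0, c0 = seed
    surface = dem[r0][c0] + water_depth  # assumed flat water surface
    flooded = [[False] * cols for _ in range(rows)]
    flooded[r0][c0] = True
    queue = deque([seed])
    while queue:  # breadth-first region growing from the seed
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not flooded[nr][nc] and dem[nr][nc] < surface):
                flooded[nr][nc] = True
                queue.append((nr, nc))
    return flooded
```

    Requiring connectivity to the seed keeps low-lying cells behind a ridge dry, which a plain elevation threshold would not.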

  5. Usability of geographic information -- factors identified from qualitative analysis of task-focused user interviews.

    Science.gov (United States)

    Harding, Jenny

    2013-11-01

    Understanding user needs for geographic information and the factors which influence the usability of such information in diverse user contexts is an essential part of user centred development of information products. There is relatively little existing research focused on the design and usability of information products in general. This paper presents a research approach based on semi structured interviews with people working with geographic information on a day to day basis, to establish a reference base of qualitative data on user needs for geographic information with respect to context of use. From this reference data nine key categories of geographic information usability are identified and discussed in the context of limited existing research concerned with geographic information usability. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  6. Towards a Geographic Information Systems (GIS) Approach in ...

    African Journals Online (AJOL)

    It has been observed in Kenya that the inadequacy and/or inappropriateness of land-related information systems is the single major constraint on the effective and efficient management of land for housing, security of tenure and determination of property rights. The purpose of this paper is therefore to highlight areas in housing ...

  7. A quantitative approach to social and geographical dialect variation

    NARCIS (Netherlands)

    Wieling, Martijn Benjamin

    2012-01-01

    Bringing dialect variation into clearer focus. The central theme of Martijn Wieling's dissertation is the investigation of dialect variation. To obtain an objective picture of dialect variation, we automatically measure dialect distances over hundreds of words, based on transcribed pronunciations.
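    Measuring dialect distance from transcribed pronunciations is commonly done with edit (Levenshtein) distances. A minimal sketch, assuming a plain Levenshtein distance with unit costs, normalized by the longer transcription and averaged over the words both sites share; the dissertation's actual measure may weight sound segments differently, and the transcriptions below are hypothetical.

```python
def levenshtein(a, b):
    """Plain edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def dialect_distance(site_a, site_b):
    """Mean length-normalized edit distance over words both sites share."""
    shared = site_a.keys() & site_b.keys()
    if not shared:
        raise ValueError("no shared words")
    total = 0.0
    for word in shared:
        a, b = site_a[word], site_b[word]
        total += levenshtein(a, b) / max(len(a), len(b))
    return total / len(shared)


# Hypothetical transcriptions of two words at two locations.
site_a = {"house": "hus", "milk": "melk"}
site_b = {"house": "huis", "milk": "melk"}
print(dialect_distance(site_a, site_b))  # 0.125
```

    Averaging over hundreds of words, as the summary describes, makes the site-to-site distance far less sensitive to any single idiosyncratic pronunciation.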

  8. Commercial Bee Pollen with Different Geographical Origins: A Comprehensive Approach

    Science.gov (United States)

    Nogueira, Carla; Iglesias, Antonio; Feás, Xesus; Estevinho, Leticia M.

    2012-01-01

    Since the dawn of humanity, pollen has been considered a good source of nutrients and energy, and it has also been credited with promising healing properties. The present study aimed to characterize, for the first time, eight commercial pollens from Portugal and Spain available on the market, examining compliance with labeling legislation, botanical (pollen) origin, physicochemical and microbiological properties, and yeast identity. Eleven botanical families were found amongst the samples. The most abundant family and the most dominant pollen was Cistaceae. Moisture content, ash, water activity (aw), pH, reducing sugars, carbohydrates, proteins, lipids, and energy were analyzed, and these parameters were within the specifications required by the countries that regulate them. Microbiologically, the commercial pollen showed acceptable safety, commercial quality, and hygiene. All samples tested negative for toxigenic species. The microorganisms studied were aerobic mesophiles, yeasts and moulds, coliforms, Escherichia coli, Staphylococcus aureus, Salmonella, and sulfite-reducing Clostridium. Six yeast species were isolated from the pollen, with Rhodotorula mucilaginosa being the most abundant, present in four samples. PMID:23109845

  9. Commercial Bee Pollen with Different Geographical Origins: A Comprehensive Approach

    Directory of Open Access Journals (Sweden)

    Leticia M. Estevinho

    2012-09-01

    Since the dawn of humanity, pollen has been considered a good source of nutrients and energy, and it has also been credited with promising healing properties. The present study aimed to characterize, for the first time, eight commercial pollens from Portugal and Spain available on the market, examining compliance with labeling legislation, botanical (pollen) origin, physicochemical and microbiological properties, and yeast identity. Eleven botanical families were found amongst the samples. The most abundant family and the most dominant pollen was Cistaceae. Moisture content, ash, water activity (aw), pH, reducing sugars, carbohydrates, proteins, lipids, and energy were analyzed, and these parameters were within the specifications required by the countries that regulate them. Microbiologically, the commercial pollen showed acceptable safety, commercial quality, and hygiene. All samples tested negative for toxigenic species. The microorganisms studied were aerobic mesophiles, yeasts and moulds, coliforms, Escherichia coli, Staphylococcus aureus, Salmonella, and sulfite-reducing Clostridium. Six yeast species were isolated from the pollen, with Rhodotorula mucilaginosa being the most abundant, present in four samples.

  10. Modeling social networks in geographic space: approach and empirical application

    NARCIS (Netherlands)

    Arentze, T.A.; Berg, van den P.E.W.; Timmermans, H.J.P.

    2012-01-01

    Social activities are responsible for a large proportion of individuals' travel demand. Modeling the social network of a studied population offers a basis for predicting social travel more comprehensively than is currently possible. In this paper we develop a method to generate a whole

  11. A Framework for Conceptual Modeling of Geographic Data Quality

    DEFF Research Database (Denmark)

    Friis-Christensen, Anders; Christensen, J.V.; Jensen, Christian Søndergaard

    2004-01-01

    The notion of data quality is of particular importance to geographic data. One reason is that such data is often inherently imprecise. Another is that the usability of the data is in large part ... determined by how "good" the data is, as different applications of geographic data require that different qualities of the data are met. Such qualities concern the object level as well as the attribute level of the data. This paper presents a systematic and integrated approach to the conceptual modeling...

  12. Geographic Education--Where Have We Failed?

    Science.gov (United States)

    Gritzner, Charles F.

    1981-01-01

    Discusses geography's rather low status and relatively poor public image in the United States and some of the consequences. Among the world's educated industrial nations, the United States ranks among the least literate in a geographical sense.

  13. Medicare Geographic Variation - Public Use File

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Geographic Variation Public Use File provides the ability to view demographic, utilization and quality indicators at the state level (including...

  14. GNIS: Geographic Names Information Systems - All features

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) actively seeks data from and partnerships with Government agencies at all levels and other interested organizations....

  15. Geographic Variation in Medicare Spending Dashboard

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Geographic Variation Dashboards present Medicare fee-for-service per-capita spending at the state and county level in an interactive format. We calculated the...

  16. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    Science.gov (United States)

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P < 5 × 10⁻⁸), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P = 7.68 × 10⁻¹²), oestradiol (P = 1.63 × 10⁻⁸) and FAI (P = 1.50 × 10⁻⁸). A genetic variant near the FSHB gene was identified which influenced both FSH (P = 1.74 × 10⁻⁸) and LH (P = 3.94 × 10⁻⁹) levels. A separate locus on chromosome 7 was associated with both DHEAS (P = 1.82 × 10⁻¹⁴) and progesterone (P = 6.09 × 10⁻¹⁴). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.
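    The genome-wide significance threshold P < 5 × 10⁻⁸ is conventionally read as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests across the genome. The illustrative sketch below derives that threshold and checks the P-values reported in the abstract against it; the dictionary labels are shorthand for the signals described, and the one-million figure is the conventional approximation, not the number of SNPs tested in this study.

```python
ALPHA = 0.05
# Conventional approximation of independent tests genome-wide (assumption,
# not this study's 7,879,351 SNP count).
N_INDEPENDENT_TESTS = 1_000_000

GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # approximately 5e-08

# P-values as reported in the abstract; labels are shorthand.
reported = {
    "progesterone (novel)": 7.68e-12,
    "oestradiol (novel)": 1.63e-8,
    "FAI (novel)": 1.50e-8,
    "FSH (FSHB locus)": 1.74e-8,
    "LH (FSHB locus)": 3.94e-9,
    "DHEAS (chromosome 7 locus)": 1.82e-14,
    "progesterone (chromosome 7 locus)": 6.09e-14,
}

# Keep only signals below the genome-wide threshold.
significant = {name: p for name, p in reported.items() if p < GENOME_WIDE_THRESHOLD}
print(len(significant), "of", len(reported), "reported P-values pass")
```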

  17. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    Science.gov (United States)

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to the ongoing drought in California, increased reliance on local water supply highlights the need for a better understanding of regional groundwater dynamics and for estimates of sustainable groundwater supply. However, in an urban setting such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data are irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data are interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data are evaluated at a monthly time step over the last several decades to analyze the effect of land cover and identify other factors influencing groundwater levels spatially and temporally. Preliminary results show that irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes that exceed changes in spreading grounds. These fluctuations are assumed to be caused by the watering practices required to sustain non-native vegetation. Conversely, high-intensity urbanized areas showed muted groundwater fluctuations and behavior decoupled from climate patterns. The results provide an improved understanding of anthropogenic effects on groundwater levels, in addition to high-quality datasets for validating regional groundwater models.
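    The abstract does not say how the imputed ensembles were combined. The standard way to pool results across multiple imputations is Rubin's rules: average the per-imputation estimates, then add the mean within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch, with hypothetical groundwater-level estimates from five imputed datasets:

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m imputed-data estimates with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(within_variances)    # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (n - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Hypothetical monthly mean groundwater level (m) from five imputations.
estimates = [10.2, 10.6, 10.4, 10.8, 10.0]
within = [0.04] * 5
pooled, total_var = pool_rubin(estimates, within)
```

    With these inputs the pooled estimate is 10.4 and the total variance is 0.04 + 1.2 × 0.1 = 0.16 (up to floating-point rounding); the between-imputation term is what carries the extra uncertainty from the gaps in the record.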

  18. Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study.

    Science.gov (United States)

    Cornish, R P; Macleod, J; Carpenter, J R; Tilling, K

    2017-01-01

    When an outcome variable is missing not at random (MNAR: the probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced by incorporating proxy outcomes, obtained through linkage to administrative data, as auxiliary variables in multiple imputation (MI). Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC), we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1-0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete. Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, the ALSPAC analysis showed that inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest. In longitudinal studies with loss to follow-up, incorporating proxies for the study outcome, obtained via linkage to external data sources, as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
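    The mechanism can be seen in a toy simulation (not the ALSPAC data, and using a single stochastic regression imputation instead of full MI, for brevity). Under these hypothetical settings, a standard-normal outcome y goes missing with probability 0.8 whenever it exceeds 0.5, so missingness depends on y itself (MNAR), while a fully observed proxy correlates with y at ρ = 0.9. The sketch compares the complete-case mean with the mean after imputing missing y from the proxy.

```python
import math
import random


def simulate(n=100_000, rho=0.9, seed=1):
    """Toy MNAR example: returns (complete-case mean, proxy-imputed mean).

    y is standard normal (population mean 0). Values of y above 0.5 go
    missing with probability 0.8, so missingness depends on y (MNAR).
    """
    rng = random.Random(seed)
    ys, xs, observed = [], [], []
    for _ in range(n):
        y = rng.gauss(0, 1)
        x = rho * y + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # linked proxy
        ys.append(y)
        xs.append(x)
        observed.append(not (y > 0.5 and rng.random() < 0.8))

    obs = [(x, y) for x, y, o in zip(xs, ys, observed) if o]
    cc_mean = sum(y for _, y in obs) / len(obs)

    # Fit y = a + b*x on the observed cases by ordinary least squares.
    mx = sum(x for x, _ in obs) / len(obs)
    sxy = sum((x - mx) * (y - cc_mean) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    b = sxy / sxx
    a = cc_mean - b * mx
    resid_sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in obs) / (len(obs) - 2))

    # Stochastic regression imputation: prediction plus residual noise.
    completed = [
        y if o else a + b * x + rng.gauss(0, resid_sd)
        for x, y, o in zip(xs, ys, observed)
    ]
    return cc_mean, sum(completed) / n


cc_mean, imputed_mean = simulate()
```

    Under these assumptions the complete-case mean is biased well below 0, and imputing from the proxy removes much, though not all, of that bias. This mirrors the direction of the abstract's finding; it says nothing about the magnitudes reported there.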

  19. Living at a Geographically Higher Elevation Is Associated with Lower Risk of Metabolic Syndrome: Prospective Analysis of the SUN Cohort

    Directory of Open Access Journals (Sweden)

    Amaya Lopez-Pascual

    2017-01-01

    Living at a geographically higher altitude affects oxygen availability. The possible connection between such environmental factors and the development of metabolic syndrome (MetS) features is not fully understood, and the available epidemiological evidence is still very limited. The aim of the present study was to evaluate the longitudinal association between altitude and the incidence of MetS and each of its components in a prospective Spanish cohort, the Seguimiento Universidad de Navarra (SUN) project. Our study included 6860 highly educated subjects (university graduates) free from any MetS criteria at baseline. The altitude of residence was imputed from the postal code of each subject's residence according to data from the Spanish National Cartographic Institute, and participants were categorized into tertiles. MetS was defined according to the harmonized definition. Cox proportional hazards models were used to assess the association between the altitude of residence and the risk of MetS during follow-up. After a median follow-up period of 10 years, 462 incident cases of MetS were identified. After adjustment for potential confounders, subjects in the highest category of altitude (>456 m) exhibited a significantly lower risk of developing MetS than those in the lowest tertile (<122 m) of altitude of residence [Model 2: hazard ratio = 0.75 (95% confidence interval: 0.58–0.97); p for trend = 0.029]. Living at a geographically higher altitude was associated with a lower risk of developing MetS in the SUN project. Our findings suggest that geographical elevation may be an important factor linked to metabolic diseases.
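    The study imputed each participant's altitude from their postal code and then split the cohort into tertiles (reported cut points: <122 m and >456 m). A minimal sketch of those two steps follows. The postal codes and altitudes in the lookup table are invented, and `statistics.quantiles` with its default method is just one reasonable way to compute tertile cut points; the Cox model itself is not shown.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical postal-code -> altitude (m) table standing in for the
# cartographic reference data; codes and values are invented.
ALTITUDE_BY_POSTCODE = {"A1": 20, "A2": 80, "B1": 150, "B2": 300, "C1": 450, "C2": 700}


def impute_altitude(postcodes):
    """Assign each participant the altitude recorded for their postal code."""
    return [ALTITUDE_BY_POSTCODE[pc] for pc in postcodes]


def tertile_labels(values):
    """Label each value 1, 2 or 3 using the sample's own tertile cut points."""
    cuts = quantiles(values, n=3)  # two cut points, default 'exclusive' method
    return [bisect_right(cuts, v) + 1 for v in values]


altitudes = impute_altitude(["A1", "A2", "B1", "B2", "C1", "C2"])
print(tertile_labels(altitudes))  # [1, 1, 2, 2, 3, 3]
```

    Deriving cut points from the sample itself, as here, gives groups of roughly equal size, which is why the reported boundaries (122 m and 456 m) are specific to this cohort.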

  20. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

    Directory of Open Access Journals (Sweden)

    Busch Michael P

    2007-12-01

    Background: Chronic hepatitis C virus (HCV) infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Studies of fibrosis progression often rely on imputing the time of infection, commonly as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and its implications for modeling factors that influence progression rates. Methods: We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use. Results: Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had a considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting a younger age of first injection drug use were more likely to have been infected after that age, and persons reporting an older age were more likely to have been infected before it. Conclusion: In cross-sectional studies of fibrosis progression where the date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.
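    In current status data, each subject is seen once: we know only whether infection had occurred by the observation time (for example, age at interview). A standard nonparametric building block for such data, not necessarily the authors' exact model, is the maximum-likelihood estimate of the cumulative infection distribution F, which equals the isotonic (non-decreasing) regression of the infection indicators ordered by observation time. The pool-adjacent-violators algorithm computes it; the ages and statuses below are hypothetical.

```python
from itertools import groupby


def current_status_npmle(times, infected):
    """NPMLE of the cumulative infection distribution F from current status data.

    Returns [(time, F_hat)] at the distinct observation times. F_hat is the
    isotonic (non-decreasing) regression of the 0/1 indicators, computed with
    the pool-adjacent-violators algorithm.
    """
    pairs = sorted(zip(times, infected))
    # Each block is [sum of indicators, count, times]; tied times share a block.
    blocks = []
    for t, grp in groupby(pairs, key=lambda p: p[0]):
        flags = [int(f) for _, f in grp]
        blocks.append([sum(flags), len(flags), [t]])
        # Pool while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, n, ts = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2].extend(ts)
    return [(t, s / n) for s, n, ts in blocks for t in ts]


# Hypothetical ages at interview and antibody status (True = already infected).
ages = [20, 25, 30, 35, 40]
status = [False, True, False, True, True]
estimate = current_status_npmle(ages, status)
```

    Here the infected 25-year-old and the uninfected 30-year-old violate monotonicity, so they are pooled to 0.5, giving F̂ = 0, 0.5, 0.5, 1, 1. The study's year-by-year risk models with covariates go well beyond this, but the same constraint (infection risk can only accumulate with age) underlies them.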

  1. Similarity of trajectories taking into account geographic context

    Directory of Open Access Journals (Sweden)

    Maike Buchin

    2014-12-01

    The movements of animals, people, and vehicles are embedded in a geographic context. This context influences the movement and may cause the formation of certain behavioral responses. Thus, it is essential to include context parameters in the study of movement and the development of movement pattern analytics. Advances in sensor technologies and positioning devices provide valuable data not only of moving agents but also of the circumstances embedding the movement in space and time. Developing knowledge discovery methods to investigate the relation between movement and its surrounding context is a major challenge in movement analysis today. In this paper we show how to integrate geographic context into the similarity analysis of movement data. For this, we discuss models for geographic context of movement data. Based on this we develop simple but efficient context-aware similarity measures for movement trajectories, which combine a spatial and a contextual distance. These are based on well-known similarity measures for trajectories, such as the Hausdorff, Fréchet, or equal time distance. We validate our approach by applying these measures to movement data of hurricanes and albatross.
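    The abstract does not define its combined measure precisely. A minimal sketch, assuming each trajectory point carries a single numeric context value (for example, a hypothetical wind-speed reading) and that spatial and contextual distances are blended linearly with a hypothetical weight: the blended point distance is plugged into the standard dynamic program for the discrete Fréchet distance. With weight 0 it reduces to the ordinary discrete Fréchet distance.

```python
import math


def context_point_distance(p, q, weight):
    """Blend spatial and contextual distance for points (x, y, context)."""
    spatial = math.hypot(p[0] - q[0], p[1] - q[1])
    contextual = abs(p[2] - q[2])
    return (1 - weight) * spatial + weight * contextual


def discrete_frechet(P, Q, weight=0.0):
    """Discrete Fréchet distance (coupling DP) with a context-aware point distance."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = context_point_distance(P[i], Q[j], weight)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


# Two parallel tracks one unit apart; the second meets different conditions.
P = [(0, 0, 0.0), (1, 0, 0.0), (2, 0, 0.0)]
Q = [(0, 1, 0.0), (1, 1, 2.0), (2, 1, 2.0)]
print(discrete_frechet(P, Q))       # 1.0 (purely spatial)
print(discrete_frechet(P, Q, 0.5))  # 1.5 (context difference now counts)
```

    The tracks are geometrically similar, but once context is weighted in, their differing conditions push them apart, which is the kind of distinction a context-aware measure is meant to expose.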

  2. Geographical range and local abundance of tree species in China.

    Directory of Open Access Journals (Sweden)

    Haibao Ren

    Most studies on the geographical distribution of species have utilized a few well-known taxa in Europe and North America, with little research in China and its wide range of climate and forest types. We assembled large datasets to quantify the geographic ranges of tree species in China and to test several biogeographic hypotheses: (1) whether locally abundant species tend to be geographically widespread; (2) whether species are more abundant towards their range centers; and (3) how abundances are correlated between sites. Local abundances of 651 species were derived from four tree plots of 20–25 ha where all individuals ≥1 cm in stem diameter were mapped and identified taxonomically. Range sizes of these species across China were then estimated from over 460,000 geo-referenced records; a Bayesian approach was used, allowing careful measures of error of each range estimate. The log-transformed range sizes had a bell-shaped distribution with a median of 703,000 km², and >90% of the 651 species had ranges >10⁵ km². There was no relationship between local abundance and range size, and no evidence for species being more abundant towards their range centers. Finally, species' abundances were positively correlated between sites. The widespread nature of most tree species in China suggests few are vulnerable to global extinction, and there is no indication of the double peril that would result if rare species also had narrow ranges.

  3. Geographic Names Information System (GNIS) for Louisiana, Geographic NAD83, USGS (2007) [GNIS_LA_USGS_2007]

    Data.gov (United States)

    Louisiana Geographic Information Center — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  4. Comparison of immunization strategies in geographical networks

    Energy Technology Data Exchange (ETDEWEB)

    Wang Bing; Aihara, Kazuyuki [Institute of Industrial Science, The University of Tokyo, Tokyo (Japan)] [ERATO Aihara Complexity Modelling Project, JST, Institute of Industrial Science, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505 (Japan); Kim, Beom Jun, E-mail: beomjun@skku.ed [BK21 Physics Research Division and Department of Energy Science, Sungkyunkwan University, Suwon 440-746 (Korea, Republic of)] [Department of Computational Biology, School of Computer Science and Communication, Royal Institute of Technology, 100 44 Stockholm (Sweden)

    2009-10-12

    The epidemic spread and immunizations in geographically embedded scale-free (SF) and Watts-Strogatz (WS) networks are numerically investigated. We make the realistic assumption that it takes time, which we call the detection time, for a vertex to be identified as infected, and implement two different immunization strategies: one is based on connection neighbors (CN) of the infected vertex, using exact information about the network structure, and the other is based on spatial neighbors (SN), taking only geographical distances into account. We find that decreasing the detection time is crucial for successful immunization in general. Simulation results show that for both SF and WS networks, the SN strategy always performs better than the CN strategy, especially for the more heterogeneous SF networks at long detection times. The observation is verified by checking the number of infected nodes being immunized. We found that in geographical space, the distance preferences in the network construction process and the geographically decaying infection rate are the key factors that make the SN immunization strategy outperform the CN strategy. This indicates that, even without full knowledge of network connectivity, epidemic spread can still be stopped efficiently using only geographical information, as in the SN strategy, which may have potential applications for preventing real epidemic spread.
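    The two strategies differ only in how immunization targets are chosen once an infected vertex is detected. A minimal sketch of that selection step, assuming vertex positions in the plane and an adjacency list; the number of spatial targets `k` is a free, hypothetical parameter (the abstract does not state how many vertices are immunized), and the detection delay and epidemic simulation loop are omitted.

```python
import math


def cn_targets(adjacency, infected):
    """Connection-neighbour (CN) strategy: the infected vertex's network neighbours."""
    return set(adjacency[infected])


def sn_targets(positions, infected, k):
    """Spatial-neighbour (SN) strategy: the k vertices nearest in space to the infected one."""
    x0, y0 = positions[infected]
    others = [v for v in positions if v != infected]
    others.sort(key=lambda v: math.hypot(positions[v][0] - x0, positions[v][1] - y0))
    return set(others[:k])


# Hypothetical network: vertex 0 has a long-range link to the distant vertex 3.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.5), 3: (5.0, 5.0), 4: (0.5, 0.5)}
adjacency = {0: [1, 3], 1: [0], 2: [4], 3: [0], 4: [2]}
print(cn_targets(adjacency, 0))     # CN protects the distant vertex 3
print(sn_targets(positions, 0, 2))  # SN protects the nearby vertex 4 instead
```

    SN needs only coordinates, not the edge list, which is why it remains usable when network connectivity is unknown, as the abstract emphasizes.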

  5. Personality Homophily and Geographic Distance in Facebook.

    Science.gov (United States)

    Noë, Nyala; Whitaker, Roger M; Allen, Stuart M

    2018-05-24

    Personality homophily remains an understudied aspect of social networks, with the traditional focus concerning sociodemographic variables as the basis for assortativity, rather than psychological dispositions. We consider the effect of personality homophily on one of the biggest constraints to human social networks: geographic distance. We use the Big five model of personality to make predictions for each of the five facets: Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Using a network of 313,669 Facebook users, we investigate the difference in geographic distance between homophilous pairs, in which both users scored similarly on a particular facet, and mixed pairs. In accordance with our hypotheses, we find that pairs of open and conscientious users are geographically further apart than mixed pairs. Pairs of extraverts, on the other hand, tend to be geographically closer together. We find mixed results for the Neuroticism facet, and no significant effects for the Agreeableness facet. The results are discussed in the context of personality homophily and the impact of geographic distance on social connections.
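    The comparison rests on great-circle distances between friends, split by pair type. A minimal sketch, assuming the haversine formula with a mean Earth radius of 6371 km, and approximating "both users scored similarly" as both scoring at or above a hypothetical cutoff on one facet; the coordinates, scores, and cutoff are invented.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_distance_by_pair_type(pairs, high_cutoff):
    """Mean distance for homophilous-high pairs vs mixed pairs on one facet.

    Each pair is ((lat, lon, score), (lat, lon, score)). Hypothetical rule:
    homophilous if both scores >= high_cutoff, mixed if exactly one is;
    other pairs are ignored. Raises StatisticsError if a group is empty.
    """
    homo, mixed = [], []
    for (la1, lo1, s1), (la2, lo2, s2) in pairs:
        d = haversine_km(la1, lo1, la2, lo2)
        high1, high2 = s1 >= high_cutoff, s2 >= high_cutoff
        if high1 and high2:
            homo.append(d)
        elif high1 != high2:
            mixed.append(d)
    return mean(homo), mean(mixed)


pairs = [
    ((0.0, 0.0, 0.9), (0.0, 1.0, 0.8)),  # both high: homophilous
    ((0.0, 0.0, 0.9), (0.0, 2.0, 0.1)),  # one high: mixed
    ((0.0, 0.0, 0.2), (0.0, 3.0, 0.1)),  # both low: ignored
]
homo_km, mixed_km = mean_distance_by_pair_type(pairs, high_cutoff=0.5)
```

    On a real network the same split would be repeated for each of the five facets, with significance assessed across hundreds of thousands of pairs rather than read off three.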

  6. Comparison of immunization strategies in geographical networks

    International Nuclear Information System (INIS)

    Wang Bing; Aihara, Kazuyuki; Kim, Beom Jun

    2009-01-01

    The epidemic spread and immunizations in geographically embedded scale-free (SF) and Watts-Strogatz (WS) networks are numerically investigated. We make the realistic assumption that it takes time, which we call the detection time, for a vertex to be identified as infected, and implement two different immunization strategies: one is based on connection neighbors (CN) of the infected vertex, using exact information about the network structure, and the other is based on spatial neighbors (SN), taking only geographical distances into account. We find that decreasing the detection time is crucial for successful immunization in general. Simulation results show that for both SF and WS networks, the SN strategy always performs better than the CN strategy, especially for the more heterogeneous SF networks at long detection times. The observation is verified by checking the number of infected nodes being immunized. We found that in geographical space, the distance preferences in the network construction process and the geographically decaying infection rate are the key factors that make the SN immunization strategy outperform the CN strategy. This indicates that, even without full knowledge of network connectivity, epidemic spread can still be stopped efficiently using only geographical information, as in the SN strategy, which may have potential applications for preventing real epidemic spread.

  7. Interactive segmentation for geographic atrophy in retinal fundus images.

    Science.gov (United States)

    Lee, Noah; Smith, R Theodore; Laine, Andrew F

    2008-10-01

    Fundus auto-fluorescence (FAF) imaging is a non-invasive technique for in vivo ophthalmoscopic inspection of age-related macular degeneration (AMD), the most common cause of blindness in developed countries. Geographic atrophy (GA) is an advanced form of AMD and accounts for 12-21% of severe visual loss in this disorder [3]. Automatic quantification of GA is important for determining disease progression and facilitating clinical diagnosis of AMD. Automatic segmentation of pathological images remains an unsolved problem. In this paper we leverage the watershed transform and generalized non-linear gradient operators for interactive segmentation and present an intuitive and simple approach for geographic atrophy segmentation. We compare our approach with the state-of-the-art random walker [5] algorithm for interactive segmentation using ROC statistics. Quantitative evaluation experiments on 100 FAF images show a mean sensitivity/specificity of 98.3/97.7% for our approach and a mean sensitivity/specificity of 88.2/96.6% for the random walker algorithm.
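    The reported figures compare a segmentation against a reference. A minimal sketch of how sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP), can be computed for one image, assuming pixel-wise comparison of binary masks (the abstract does not state the exact protocol); the tiny masks below are hypothetical.

```python
def sensitivity_specificity(predicted, truth):
    """Pixel-wise sensitivity and specificity of a binary segmentation.

    `predicted` and `truth` are equal-sized 2-D lists of 0/1 (1 = atrophy).
    """
    tp = fp = tn = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # atrophy correctly marked
            elif p and not t:
                fp += 1          # healthy pixel marked as atrophy
            elif not p and t:
                fn += 1          # atrophy missed
            else:
                tn += 1          # healthy pixel correctly left out
    return tp / (tp + fn), tn / (tn + fp)


truth = [[1, 1, 0], [1, 0, 0]]
predicted = [[1, 0, 0], [1, 0, 0]]
sens, spec = sensitivity_specificity(predicted, truth)
```

    Here one atrophic pixel is missed and no healthy pixel is wrongly included, so sensitivity is 2/3 and specificity is 1.0; a mean over 100 images, as reported, would average such per-image values.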

  8. Louisiana State Soil Geographic, General Soil Map, Geographic NAD83, NWRC (1998) [statsgo_soils_NWRC_1998]

    Data.gov (United States)

    Louisiana Geographic Information Center — This data set contains vector line map information. The vector data contain selected base categories of geographic features, and characteristics of these features,...

  9. Thematic cartography as a geographical application

    Directory of Open Access Journals (Sweden)

    Drago Perko

    2002-12-01

    A thematic map may be a geographical application (a tool) in itself or the basis for some other geographical work. The development of Slovene thematic cartography accelerated considerably following the country's independence in 1991. In terms of content and technology, its greatest achievements are the Geographical Atlas of Slovenia and the National Atlas of Slovenia, which are outstanding at the international level and of great significance for the promotion of Slovenia and of Slovene geography and cartography. However, this rapid development has been accompanied by numerous problems, for example, the ignoring of various Slovene and international conventions for the preparation of maps, including United Nations resolutions, Slovene and international standards (SIST, ISO), and copyright laws.

  10. Training for Internationalization through Domestic Geographical Dispersion

    DEFF Research Database (Denmark)

    Santangelo, Grazia D.; Stucchi, Tamara

    Traditionally created to deal with an unfriendly domestic environment, business groups (BGs) are increasingly internationalizing. However, how BGs can reconcile their strictly domestic orientation with an international dimension remains an open question. Drawing on arguments from... organizational learning, we seek to solve this puzzle in relation to the internationalization of Indian BGs. In particular, we argue that in heterogeneous domestic emerging markets, a BG's geographical dispersion across sub-national states provides training for internationalization. To internationalize successfully..., BGs need to develop the capability of managing geographically dispersed units in institutionally heterogeneous contexts. Domestic geographical dispersion would indeed help the BG deal with different regulations, customers and infrastructures. However, there is less scope for such training as BGs...

  11. Geographical data structures supporting regional analysis

    International Nuclear Information System (INIS)

    Edwards, R.G.; Durfee, R.C.

    1978-01-01

    In recent years the computer has become a valuable aid in solving regional environmental problems. Over a hundred different geographic information systems have been developed to digitize, store, analyze, and display spatially distributed data. One important aspect of these systems is the data structure (e.g., grids, polygons, segments) used to model the environment being studied. This paper presents eight common geographic data structures and their use in studies of coal resources, power plant siting, population distributions, LANDSAT imagery analysis, and land-use analysis.
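    The choice between grid and polygon structures changes how even basic quantities are computed. As a small hypothetical illustration, the sketch stores one 2 × 3 rectangle both as a polygon vertex list, with area from the shoelace formula, and as a raster of unit cells, with area from counting occupied cells.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def grid_area(grid, cell_size):
    """Area covered by cells equal to 1 in a raster grid of square cells."""
    return sum(map(sum, grid)) * cell_size ** 2


# The same hypothetical 2 km x 3 km rectangle in both representations.
rectangle = [(0, 0), (2, 0), (2, 3), (0, 3)]
raster = [[1, 1], [1, 1], [1, 1]]  # 1 km cells
print(polygon_area(rectangle), grid_area(raster, 1.0))  # 6.0 6.0
```

    The two agree here only because the rectangle aligns with the cells; for irregular boundaries the raster answer depends on cell size, which is one practical trade-off between the structures.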

  12. Tanzanian food origins and protected geographical indications

    DEFF Research Database (Denmark)

    John, Innocensia Festo; Egelyng, Henrik; Lokina, Azack

    2016-01-01

    As the world's population continues to grow, food security will remain on the policy agenda, particularly in Africa. At the same time, global food systems are experiencing a new wave of interest in local foods and food sovereignty, featuring high-quality food products of verifiable geographical origin... of food origin products in Tanzania that have potential for GI certification. The hypothesis was that there are origin products in Tanzania whose unique characteristics are linked to the area of production. Geographical indications can be useful policy instruments contributing to food security... the diversity of supply of natural and unique quality products and so contribute to enhanced food security...

  13. Data Matching Imputation System

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The DMIS dataset is a flat file record of the matching of several data set collections. Primarily it consists of VTRs, dealer records, Observer data in conjunction...

  14. Who cares and how much? The imputed economic contribution to the Canadian healthcare system of middle-aged and older unpaid caregivers providing care to the elderly.

    Science.gov (United States)

    Hollander, Marcus J; Liu, Guiping; Chappell, Neena L

    2009-01-01

    Canadians provide significant amounts of unpaid care to elderly family members and friends with long-term health problems. While some information is available on the nature of the tasks unpaid caregivers perform, and the amounts of time they spend on these tasks, the contribution of unpaid caregivers is often hidden. (It is recognized that some caregiving may be for short periods of time or may entail matters better described as "help" or "assistance," such as providing transportation. However, we use caregiving to cover the full range of unpaid care provided, from basic help to personal care.) Aggregate estimates of the market costs to replace the unpaid care provided are important to governments for policy development, as they provide a means to situate the contributions of unpaid caregivers within Canada's healthcare system. The purpose of this study was to obtain an assessment of the imputed costs of replacing the unpaid care provided by Canadians to the elderly. (The term "imputed costs" refers to the costs that would be incurred if the care provided by an unpaid caregiver were instead provided by a paid caregiver, on a direct hour-for-hour substitution basis.) The economic value of unpaid care as understood in this study is defined as the cost to replace the services provided by unpaid caregivers at rates for paid care providers.

  15. Imputing Variants in HLA-DR Beta Genes Reveals That HLA-DRB1 Is Solely Associated with Rheumatoid Arthritis and Systemic Lupus Erythematosus.

    Directory of Open Access Journals (Sweden)

    Kwangwoo Kim

    Full Text Available The genetic association of HLA-DRB1 with rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) is well documented, but association with other HLA-DR beta genes (HLA-DRB3, HLA-DRB4 and HLA-DRB5) has not been thoroughly studied, despite their similar functions and chromosomal positions. We examined variants in all functional HLA-DR beta genes in RA and SLE patients and controls, down to the amino-acid level, to better understand disease association with the HLA-DR locus. To this end, we improved an existing HLA reference panel to impute variants in all protein-coding HLA-DR beta genes. Using the reference panel, HLA variants were inferred from high-density SNP data of 9,271 RA-control subjects and 5,342 SLE-control subjects. Disease association tests were performed by logistic regression and log-likelihood ratio tests. After imputation using the newly constructed HLA reference panel and statistical analysis, we observed that HLA-DRB1 variants better accounted for the association between MHC and susceptibility to RA and SLE than did the other three HLA-DRB variants. Moreover, there were no secondary effects in HLA-DRB3, HLA-DRB4, or HLA-DRB5 in RA or SLE. Of all the HLA-DR beta chain paralogs, those encoded by HLA-DRB1 solely or dominantly influence susceptibility to RA and SLE.
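    The log-likelihood ratio tests mentioned above compare a reduced logistic model with a fuller one that adds candidate HLA variants. When the two models differ by one parameter, the statistic 2(ℓ_full − ℓ_reduced) is referred to a χ² distribution with one degree of freedom, whose tail probability can be written with the complementary error function. A minimal Python sketch; the abstract does not state the degrees of freedom of each test, so the one-parameter case is an assumption, the function name is hypothetical, and the log-likelihoods in the usage line are made-up numbers.

```python
import math


def lrt_one_df(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), with the statistic referred to chi-square(1).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the fuller model should not have a lower log-likelihood")
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up log-likelihoods chosen to hit the familiar 5% critical value (~3.84).
stat, p = lrt_one_df(-100.0, -98.0792705896529)
print(round(stat, 2), round(p, 3))  # 3.84 0.05
```

    For tests that add several parameters at once (for example, all residues at one amino-acid position), the χ² reference would need the matching number of degrees of freedom instead.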

  16. Empirical Bayesian Geographical Mapping of Occupational Accidents among Iranian Workers.

    Science.gov (United States)

    Vahabi, Nasim; Kazemnejad, Anoshirvan; Datta, Somnath

    2017-05-01

    Work-related accidents are believed to be a serious preventable cause of mortality and disability worldwide. This study aimed to provide Bayesian geographical maps of occupational injury rates among workers insured by the Iranian Social Security Organization. The participants included all insured workers in the Iranian Social Security Organization database in 2012. A Bayesian Poisson-Gamma model was applied to estimate the relative risk of occupational accidents. Data analysis and mapping were performed using R 3.0.3, OpenBUGS 3.2.3 rev 1012 and ArcMap 9.3. Of the 21,484 occupational injury victims investigated, the majority were male (98.3%), including 16,443 (76.5%) single workers aged 20-29 years. Accidents were more frequent in basic metal, electric, and non-electric machining jobs. About 0.4% (96) of work-related accidents led to death, 2.2% (457) led to disability (partial and total), 4.6% (980) led to fixed compensation, and 92.8% (19,951) of the injured victims recovered completely. Geographical maps of the estimated relative risk of occupational accidents were also provided. The highest estimates were for provinces mostly located along mountain chains, some of which are categorized as deprived provinces in Iran. The study revealed the need for further investigation of the role of economic and climatic factors in high-risk areas. Combining geographical mapping with statistical approaches can give policy makers more accurate tools for decisions aimed at preventing and reducing the risks and adverse outcomes of work-related accidents.
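    In a Poisson-Gamma model each province's observed count O_i is Poisson with mean E_i·θ_i, and the relative risks θ_i share a Gamma prior, so each smoothed estimate is pulled from the raw ratio O_i/E_i toward the pooled risk. The authors fitted their model in OpenBUGS; the sketch below swaps in a simpler empirical Bayes fit of the Gamma prior by Marshall's (1991) method of moments, which is an assumption, not the authors' procedure. The function name and the toy counts are hypothetical.

```python
def eb_relative_risks(observed: list[int], expected: list[float]) -> list[float]:
    """Empirical Bayes relative risks under a Poisson-Gamma model (hypothetical helper).

    Model: O_i ~ Poisson(E_i * theta_i), theta_i ~ Gamma(shape=a, rate=b).
    The Gamma prior is fitted by Marshall's method of moments.
    """
    if not observed or len(observed) != len(expected):
        raise ValueError("need equal-length, non-empty inputs")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    n = len(observed)
    total_o, total_e = sum(observed), sum(expected)
    m = total_o / total_e  # prior mean: pooled relative risk
    mean_e = total_e / n
    # Between-area variance of the raw ratios O/E, net of Poisson noise.
    s2 = sum(e * (o / e - m) ** 2 for o, e in zip(observed, expected)) / total_e - m / mean_e
    if s2 <= 0:
        # No extra-Poisson variation detected: shrink every area to the pooled mean.
        return [m] * n
    b = m / s2  # rate, from prior mean a/b = m and variance a/b**2 = s2
    a = m * b   # shape
    # Posterior mean: a weighted average of O_i/E_i and m.
    return [(o + a) / (e + b) for o, e in zip(observed, expected)]


# Toy data: raw ratios 0.5, 1.0 and 2.0 are shrunk toward the pooled risk (~1.36).
print([round(x, 2) for x in eb_relative_risks([5, 20, 50], [10.0, 20.0, 25.0])])  # [0.77, 1.07, 1.9]
```

    Areas with small expected counts are shrunk the most, which is why such maps are preferred over raw rates when some provinces have few insured workers.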

  17. Combining Land Capability Evaluation, Geographic Information ...

    African Journals Online (AJOL)

    Combining Land Capability Evaluation, Geographic Information Systems, and Indigenous Technologies for Soil Conservation in Northern Ethiopia. ... Land capability and land use status were established following the procedures of a modified treatment-oriented capability classification using GIS. The case study ...

  18. Geometric algorithms for delineating geographic regions

    NARCIS (Netherlands)

    Reinbacher, I.

    2006-01-01

    Every one of us is familiar with geographical regions such as the south of Utrecht, the Dutch Randstad, or the mountainous areas of Austria. Some of these regions, like Utrecht or Austria, have crisp, fixed boundaries. Others, like the Dutch Randstad and the Austrian mountains, have no such boundaries and are

  19. [Geographic data for Neotropical bats (Chiroptera)].

    Science.gov (United States)

    Noguera-Urbano, Elkin A; Escalante, Tania

    2014-03-01

    The global effort to digitize biodiversity occurrence data from collections, museums and other institutions has stimulated the development of important tools to improve the knowledge and conservation of biodiversity. The Global Biodiversity Information Facility (GBIF) enables and opens access to 321 million biodiversity records from 379 host institutions. Neotropical bats are a highly diverse and specialized group, and geographic information about them has been increasing in recent years, but there are few reports on this topic. The aim of this study was to analyze the number of digital records in GBIF of Neotropical bats distributed in 21 American countries, evaluating their nomenclatural and geographical consistency at country scale. Moreover, we evaluated information gaps on 1° latitude × 1° longitude grid cells. There were over half a million records, but 58% of them have no latitude and longitude data, and 52% fully passed the nomenclatural and geographic evaluation. We estimated that there are no records in 54% of the analyzed area; the principal gaps are in biodiversity hotspots like the Colombian and Brazilian Amazonia and Southern Venezuela. In conclusion, our study suggests that available data on GBIF have nomenclatural and geographic biases. GBIF data only partially represent bat species richness, and the main information gaps are in South America.
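    The gap analysis above bins georeferenced records into 1° × 1° cells and reports the share of cells with no records. A minimal Python sketch over a rectangular bounding box, assuming integer-degree bounds; the study restricted the analysis to its own area of interest, so the plain bounding box, the function name and the sample points are hypothetical. Records without coordinates are skipped, since (as the abstract notes for 58% of records) they cannot be gridded.

```python
import math
from typing import Optional


def empty_cell_fraction(
    records: list[tuple[Optional[float], Optional[float]]],
    lat_range: tuple[int, int],
    lon_range: tuple[int, int],
) -> float:
    """Share of 1-degree x 1-degree cells in a bounding box that contain no records."""
    occupied = set()
    for lat, lon in records:
        if lat is None or lon is None:
            continue  # record lacks coordinates and cannot be placed on the grid
        if lat_range[0] <= lat < lat_range[1] and lon_range[0] <= lon < lon_range[1]:
            # A cell is identified by the integer degrees of its south-west corner.
            occupied.add((math.floor(lat), math.floor(lon)))
    n_cells = (lat_range[1] - lat_range[0]) * (lon_range[1] - lon_range[0])
    return 1.0 - len(occupied) / n_cells


# Hypothetical points in a 2 x 2 degree box: two cells occupied, one record ungeoreferenced.
print(empty_cell_fraction([(0.5, 0.5), (0.7, 0.2), (1.5, 1.5), (None, None)], (0, 2), (0, 2)))  # 0.5
```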

  20. Europeans among themselves: Geographical and linguistic stereotypes

    NARCIS (Netherlands)

    Mamadouh, V.D.; Dąbrowska, A.; Pisarek, W.; Stickel, G.

    2017-01-01

    Stereotypes can be studied from the perspective of political geography and critical geopolitics as part of geographical imaginations, in other words those geopolitical representations that help us make sense of the world around us. They necessarily frame our perception of ongoing events, and inform

  1. Using Educational Tourism in Geographical Education

    Science.gov (United States)

    Prakapiene, Dalia; Olberkyte, Loreta

    2013-01-01

    The article analyses and defines the concept of educational tourism, presents the structure of the concept and looks into the opportunities for using educational tourism in geographical education. To reveal such opportunities, research was carried out in the Lithuanian national and regional parks using the qualitative method of content…

  2. Geographic distribution of wild potato species

    NARCIS (Netherlands)

    Hijmans, R.J.; Spooner, D.M.

    2001-01-01

    The geographic distribution of wild potatoes (Solanaceae sect. Petota) was analyzed using a database of 6073 georeferenced observations. Wild potatoes occur in 16 countries, but 88% of the observations are from Argentina, Bolivia, Mexico, and Peru. Most species are rare and narrowly endemic: for 77

  3. Geography and Geographical Information Science: Interdisciplinary Integrators

    Science.gov (United States)

    Ellul, Claire

    2015-01-01

    To understand how Geography and Geographical Information Science (GIS) can contribute to Interdisciplinary Research (IDR), it is relevant to articulate the differences between the different types of such research. "Multidisciplinary" researchers work in a "parallel play" mode, completing work in their disciplinary work streams…

  4. Geographic pathology of Helicobacter pylori gastritis

    NARCIS (Netherlands)

    Liu, Yi; Ponsioen, Cyriel I. J.; Xiao, Shu-Dong; Tytgat, Guido N. J.; ten Kate, Fiebo J. W.

    2005-01-01

    Background and aim. Helicobacter pylori is etiologically associated with gastritis and gastric cancer. There are significant geographical differences in the clinical manifestations of H. pylori infections. The aim of this study was to compare gastric mucosal histology in relation to age among H.

  5. Execution Management Solutions for Geographically Distributed Simulations

    NARCIS (Netherlands)

    Berg, T.W. van den; Jansen, H.G.M.; Jansen, R.E.J.; Prins, L.M.

    2009-01-01

    Managing the initialization, execution control and monitoring of HLA federates is not always straightforward, especially for a geographically distributed time managed federation. Issues include pre and post run-time data distribution and run-time data collection; starting, stopping and monitoring

  6. Geographic Analysis of Neurosurgery Workforce in Korea.

    Science.gov (United States)

    Park, Hye Ran; Park, Sukh Que; Kim, Jae Hyun; Hwang, Jae Chan; Lee, Gwang Soo; Chang, Jae-Chil

    2018-01-01

    With respect to the health and safety of the public, universal access to health care is an issue of the greatest importance. The geographic distribution of doctors is one of the important factors contributing to access to health care. The aim of this study is to assess the imbalances in the geographic distribution of neurosurgeons across Korea. Population data was obtained from the National Statistical Office. We classified geographic groups into 7 metropolitan cities, 78 non-metropolitan cities, and 77 rural areas. The number of doctors and neurosurgeons per 100,000 population in each county unit was calculated using the total number of doctors and neurosurgeons at the country level from 2009 to 2015. The densities of neurosurgeons and doctors were calculated and depicted in maps. Between 2009 and 2015, the number of neurosurgeons increased from 2002 to 2557, and the ratio of neurosurgeons per 100,000 population increased from 4.02 to 4.96. The number of neurosurgeons per 100,000 population was highest in metropolitan cities and lowest in rural areas from 2009 to 2015. A comparison of the geographic distribution of neurosurgeons in 2009 and 2015 showed an increase in the regional gap. The neurosurgeon density was affected by country unit characteristics (p < 0.001). Distribution of neurosurgeons throughout Korea is uneven. Neurosurgeons are being increasingly concentrated in a limited number of metropolitan cities. This phenomenon will need to be accounted for when planning the supply of neurosurgeons, the allocation of resources and manpower, and the provision of regional neurosurgical services.
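    The density figures above are simple rates: count divided by population, times 100,000. Inverting the formula gives a quick consistency check on the reported national numbers. A minimal Python sketch; the helper names are hypothetical, and the implied denominators are back-of-envelope values derived here (approximate, because the published rates are rounded to two decimals), not figures reported by the authors.

```python
def per_100k(count: float, population: float) -> float:
    """Rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


def implied_population(count: float, rate_per_100k: float) -> float:
    """Population denominator implied by a count and its rate per 100,000."""
    return count / rate_per_100k * 100_000


# Denominators implied by 2002 neurosurgeons at 4.02 and 2557 at 4.96 per 100,000 (in millions).
print(round(implied_population(2002, 4.02) / 1e6, 1))  # 49.8
print(round(implied_population(2557, 4.96) / 1e6, 1))  # 51.6
```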

  7. Geographic disparity in kidney transplantation under KAS.

    Science.gov (United States)

    Zhou, Sheng; Massie, Allan B; Luo, Xun; Ruck, Jessica M; Chow, Eric K H; Bowring, Mary G; Bae, Sunjae; Segev, Dorry L; Gentry, Sommer E

    2017-12-12

    The Kidney Allocation System fundamentally altered kidney allocation, causing a substantial increase in regional and national sharing that we hypothesized might impact geographic disparities. We measured geographic disparity in deceased donor kidney transplant (DDKT) rate under KAS (6/1/2015-12/1/2016), and compared that with pre-KAS (6/1/2013-12/3/2014). We modeled DSA-level DDKT rates with multilevel Poisson regression, adjusting for allocation factors under KAS. Using the model we calculated a novel, improved metric of geographic disparity: the median incidence rate ratio (MIRR) of transplant rate, a measure of DSA-level variation that accounts for patient casemix and is robust to outlier values. Under KAS, MIRR was 1.81 (1.75 to 1.86) for adults, meaning that similar candidates across different DSAs have a median 1.81-fold difference in DDKT rate. The impact of geography was greater than the impact of factors emphasized by KAS: having an EPTS score ≤20% was associated with a 1.40-fold increase (IRR = 1.40 [1.35 to 1.45], P … geographic disparities with KAS (P = .3). Despite extensive changes to kidney allocation under KAS, geography remains a primary determinant of access to DDKT. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.
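    The abstract does not spell out how the MIRR is computed. Assuming it follows the usual definition of the median rate ratio for a random-intercept Poisson model (the rate-scale counterpart of the median odds ratio), with σ_u² the variance of the DSA-level random intercepts on the log scale and Φ⁻¹ the standard normal quantile function, it would read:

```latex
\mathrm{MIRR} = \exp\!\left(\sqrt{2\sigma_u^{2}}\,\Phi^{-1}(0.75)\right)
             \approx \exp\left(0.954\,\sigma_u\right),
\qquad
\sigma_u \approx \frac{\ln 1.81}{0.954} \approx 0.62
\;\Rightarrow\; \sigma_u^{2} \approx 0.39 .
```

    Under that assumption, the reported adult MIRR of 1.81 would correspond to a between-DSA standard deviation of roughly 0.62 on the log-rate scale; this back-calculation is illustrative, not a value reported by the authors.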

  8. The National Geographic Society's Teaching Geography Project.

    Science.gov (United States)

    Bockenhauer, Mark H.

    1993-01-01

    Contends that the National Geographic Society's Teaching Geography Project is an inservice teacher education success story. Describes the origins, objectives, and development of the project. Summarizes the impact of the project and contends that its success is the result of the workshop format and guided practice in instructional strategies. (CFR)

  9. GEOGRAPHERS AND ECOSYSTEMS: A POINT OF VIEW

    African Journals Online (AJOL)

    are fearful of tackling it, mainly because they have never studied ecology or any of the pure sciences. Most of these geographers are trained in the arts disciplines and thus feel at a disadvantage even when confronted only by a 'jargon' which is unfamiliar. They perceive themselves as being inadequate and are unhappy ...

  10. The Geographic Extent of Global Supply Chains

    DEFF Research Database (Denmark)

    Machikita, Tomohiro; Ueki, Yasushi

    2012-01-01

    We study the extent to which inter-firm relationships are locally concentrated and what determines firm differences in geographic proximity to domestic or foreign suppliers and customers. From micro-data on self-reported customer and supplier data of firms in Indonesia, the Philippines, Thailand, ...

  11. Geographical information modelling for land resource survey

    NARCIS (Netherlands)

    Bruin, de S.

    2000-01-01

    The increasing popularity of geographical information systems (GIS) has at least three major implications for land resources survey. Firstly, GIS allows alternative and richer representation of spatial phenomena than is possible with the traditional paper map. Secondly, digital technology has

  12. Groundwater quality mapping using geographic information system ...

    African Journals Online (AJOL)

    Spatial variations in groundwater quality in the corporation area of Gulbarga City, located in the northern part of Karnataka State, India, have been studied using the geographic information system (GIS) technique. GIS, a tool used for storing, analyzing and displaying spatial data, is also used for investigating ground ...

  13. Formal Ontologies and Uncertainty in Geographical Knowledge

    Directory of Open Access Journals (Sweden)

    Matteo Caglioni

    2014-05-01

    Full Text Available Formal ontologies have proved to be a very useful tool to manage interoperability among data, systems and knowledge. In this paper we will show how formal ontologies can evolve from a crisp, deterministic framework (ontologies of hard knowledge) to new probabilistic, fuzzy or possibilistic frameworks (ontologies of soft knowledge). This can considerably enlarge the application potential of formal ontologies in geographic analysis and planning, where soft knowledge is intrinsically linked to the complexity of the phenomena under study. The paper briefly presents these new uncertainty-based formal ontologies. It then highlights how ontologies are formal tools to define both concepts and relations among concepts. An example from the domain of urban geography finally shows how the cause-to-effect relation between household preferences and urban sprawl can be encoded within a crisp, a probabilistic and a possibilistic ontology, respectively. The ontology formalism will also determine the kind of reasoning that can be developed from available knowledge. Uncertain ontologies can be seen as the preliminary phase of more complex uncertainty-based models. The advantages of moving to uncertainty-based models are evident: whether in the analysis of geographic space or in decision support for planning, reasoning on geographic space is almost always reasoning with uncertain knowledge of geographic phenomena.

  14. Ontology-based geographic data set integration

    NARCIS (Netherlands)

    Uitermark, H.T.J.A.; Uitermark, Harry T.; Oosterom, Peter J.M.; Mars, Nicolaas; Molenaar, Martien; Molenaar, M.

    1999-01-01

    In order to develop a system to propagate updates we investigate the semantic and spatial relationships between independently produced geographic data sets of the same region (data set integration). The goal of this system is to reduce operator intervention in update operations between corresponding

  15. Geographic variation in gorilla limb bones.

    Science.gov (United States)

    Jabbour, Rebecca S; Pearman, Tessa L

    2016-06-01

    Gorilla systematics has received increased attention over recent decades from primatologists, conservationists, and paleontologists. Studies of geographic variation in DNA, skulls, and teeth have led to new taxonomic proposals, such as recognition of two gorilla species, Gorilla gorilla (western gorilla) and Gorilla beringei (eastern gorilla). Postcranial differences between mountain gorillas (G. beringei beringei) and western lowland gorillas (G. g. gorilla) have a long history of study, but differences between the limb bones of the eastern and western species have not yet been examined with an emphasis on geographic variation within each species. In addition, proposals for recognition of the Cross River gorilla as Gorilla gorilla diehli and gorillas from Tshiaberimu and Kahuzi as G. b. rex-pygmaeorum have not been evaluated in the context of geographic variation in the forelimb and hindlimb skeletons. Forty-three linear measurements were collected from limb bones of 266 adult gorillas representing populations of G. b. beringei, Gorilla beringei graueri, G. g. gorilla, and G. g. diehli in order to investigate geographic diversity. Skeletal elements included the humerus, radius, third metacarpal, third proximal hand phalanx, femur, tibia, calcaneus, first metatarsal, third metatarsal, and third proximal foot phalanx. Comparisons of means and principal components analyses clearly differentiate eastern and western gorillas, indicating that eastern gorillas have absolutely and relatively smaller hands and feet, among other differences. Gorilla subspecies and populations cluster consistently by species, although G. g. diehli may be similar to the eastern gorillas in having small hands and feet. The subspecies of G. beringei are distinguished less strongly and by different variables than the two gorilla species. Populations of G. b. graueri are variable, and Kahuzi and Tshiaberimu specimens do not cluster together. Results support the possible influence of

  16. Regional Geographic Information Systems of Health and Environmental Monitoring

    Directory of Open Access Journals (Sweden)

    Kurolap Semen A.

    2016-12-01

    Full Text Available The article describes a new scientific and methodological approach to designing geographic information systems of health and environmental monitoring for urban areas. Geographic information systems (GIS) are analytical tools of regional health and environmental monitoring; they are used for an integrated assessment of the environmental status of a large industrial centre or a part of it. The authors analyse the environmental situation in Voronezh, a major industrial city located in the Central Black Earth Region with a population of more than 1 million people. The proposed research methodology is based on modern approaches to the assessment of health risks caused by adverse environmental conditions. The research work was implemented using a GIS and multicriteria probabilistic and statistical evaluation to identify cause-and-effect links, a combination of action and reaction, in the dichotomy ‘environmental factors — public health’. The analysis of the obtained statistical data confirmed an increase in childhood diseases in some areas of the city. Environmentally induced diseases include congenital malformations, tumors, endocrine and urogenital pathologies. The main factors having an adverse impact on health are emissions of carcinogens into the atmosphere and the negative impact of transport on the environment. The authors identify and characterize environmentally vulnerable parts of the city and develop principles for creating an automated system of health monitoring and control of environmental risks. The article offers a number of measures aimed at the reduction of environmental risks, better protection of public health and more efficient environmental monitoring.

  17. Geographic Object-Based Image Analysis - Towards a new paradigm.

    Science.gov (United States)

    Blaschke, Thomas; Hay, Geoffrey J; Kelly, Maggi; Lang, Stefan; Hofmann, Peter; Addink, Elisabeth; Queiroz Feitosa, Raul; van der Meer, Freek; van der Werff, Harald; van Coillie, Frieke; Tiede, Dirk

    2014-01-01

    The amount of scientific literature on (Geographic) Object-based Image Analysis - GEOBIA has been and still is sharply increasing. These approaches to analysing imagery have antecedents in earlier research on image segmentation and use GIS-like spatial analysis within classification and feature extraction approaches. This article investigates this development and its implications and asks whether or not this is a new paradigm in remote sensing and Geographic Information Science (GIScience). We first discuss several limitations of prevailing per-pixel methods when applied to high resolution images. Then we explore the paradigm concept developed by Kuhn (1962) and discuss whether GEOBIA can be regarded as a paradigm according to this definition. We crystallize core concepts of GEOBIA, including the role of objects, of ontologies and the multiplicity of scales and we discuss how these conceptual developments support important methods in remote sensing such as change detection and accuracy assessment. The ramifications of the different theoretical foundations between the 'per-pixel paradigm' and GEOBIA are analysed, as are some of the challenges along this path from pixels, to objects, to geo-intelligence. Based on several paradigm indications as defined by Kuhn and based on an analysis of peer-reviewed scientific literature we conclude that GEOBIA is a new and evolving paradigm.

  18. Geographical conceptualization of quality of life

    Directory of Open Access Journals (Sweden)

    Murgaš František

    2016-12-01

    Full Text Available The conceptualization of quality of life in terms of geography is based on two assumptions. The first assumption is that quality of life consists of two dimensions: subjective and objective. The subjective dimension is known as ‘well-being’, while for the objective dimension the term ‘quality of place’ is proposed. The second assumption is based on the recognition that quality of life always has a spatial dimension. The concept of quality of life is closely linked with the concept of a good life; geographers enriched this concept by using the term ‘good place’ for a place in which the conditions for a good life are created. The quality of life for individuals in terms of a good place overlaps with the quality of life in society, namely the societal quality of life. The geographical conceptualization of quality of life is applied to settlements within the city of Liberec.

  19. A Systems Perspective on Volunteered Geographic Information

    Directory of Open Access Journals (Sweden)

    Victoria Fast

    2014-12-01

    Full Text Available Volunteered geographic information (VGI) is geographic information collected by way of crowdsourcing. However, the distinction between VGI as an information product and the processes that create VGI is blurred. Clearly, the environment that influences the creation of VGI is different than the information product itself, yet most literature treats them as one and the same. Thus, this research is motivated by the need to formalize and standardize the systems that support the creation of VGI. To this end, we propose a conceptual framework for VGI systems, the main components of which—project, participants, and technical infrastructure—form an environment conducive to the creation of VGI. Drawing on examples from OpenStreetMap, Ushahidi, and RinkWatch, we illustrate the pragmatic relevance of these components. Applying a system perspective to VGI allows us to better understand the components and functionality needed to effectively create VGI.

  20. Geographical information systems and computer cartography

    CERN Document Server

    Jones, Chris B

    2014-01-01

    A concise text presenting the fundamental concepts in Geographical Information Systems (GIS), emphasising an understanding of techniques in management, analysis and graphic display of spatial information. Divided into five parts - the first part reviews the development and application of GIS, followed by a summary of the characteristics and representation of geographical information. It concludes with an overview of the functions provided by typical GIS systems. Part Two introduces co-ordinate systems and map projections, describes methods for digitising map data and gives an overview of remote sensing. Part Three deals with data storage and database management, as well as specialised techniques for accessing spatial data. Spatial modelling and analytical techniques for decision making form the subject of Part Four, while the final part is concerned with graphical representation, emphasising issues of graphics technology, cartographic design and map generalisation.

  1. SOLID WASTE: PRESENCE AND THREAT IN GEOGRAPHICAL SPACE

    Directory of Open Access Journals (Sweden)

    Clesley Maria Tavares do Nascimento

    2017-12-01

    Full Text Available This article deals with the trajectory of solid waste in different historical periods, framing it as a constructive element of geographical space. Approaching the theme from a timeline perspective is grounded in the conviction that the categories of space and time are inseparable and that both are important for understanding a geographical phenomenon. Methodologically, this research relied on documentary research, involving a literature review and consultation of secondary sources such as books, academic journals, dissertations and theses on the subject. The results presented and discussed in this paper indicate that the production of waste is adjacent to historical time, reflects the societies and techniques that generated it, and is a permanent part of the dialectical process of spatial formation.

  2. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium.

    Science.gov (United States)

    Ng, Maggie C Y; Graff, Mariaelisa; Lu, Yingchang; Justice, Anne E; Mudgal, Poorva; Liu, Ching-Ti; Young, Kristin; Yanek, Lisa R; Feitosa, Mary F; Wojczynski, Mary K; Rand, Kristin; Brody, Jennifer A; Cade, Brian E; Dimitrov, Latchezar; Duan, Qing; Guo, Xiuqing; Lange, Leslie A; Nalls, Michael A; Okut, Hayrettin; Tajuddin, Salman M; Tayo, Bamidele O; Vedantam, Sailaja; Bradfield, Jonathan P; Chen, Guanjie; Chen, Wei-Min; Chesi, Alessandra; Irvin, Marguerite R; Padhukasahasram, Badri; Smith, Jennifer A; Zheng, Wei; Allison, Matthew A; Ambrosone, Christine B; Bandera, Elisa V; Bartz, Traci M; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Bottinger, Erwin P; Carpten, John; Chanock, Stephen J; Chen, Yii-Der Ida; Conti, David V; Cooper, Richard S; Fornage, Myriam; Freedman, Barry I; Garcia, Melissa; Goodman, Phyllis J; Hsu, Yu-Han H; Hu, Jennifer; Huff, Chad D; Ingles, Sue A; John, Esther M; Kittles, Rick; Klein, Eric; Li, Jin; McKnight, Barbara; Nayak, Uma; Nemesure, Barbara; Ogunniyi, Adesola; Olshan, Andrew; Press, Michael F; Rohde, Rebecca; Rybicki, Benjamin A; Salako, Babatunde; Sanderson, Maureen; Shao, Yaming; Siscovick, David S; Stanford, Janet L; Stevens, Victoria L; Stram, Alex; Strom, Sara S; Vaidya, Dhananjay; Witte, John S; Yao, Jie; Zhu, Xiaofeng; Ziegler, Regina G; Zonderman, Alan B; Adeyemo, Adebowale; Ambs, Stefan; Cushman, Mary; Faul, Jessica D; Hakonarson, Hakon; Levin, Albert M; Nathanson, Katherine L; Ware, Erin B; Weir, David R; Zhao, Wei; Zhi, Degui; Arnett, Donna K; Grant, Struan F A; Kardia, Sharon L R; Olopade, Olufunmilayo I; Rao, D C; Rotimi, Charles N; Sale, Michele M; Williams, L Keoki; Zemel, Babette S; Becker, Diane M; Borecki, Ingrid B; Evans, Michele K; Harris, Tamara B; Hirschhorn, Joel N; Li, Yun; Patel, Sanjay R; Psaty, Bruce M; Rotter, Jerome I; Wilson, James G; Bowden, Donald W; Cupples, L Adrienne; Haiman, Christopher A; Loos, Ruth J F; North, Kari E

    2017-04-01

    Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P … African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (… African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. As compared to our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement in identifying GWAS loci including low frequency variants. Trans-ethnic meta-analyses further improved fine mapping of putative causal variants in loci shared between the African and European ancestry populations.
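    A 99% credible set, as used in the fine-mapping above, is commonly built by ranking variants by their posterior probability of being causal and adding them until the cumulative probability reaches 0.99. A minimal Python sketch of that construction; how the posterior probabilities themselves are obtained (for example, from approximate Bayes factors under a single-causal-variant assumption) is not stated in the abstract and is not shown here, and the function name and variant IDs are hypothetical.

```python
def credible_set(posteriors: dict[str, float], coverage: float = 0.99) -> list[str]:
    """Smallest set of variants whose posterior probabilities reach `coverage`.

    Variants are added in decreasing order of posterior probability; the
    probabilities are normalised first so that they sum to one.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must have a positive sum")
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen: list[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for four variants in one locus.
pp = {"rs_a": 0.70, "rs_b": 0.25, "rs_c": 0.045, "rs_d": 0.005}
print(credible_set(pp))  # ['rs_a', 'rs_b', 'rs_c']
```

    A locus whose set stays small (the abstract's threshold is ≤ 20 variants) is one where the data concentrate probability on few candidates, which is what makes follow-up of the lead variant worthwhile.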

  3. Do geographically isolated wetlands influence landscape functions?

    OpenAIRE

    Cohen, Matthew J.; Creed, Irena F.; Alexander, Laurie; Basu, Nandita B.; Calhoun, Aram J. K.; Craft, Christopher; D’Amico, Ellen; DeKeyser, Edward; Fowler, Laurie; Golden, Heather E.; Jawitz, James W.; Kalla, Peter; Kirkman, L. Katherine; Lane, Charles R.; Lang, Megan

    2016-01-01

    Geographically isolated wetlands (GIWs), those surrounded by uplands, exchange materials, energy, and organisms with other elements in hydrological and habitat networks, contributing to landscape functions, such as flow generation, nutrient and sediment retention, and biodiversity support. GIWs constitute most of the wetlands in many North American landscapes, provide a disproportionately large fraction of wetland edges where many functions are enhanced, and form complexes with other water bo...

  4. Geographic Analysis of Neurosurgery Workforce in Korea

    Science.gov (United States)

    Park, Hye Ran; Park, Sukh Que; Kim, Jae Hyun; Hwang, Jae Chan; Lee, Gwang Soo; Chang, Jae-Chil

    2018-01-01

    Objective With respect to the health and safety of the public, universal access to health care is an issue of the greatest importance. The geographic distribution of doctors is one of the important factors contributing to access to health care. The aim of this study is to assess the imbalances in the geographic distribution of neurosurgeons across Korea. Methods Population data was obtained from the National Statistical Office. We classified geographic groups into 7 metropolitan cities, 78 non-metropolitan cities, and 77 rural areas. The number of doctors and neurosurgeons per 100,000 population in each county unit was calculated using the total number of doctors and neurosurgeons at the country level from 2009 to 2015. The densities of neurosurgeons and doctors were calculated and depicted in maps. Results Between 2009 and 2015, the number of neurosurgeons increased from 2002 to 2557, and the ratio of neurosurgeons per 100,000 population increased from 4.02 to 4.96. The number of neurosurgeons per 100,000 population was highest in metropolitan cities and lowest in rural areas from 2009 to 2015. A comparison of the geographic distribution of neurosurgeons in 2009 and 2015 showed an increase in the regional gap. The neurosurgeon density was affected by country unit characteristics (p < 0.001). Conclusion Distribution of neurosurgeons throughout Korea is uneven. Neurosurgeons are being increasingly concentrated in a limited number of metropolitan cities. This phenomenon will need to be accounted for when planning the supply of neurosurgeons, the allocation of resources and manpower, and the provision of regional neurosurgical services. PMID:29354242

  5. Geographic assistance of decontamination strategy elaboration

    International Nuclear Information System (INIS)

    Davydchuk, V.; Arapis, G.

    1996-01-01

    Those who develop a strategy for decontaminating vast territories must take into account the heterogeneity of landscape elements such as relief, lithology, humidity, soil types and vegetation, at both local and regional levels. Geographic assistance includes evaluating the efficacy of decontamination technologies under different natural conditions, identifying the areas where they can be applied effectively, defining ecological damage, and estimating radionuclide balances in the landscapes to provide a basis for the decontamination strategy.

  6. Geographic Analysis of the Radiation Oncology Workforce

    International Nuclear Information System (INIS)

    Aneja, Sanjay; Smith, Benjamin D.; Gross, Cary P.; Wilson, Lynn D.; Haffty, Bruce G.; Roberts, Kenneth; Yu, James B.

    2012-01-01

    Purpose: To evaluate trends in the geographic distribution of the radiation oncology (RO) workforce. Methods and Materials: We used the 1995 and 2007 versions of the Area Resource File to map the ratio of RO to the population aged 65 years or older (ROR) within different health service areas (HSA) within the United States. We used regression analysis to find associations between population variables and 2007 ROR. We calculated Gini coefficients for ROR to assess the evenness of RO distribution and compared that with primary care physicians and total physicians. Results: There was a 24% increase in the RO workforce from 1995 to 2007. The overall growth in the RO workforce was less than that of primary care or the overall physician workforce. The mean ROR among HSAs increased by more than one radiation oncologist per 100,000 people aged 65 years or older, from 5.08 per 100,000 to 6.16 per 100,000. However, there remained consistent geographic variability in RO distribution, specifically affecting the non-metropolitan HSAs. Regression analysis found higher ROR in HSAs that possessed higher education (p = 0.001), higher income (p < 0.001), lower unemployment rates (p < 0.001), and higher minority population (p = 0.022). Gini coefficients showed RO distribution less even than for both primary care physicians and total physicians (0.326 compared with 0.196 and 0.292, respectively). Conclusions: Despite modest growth in the RO workforce, there exists persistent geographic maldistribution of radiation oncologists allocated along socioeconomic and racial lines. To solve problems surrounding the RO workforce, issues concerning both gross numbers and geographic distribution must be addressed.
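The Gini coefficients quoted above summarize how evenly a ratio is spread across HSAs, from 0 (every HSA has the same ratio) toward 1 (everything concentrated in one HSA). A minimal Python sketch of the standard, unweighted formula follows; whether the authors weighted HSAs by population is not stated in the abstract, so the unweighted form is an assumption. The rank-based shortcut is paired with the pairwise definition so the two can be checked against each other.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    total = sum(xs)
    if total == 0:
        return 0.0
    # Rank-weighted sum over values sorted ascending, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_pairwise(values: Sequence[float]) -> float:
    """Same quantity from the definition: mean absolute difference / (2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    diff = sum(abs(a - b) for a in values for b in values)
    return diff / (2 * n * n * mean)
```

Applied to a list of per-HSA ratios, `gini(ratios)` returns a single number on the same scale as the 0.326, 0.196 and 0.292 values above; the function names and any input data are illustrative.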

  7. U Plant Geographic Zone Cleanup Prototype

    International Nuclear Information System (INIS)

    Romine, L.D.; Leary, K.D.; Lackey, M.B.; Robertson, J.R.

    2006-01-01

    The U Plant geographic zone (UPZ) occupies 0.83 square kilometers on the Hanford Site Central Plateau (200 Area). It encompasses the U Plant canyon (221-U Facility), ancillary facilities that supported the canyon, soil waste sites, and underground pipelines. The UPZ cleanup initiative coordinates the cleanup of the major facilities, ancillary facilities, waste sites, and contaminated pipelines (collectively identified as 'cleanup items') within the geographic zone. The UPZ was selected as a geographic cleanup zone prototype for resolving regulatory, technical, and stakeholder issues and demonstrating cleanup methods for several reasons: most of the area is inactive, sufficient characterization information is available to support decisions, cleanup of the high-risk waste sites will help protect the groundwater, and the zone contains a representative cross-section of the types of cleanup actions that will be required in other geographic zones. The UPZ cleanup demonstrates the first of 22 integrated zone cleanup actions on the Hanford Site Central Plateau to address threats to groundwater, the environment, and human health. The UPZ contains more than 100 individual cleanup items. Cleanup actions in the zone will be undertaken using multiple regulatory processes and decision documents. Cleanup actions will include building demolition, waste site and pipeline excavation, and the construction of multiple, large engineered barriers. In some cases, different cleanup actions may be taken at item locations that are immediately adjacent to each other. The cleanup planning and field activities for each cleanup item must be undertaken in a coordinated and cohesive manner to ensure effective execution of the UPZ cleanup initiative. The UPZ zone cleanup implementation plan (ZCIP) [1] was developed to address the need for a fundamental integration tool for UPZ cleanup. As UPZ cleanup planning and implementation moves forward, the ZCIP is intended to be a living document that will

  8. Using Educational Tourism in Geographical Education

    OpenAIRE

    PRAKAPIENĖ, Dalia; OLBERKYTĖ, Loreta

    2014-01-01

    The article analyses and defines the concept of educational tourism, presents the structure of the concept and looks into the opportunities for using educational tourism in geographical education. In order to reveal such opportunities, research was carried out in the Lithuanian national and regional parks using the qualitative method of content analysis and the quantitative method of questionnaire survey. The authors of the research identified the educational excursion activities conducted i...

  9. Geographic Analysis of the Radiation Oncology Workforce

    Energy Technology Data Exchange (ETDEWEB)

    Aneja, Sanjay [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Smith, Benjamin D. [University of Texas M. D. Anderson Cancer Center, Houston, TX (United States); Gross, Cary P. [Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Department of General Internal Medicine, Yale University School of Medicine, New Haven, CT (United States); Wilson, Lynn D. [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Haffty, Bruce G. [Cancer Institute of New Jersey, New Brunswick, NJ (United States); Roberts, Kenneth [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Yu, James B., E-mail: james.b.yu@yale.edu [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States)

    2012-04-01

  10. Globalization in history : a geographical perspective

    OpenAIRE

    Crafts, N. F. R.; Venables, Anthony

    2001-01-01

    This paper argues that a geographical perspective is fundamental to understanding comparative economic development in the context of globalization. Central to this view is the role of agglomeration in productivity performance; size and location matter. The tools of the new economic geography are used to illuminate important episodes when the relative position of major economies radically changed; the rise of the United States at the beginning and of East Asia at the end of the twentieth centu...

  11. PEDIATRIC FITNESS: SECULAR TRENDS AND GEOGRAPHIC VARIABILITY

    Directory of Open Access Journals (Sweden)

    Grant R. Tomkinson

    2007-06-01

    Full Text Available DESCRIPTION This book describes and discusses children's physical capacity in terms of aerobic and anaerobic power generation according to secular trends and geographic variability. PURPOSE To discuss the controversial issue of whether present-day children and adolescents are fitter than their counterparts of the past and whether they are fitter if they live in more prosperous countries. AUDIENCE Pediatricians, medical practitioners, physical educators, exercise and/or sport scientists, exercise physiologists, personal trainers and graduate students in relevant fields will find this book helpful when dealing with contemporary trends and geographic variability in pediatric fitness. FEATURES The volume opens with the editors' overview of children's fitness. The authors of the individual chapters discuss data gathered since the late 1950s on secular trends and geographic variability in the aerobic and anaerobic fitness performance of children and adolescents from 23 countries in Africa, Asia, Australasia, Europe, the Middle East and North America. Some chapters present evidence of a worldwide decline in pediatric aerobic performance in recent decades and relative stability in anaerobic performance, and show that the best-performing children come from northern and central Europe. The final chapters consider possible causes of these trends, including whether the decline in aerobic performance reflects distributional or widespread changes, and whether increases in obesity alone can explain the decline in aerobic performance. ASSESSMENT The editors have assembled a volume of Medicine and Sports Science that is essential reading for all who are interested in understanding and improving the fitness of children. Readers will find useful information in this book on secular trends and geographic variability in pediatric fitness. I believe the book will serve as a first

  12. Deterrence and Geographical Externalities in Auto Theft

    OpenAIRE

    Marco Gonzalez-Navarro

    2013-01-01

    Understanding the degree of geographical crime displacement is crucial for the design of crime prevention policies. This paper documents changes in automobile theft risk that were generated by the plausibly exogenous introduction of Lojack, a highly effective stolen vehicle recovery device, into a number of new Ford car models in some Mexican states, but not others. Lojack-equipped vehicles in Lojack-coverage states experienced a 48 percent reduction in theft risk due to deterrence effects. H...

  13. GEOGRAPHICAL EDUCATION MEDIATIZATION AND MEDIASECURITY ISSUES

    Directory of Open Access Journals (Sweden)

    M. R. Arpentieva

    2017-01-01

    Full Text Available The article is devoted to the interaction of the legal and moral development of media technologies in the context of geographical education. The article summarizes the experience of theoretical analysis of mediatization in geographic education, the legal and moral aspects of its disorders, and ways of preventing and correcting them in the process of educational interaction between teacher and student mediated by media technologies. It notes that geographical education in the modern world is closely associated with the use of media technologies. In other types of education the role of media technologies in improving the quality of education is less obvious, but in teaching and learning geography it is very clear. Therefore, the problems associated with the mediatization of geographical education are very important, and their solution is particularly pressing. These problems are primarily associated with the ongoing social, economic, political and ideological crises in many communities and countries of the world. Many of these crises are reflected, as in a mirror, in the sphere of high technologies, including media technologies. The article provides guidance on correcting violations at the individual and social levels.

  14. A Geographical Heuristic Routing Protocol for VANETs

    Science.gov (United States)

    Urquiza-Aguiar, Luis; Tripp-Barba, Carolina; Aguilar Igartua, Mónica

    2016-01-01

    Vehicular ad hoc networks (VANETs) leverage the communication system of Intelligent Transportation Systems (ITS). Recently, Delay-Tolerant Network (DTN) routing protocols have become increasingly popular in the research community for use in non-safety VANET applications and services such as traffic reporting. Vehicular DTN protocols use geographical and local information to make forwarding decisions. However, current proposals only consider the selection of the best candidate based on a local search. In this paper, we propose a generic Geographical Heuristic Routing (GHR) protocol that can be applied to any DTN geographical routing protocol that makes forwarding decisions hop by hop. GHR incorporates adaptations of the simulated annealing and Tabu-search meta-heuristics, which have been widely used to improve local-search results in discrete optimization. We include a complete performance evaluation of GHR in a multi-hop VANET simulation scenario for a reporting service. Our study analyzes all of the meaningful configurations of GHR and offers a statistical analysis of our findings by means of MANOVA tests. Our results indicate that the use of a Tabu list improves the packet delivery ratio by around 5% to 10%. Moreover, when Tabu is used, the simulated annealing routing strategy performs better than selecting the best node with carry and forwarding (the default operation). PMID:27669254
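To make the two meta-heuristics concrete, the sketch below shows one way a hop-by-hop forwarder could combine a Tabu list with a simulated-annealing acceptance rule. It is a hypothetical Python illustration, not the GHR protocol itself: the candidate scores (for example, geographic progress toward the destination), the temperature, the Tabu length, and the function name `choose_next_hop` are all our assumptions. With a temperature near zero the rule degenerates to the default behaviour of always taking the best-scoring node.

```python
import math
import random
from collections import deque


def choose_next_hop(candidates, score, tabu, temperature, rng):
    """Hypothetical Tabu + simulated-annealing next-hop choice (illustrative only).

    candidates:  node ids currently in radio range
    score:       dict node -> heuristic value, higher is better
    tabu:        deque of recently chosen nodes to avoid (use maxlen to bound it)
    temperature: annealing temperature, > 0; larger values accept worse nodes more often
    """
    allowed = [c for c in candidates if c not in tabu]
    if not allowed:
        return None  # no usable neighbour: keep carrying the packet
    best = max(allowed, key=lambda c: score[c])
    proposal = rng.choice(allowed)
    delta = score[proposal] - score[best]  # <= 0 by construction
    # Metropolis-style acceptance: a worse proposal wins with probability exp(delta / T).
    if delta == 0 or rng.random() < math.exp(delta / temperature):
        chosen = proposal
    else:
        chosen = best
    tabu.append(chosen)  # a bounded deque evicts its oldest entry automatically
    return chosen


rng = random.Random(1)
tabu = deque(maxlen=2)
scores = {"a": 1.0, "b": 5.0, "c": 3.0}
hops = [choose_next_hop(["a", "b", "c"], scores, tabu, 0.5, rng) for _ in range(4)]
```

The Tabu list keeps the forwarder from repeatedly handing the packet to the same few neighbours, while the temperature controls how often a non-best candidate is accepted; both knobs would need tuning in any real protocol.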

  15. Geographical assemblages of European raptors and owls

    Science.gov (United States)

    López-López, Pascual; Benavent-Corai, José; García-Ripollés, Clara

    2008-09-01

    In this work we look for geographical structure patterns in European raptors (Order: Falconiformes) and owls (Order: Strigiformes). For this purpose we conducted our research using freely available tools such as statistical software and databases. To perform the study, presence-absence data for European raptor and owl species (Class Aves) were downloaded from the BirdLife International website. Using the freely available "pvclust" R package, we applied the Jaccard similarity index and cluster analysis to delineate biogeographical relationships among European countries. According to the similarity clustering, Europe is structured into two main geographical assemblages. The longest branch separated two main groups: one containing Iceland, Greenland and the countries of central, northern and northwestern Europe, and the other including the countries of eastern, southern and southwestern Europe. Both groups are divided into two main subgroups. According to our results, the European raptors and owls could be considered structured into four meta-communities well delimited by suture zones defined by Remington (1968) [Remington, C.L., 1968. Suture-zones of hybrid interaction between recently joined biotas. Evol. Biol. 2, 321-428]. Climatic oscillations during the Quaternary Ice Ages could explain at least in part the modern geographical distribution of the group.
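The clustering step can be illustrated with a short pure-Python sketch: Jaccard similarity between countries' species lists, converted to a distance (1 - similarity), followed by agglomerative clustering. The abstract does not state the linkage method, so average linkage (UPGMA) is assumed here; pvclust also attaches bootstrap p-values to clusters, which this sketch omits. The site names and species codes are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two presence sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(sites: dict[str, set]) -> list[tuple[frozenset, frozenset, float]]:
    """Agglomerative clustering on Jaccard distance with average linkage (UPGMA).

    Returns the merge steps as (cluster_a, cluster_b, merge_distance)."""
    clusters = [frozenset([name]) for name in sites]

    def distance(c1: frozenset, c2: frozenset) -> float:
        # Average of all pairwise site distances between the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(1 - jaccard(sites[x], sites[y]) for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((c1, c2, distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical presence lists (species codes), not BirdLife data.
sites = {
    "North1": {"s1", "s2", "s3"},
    "North2": {"s1", "s2", "s4"},
    "South1": {"s5", "s6", "s3"},
    "South2": {"s5", "s6"},
}
for a, b, d in average_linkage(sites):
    print(sorted(a), sorted(b), round(d, 3))
```

In this toy input the two "South" sites merge first, then the two "North" sites, and the final merge joins the two groups at a much larger distance, which is the kind of two-assemblage split the abstract reports.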

  16. Ontology for cell-based geographic information

    Science.gov (United States)

    Zheng, Bin; Huang, Lina; Lu, Xinhai

    2009-10-01

    Interoperability is a key notion in geographic information science (GIS) for the sharing of geographic information (GI). It requires seamless translation among different information sources. Ontology is employed in GI discovery to resolve semantic conflicts, because its natural-language form and logical hierarchical structure are considered able to provide better context for both human understanding and machine cognition when describing locations and relationships in the geographic world. However, most current studies on field ontology are derived from philosophical themes and are not applicable to the raster representation in GIS, which is a field-like phenomenon but does not physically coincide with the general philosophical concept of a field (which mostly comes from physics). For this reason, this paper specifically discusses cell-based GI ontology. The discussion starts by investigating the physical characteristics of cell-based raster GI. Then, a unified cell-based GI ontology framework for recognizing raster objects is introduced, from which a conceptual interface connecting human epistemology and the computer world, the so-called "endurant-occurrant window", is developed for better raster GI discovery and sharing.

  17. The Rebirth of the Theory of Imputation in the Science of Criminal Law: to an Overcoming Stage or an Involution to Pre-Scientific Conceptions?

    Directory of Open Access Journals (Sweden)

    Nicolás Santiago Cordini

    2015-06-01

    Full Text Available The Science of Criminal Law is going through a moment that can be characterized as a “crisis”. In response, theories have proliferated that define themselves as “theories of imputation” and that abandon, in whole or in part, the theory of crime that has dominated until now. The aim of this article is to analyze three theories grouped under the concept of imputation and to determine to what extent they retain or depart from the categories proposed by the theory of crime. We then establish to what extent these theories constitute an advance for the Science of Criminal Law or, on the contrary, are manifestations of a retreat to a pre-scientific stage.

  18. New insights into the pharmacogenomics of antidepressant response from the GENDEP and STAR*D studies: rare variant analysis and high-density imputation.

    Science.gov (United States)

    Fabbri, C; Tansey, K E; Perlis, R H; Hauser, J; Henigsberg, N; Maier, W; Mors, O; Placentino, A; Rietschel, M; Souery, D; Breen, G; Curtis, C; Sang-Hyuk, L; Newhouse, S; Patel, H; Guipponi, M; Perroud, N; Bondolfi, G; O'Donovan, M; Lewis, G; Biernacka, J M; Weinshilboum, R M; Farmer, A; Aitchison, K J; Craig, I; McGuffin, P; Uher, R; Lewis, C M

    2017-11-21

    Genome-wide association studies have generally failed to identify polymorphisms associated with antidepressant response. Possible reasons include limited coverage of genetic variants that this study tried to address by exome genotyping and dense imputation. A meta-analysis of Genome-Based Therapeutic Drugs for Depression (GENDEP) and Sequenced Treatment Alternatives to Relieve Depression (STAR*D) studies was performed at the single-nucleotide polymorphism (SNP), gene and pathway levels. Coverage of genetic variants was increased compared with previous studies by adding exome genotypes to previously available genome-wide data and using the Haplotype Reference Consortium panel for imputation. Standard quality control was applied. Phenotypes were symptom improvement and remission after 12 weeks of antidepressant treatment. Significant findings were investigated in NEWMEDS consortium samples and Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) for replication. A total of 7,062,950 SNPs were analyzed in GENDEP (n=738) and STAR*D (n=1409). rs116692768 (P=1.80e-08, ITGA9 (integrin α9)) and rs76191705 (P=2.59e-08, NRXN3 (neurexin 3)) were significantly associated with symptom improvement during citalopram/escitalopram treatment. At the gene level, no consistent effect was found. At the pathway level, the Gene Ontology (GO) terms GO:0005694 (chromosome) and GO:0044427 (chromosomal part) were associated with improvement (corrected P=0.007 and 0.045, respectively). The association between rs116692768 and symptom improvement was replicated in PGRN-AMPS (P=0.047), whereas rs76191705 was not. The two SNPs did not replicate in NEWMEDS. ITGA9 codes for a membrane receptor for neurotrophins and NRXN3 is a transmembrane neuronal adhesion receptor involved in synaptic differentiation. Despite their meaningful biological rationale for being involved in antidepressant effect, replication was partial. Further studies may help in clarifying

  19. Treating pre-instrumental data as "missing" data: using a tree-ring-based paleoclimate record and imputations to reconstruct streamflow in the Missouri River Basin

    Science.gov (United States)

    Ho, M. W.; Lall, U.; Cook, E. R.

    2015-12-01

    Advances in paleoclimatology in the past few decades have provided opportunities to expand the temporal perspective of hydrological and climatological variability across the world. The North American region is particularly fortunate in this respect, as a relatively dense network of high-resolution paleoclimate proxy records has been assembled there. One such network is the annually resolved Living Blended Drought Atlas (LBDA): a paleoclimate reconstruction of the Palmer Drought Severity Index (PDSI) that covers North America on a 0.5° × 0.5° grid based on tree-ring chronologies. However, using the LBDA to assess North American streamflow variability requires a model by which streamflow may be reconstructed. Paleoclimate reconstructions have typically used models that first quantify the relationship between the paleoclimate variable and the environmental variable of interest and then extrapolate that relationship back in time. In contrast, pre-instrumental streamflow is here treated as "missing" data. The "missing" streamflow data prior to the instrumental record are imputed for the Missouri River Basin through multiple imputation using chained equations. In this method, the distribution of the instrumental streamflow and the LBDA is used to estimate sets of plausible values for the "missing" streamflow data, resulting in a ~600-year-long streamflow reconstruction. Past research into external climate forcings, oceanic-atmospheric variability and its teleconnections, and assessments of rare multi-centennial instrumental records demonstrate that large temporal oscillations in hydrological conditions are unlikely to be captured in most instrumental records. The reconstruction of multi-centennial streamflow records will enable comprehensive assessments of current and future water resource infrastructure and operations under the existing scope of natural climate variability.

  20. Exploring the Interplay between Rescue Drugs, Data Imputation, and Study Outcomes: Conceptual Review and Qualitative Analysis of an Acute Pain Data Set.

    Science.gov (United States)

    Singla, Neil K; Meske, Diana S; Desjardins, Paul J

    2017-12-01

    In placebo-controlled acute surgical pain studies, provisions must be made for study subjects to receive adequate analgesic therapy. As such, most protocols allow study subjects to receive a pre-specified regimen of open-label analgesic drugs (rescue drugs) as needed. The selection of an appropriate rescue regimen is a critical experimental design choice. We hypothesized that a rescue regimen that is too liberal could lead to all study arms receiving similar levels of pain relief (thereby confounding experimental results), while a regimen that is too stringent could lead to a high subject dropout rate (giving rise to a preponderance of missing data). Despite the importance of the rescue regimen as a study design feature, there are no published review articles or meta-analyses focusing on the impact of rescue therapy on experimental outcomes. Therefore, when selecting a rescue regimen, researchers must rely on clinical factors (the analgesics patients usually receive in similar surgical scenarios) and/or anecdotal evidence. In the following article, we attempt to bridge this gap by reviewing and discussing the experimental impacts of rescue therapy on a common acute surgical pain population: first metatarsal bunionectomy. The purpose of this analysis is to (1) create a framework for discussion and future exploration of rescue as a methodological study design feature, (2) discuss the interplay between data imputation techniques and rescue drugs, and (3) inform the readership regarding the impact of data imputation techniques on the validity of study conclusions. Our findings indicate that liberal rescue may degrade assay sensitivity, while stringent rescue may lead to unacceptably high dropout rates.

  1. Geographic Prevalence and Mix of Regional Cuisines in Chinese Cities

    Directory of Open Access Journals (Sweden)

    Jingwei Zhu

    2018-05-01

    Full Text Available Previous research on the geographies of food has put considerable focus on analyzing how different types of food or ingredients are consumed across different places. Little is known, however, about how food culture is manifested through various cooking traditions and through people’s perceptions of different culinary styles. Using a data set captured from one of the largest online review sites in China (www.dianping.com), this study demonstrates how geo-referenced social review data can be leveraged to better understand the geographic prevalence and mix of regional cuisines in Chinese cities. Based on information on millions of restaurants obtained in selected cities (i.e., provincial capitals and municipalities under direct supervision of the Chinese central government), we first measure, for each city, the diversity of restaurants that serve regional Chinese cuisines using the Shannon entropy, and analyze how cities with different characteristics are geographically distributed. A hierarchical clustering algorithm is then used to further explore the similarities of consumers’ dining options among these cities. By associating each regional Chinese cuisine with its origin, we then develop a weighted distance measure to quantify the geographic prevalence of each cuisine type. Finally, a popularity index (POPU) is introduced to quantify consumers’ preferences for different regional cuisines. We find that: (1) the diversity of restaurants among the cities shows an “east–west” contrast that is in general agreement with the socioeconomic divide in China; (2) most of the cities have their own unique characteristics, which are mainly driven by a large market share of the corresponding local cuisine; (3) there is great heterogeneity in the geographic prevalence of different Chinese cuisines. In particular, Chuan and Xiang, which are famous for their spicy taste, are widely distributed across mainland China; and (4) among the top-tier restaurants ranked
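The diversity measure named above, Shannon entropy, takes the share p_i of each regional cuisine among a city's restaurants and computes H = -Σ p_i ln p_i. A city dominated by one cuisine scores near 0, and a city with k equally common cuisines scores ln k. A minimal Python sketch follows; the natural logarithm and the toy counts are assumptions, since the abstract gives neither the log base nor the data.

```python
import math
from collections import Counter


def shannon_entropy(counts) -> float:
    """Shannon entropy (natural log) of a category-count distribution; zero counts are skipped."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


# Hypothetical cuisine labels for restaurants in two cities (not dianping data).
city_a = Counter(["Chuan"] * 40 + ["Xiang"] * 30 + ["Yue"] * 20 + ["Lu"] * 10)
city_b = Counter(["Chuan"] * 90 + ["Xiang"] * 10)

print(round(shannon_entropy(city_a.values()), 3))
print(round(shannon_entropy(city_b.values()), 3))
```

City A, with four cuisines in moderately balanced shares, scores higher than city B, where one cuisine holds 90% of restaurants; this is the sense in which entropy captures a city's culinary mix.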

  2. Human-geographical concept of the regional geodemographic system

    Directory of Open Access Journals (Sweden)

    Kateryna Sehida

    2017-10-01

    Full Text Available A synergetic analysis of geodemographic research indicates that geodemographic problems can be solved using modern management technologies. According to the theory of socioactogenesis (sotsioaktogenez), this requires defining and accurately formulating the goal of the future phase transition, constructing a consistent system of goals that takes into account one's own and provided resources, creating an executive system that is effective in terms of the optimal use of the available methods (technologies) and means of activity, and monitoring and analyzing the results obtained. Analyzing the results of social management requires a quantitative description of the actual result and its comparison with the expected model (goal). The proposed concept of the regional geodemographic system, based on dissipative structures and encompassing people, groups of people and society, is aimed at the development and functioning of the studied system, in which the implementation of administrative decisions plays a special role. The article presents the generalized structure of the concept and reveals its purpose, object and subject area. It defines the social and spatial localization of the research, in particular within regional and local communities. The geodemographic process is identified as a composite human-geographical process: as socioactogenesis (with stages of motivation, a system of goals, an executive system and a result, viewed from the standpoint of society and the family), as self-development and self-organization (with internal and external supporting factors and evolutionary resources), and as a set of mechanisms (information exchange, external and internal adaptation). The methodological approaches (geographical, systemic, synergetic, informational, historical) and research techniques (analysis of system indices, simulation of development paths, component analysis, and evaluative and prognostic simulation) are described. Technological procedures

  3. Unity, Plurality and Explanation. The case of geographical economics and its neighbours

    NARCIS (Netherlands)

    C. Marchionni (Caterina)

    2005-01-01

    Geographical economics (also known as the new economic geography) is a recent approach developed within economics. It aims to provide a unified framework for the study of spatial agglomeration: the spatial concentration of economic activity. This search for unity stands in

  4. The Italian Geographers' Document on the University Education of Future Primary School Teachers

    Science.gov (United States)

    Giorda, Cristiano; Di Palma, Maria Teresa

    2011-01-01

    This article describes an important document compiled by a group of Italian geographers who teach in the Teaching Sciences faculty. Twenty-two university professors in an online community debated concepts and compared ideas in order to establish content, methods and didactic approaches to be applied when training Primary School teachers (pupils…

  5. GNIS: Geographic Names Information Systems - All features (2013)

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  6. Urethroplasty: a geographic disparity in care.

    Science.gov (United States)

    Burks, Frank N; Salmon, Scott A; Smith, Aaron C; Santucci, Richard A

    2012-06-01

    Urethroplasty is the gold standard for urethral strictures, but its geographic prevalence throughout the United States is unknown. We analyzed where and how often urethroplasty was performed in the United States compared to other treatment modalities for urethral stricture. De-identified case logs from the American Board of Urology were collected from certifying/recertifying urologists from 2004 to 2009. Results were categorized by ZIP code to determine the geographic distribution. Case logs from 3,877 urologists (2,533 recertifying and 1,344 certifying) were reviewed, including 1,836 urethroplasties, 13,080 urethrotomies and 19,564 urethral dilations. The proportion of urethroplasty varied widely among states (range 0% to 17%). The ratio of urethroplasty to urethrotomy/dilation also varied widely from state to state, but overall 1 urethroplasty was performed for every 17 urethrotomies or dilations. Certifying urologists were 3 times as likely to perform urethroplasty as recertifying urologists (12% vs 4%, respectively, p …). Urethroplasties were performed more commonly in states with residency programs (mean 5% vs 3%). Some states reported no urethroplasties during the observation period (Vermont, North Dakota, South Dakota, Maine and West Virginia). To our knowledge this is the first report on the geographic distribution of urethroplasty for urethral stricture disease. There are large variations in the rates of urethroplasty performed throughout the United States, indicating a disparity of care, especially in regions where few or no urethroplasties were reported. This disparity may decrease with time, as younger certifying urologists are performing 3 times as many urethroplasties as older recertifying urologists. Copyright © 2012 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
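The headline ratio can be checked directly from the case counts in the abstract. A short Python calculation, grouping urethrotomies and dilations together as the abstract does, gives about 17.8 urethrotomies or dilations per urethroplasty (reported above as 1 for every 17) and an overall urethroplasty share of roughly 5.3% of all logged stricture procedures. The variable names are our own.

```python
# Case counts reported in the abstract (2004-2009 American Board of Urology case logs).
urethroplasty = 1836
urethrotomy = 13080
dilation = 19564

urethrotomy_or_dilation = urethrotomy + dilation
ratio = urethrotomy_or_dilation / urethroplasty             # per urethroplasty performed
share = urethroplasty / (urethroplasty + urethrotomy_or_dilation)

print(f"{urethrotomy_or_dilation} urethrotomies/dilations, "
      f"{ratio:.1f} per urethroplasty, urethroplasty share {share:.1%}")
```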

  7. The geosystems of complex geographical atlases

    Directory of Open Access Journals (Sweden)

    Jovanović Jasmina

    2012-01-01

    Full Text Available Complex geographical atlases represent geosystems of different hierarchical rank, complexity, diversity, scale and connection. They contain a large body of diverse information about geospace, presented in systematized, correlative and explicit form. The degree of information revealed in an atlas is determined by its content structure and form of presentation. The quality of an atlas depends on the method of data visualization and the quality of the geodata. Cartographic visualization is a cognitive process: analysis converts geospatial data into knowledge. A complex geographical atlas is an information complex, a spatially and temporally coordinated database of geosystems of different complexity and territorial scope. Each geographical atlas defines a concrete geosystem, whose systemic organization (structural and contextual) determines its complexity and concreteness. In complex atlases, the attributes of geosystems are modeled and information is given in a systematized, graphically unified form. The atlas can therefore be considered a database. In composing a database, semantic analysis of the data is important. The result of semantic modeling is expressed in the structuring of information, in emphasizing logical connections between phenomena and processes, and in defining their classes according to their degree of similarity. This makes searching for the needed information efficient when the database is used. An atlas map has a special power to integrate sets of geodata and to present information in a user-friendly, understandable visual and tactile way. Composing an atlas by systemic cartography requires information on concretely defined geosystems of different hierarchical levels, the application of scientific methods and the making of an adequate number of analytical, synthetic

  8. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch (FEAB) is responsible for the design and maintenance of buildings, laboratories, and civil structures. To improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems (GIS). The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre," which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems are a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is essentially database management software for spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. GIS can be broken down into two main categories: urban GIS and natural resource GIS. Natural resource GIS can be further broken down into six sub-categories: agriculture, forestry, wildlife, catchment management, archaeology, and geology/mining. Agricultural GIS has several applications, such as agricultural capability analysis, land conservation, market analysis, and whole-farm planning. Forestry GIS can be used for timber assessment and management, harvest scheduling and planning, environmental impact assessment, and pest management. In wildlife applications, GIS enables the user to assess and manage habitats, identify and track endangered and rare species, and monitor and assess impacts.
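    To make the "stores, manages, and analyzes mappable features" definition concrete, here is a toy Python sketch of a feature store with a rectangular window query and a below-ground filter. It is not FEAB's system or any GIS product's API; the `Feature` fields, layer names and coordinates are hypothetical, and real GIS software adds spatial indexing, coordinate reference systems and full geometries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """A mappable feature: a named point with a layer and a vertical position."""
    name: str
    layer: str   # hypothetical layer names, e.g. "building" or "utility"
    x: float     # easting in an assumed projected coordinate system (metres)
    y: float     # northing in the same system
    z: float     # metres relative to ground: > 0 above, < 0 below the surface


def window_query(features: List[Feature], xmin: float, ymin: float,
                 xmax: float, ymax: float, layer: Optional[str] = None) -> List[Feature]:
    """Return features inside the rectangle, optionally restricted to one layer."""
    return [
        f for f in features
        if xmin <= f.x <= xmax and ymin <= f.y <= ymax
        and (layer is None or f.layer == layer)
    ]


def below_surface(features: List[Feature]) -> List[str]:
    """Names of features located below ground (e.g. buried pipes)."""
    return [f.name for f in features if f.z < 0]


features = [
    Feature("Lab 1", "building", 100.0, 200.0, 0.0),
    Feature("Water main", "utility", 120.0, 210.0, -2.5),
    Feature("Antenna", "structure", 400.0, 50.0, 30.0),
]
print([f.name for f in window_query(features, 0, 0, 200, 300)])
print(below_surface(features))
```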

  9. Epidemiology of hip fracture: Worldwide geographic variation

    Directory of Open Access Journals (Sweden)

    Dinesh K Dhanwal

    2011-01-01

    Full Text Available Osteoporosis is a major health problem, especially in elderly populations, and is associated with fragility fractures at the hip, spine, and wrist. Hip fracture contributes to both morbidity and mortality in the elderly. The demographics of world populations are set to change, with more elderly people living in developing countries, and it has been estimated that by 2050 half of all hip fractures will occur in Asia. This review, conducted using the PubMed database, describes the incidence of hip fracture in different regions of the world and discusses the possible causes of this wide geographic variation. The analysis of data from different studies shows a wide geographic variation across the world, with higher hip fracture incidence reported from industrialized countries than from developing countries. The highest hip fracture rates are seen in Northern Europe and the US and the lowest in Latin America and Africa. Asian countries such as Kuwait, Iran, China, and Hong Kong show intermediate hip fracture rates. There is also a north-south gradient in European studies, and more fractures are seen in the north of the US than in the south. The factors responsible for this variation are population demographics (with more elderly people living in countries with higher incidence rates) and the influence of ethnicity, latitude, and environmental factors. Understanding this changing geographic variation will help policy makers develop strategies to reduce the burden of hip fractures in developing countries such as India, which will face the brunt of this problem over the coming decades.

  10. Virtual Globe Games for Geographic Learning

    Directory of Open Access Journals (Sweden)

    Ola Ahlqvist

    2010-02-01

    Full Text Available Virtual online maps and globes allow volunteered geographic information to capitalize on users as sensors and generate unprecedented access to information resources and services. These new "Web 2.0" applications will probably dominate the development and use of virtual globes and maps in the near future. We present an experimental platform that integrates an existing virtual globe interface with the following added functionality: an interactive layer on top of the existing map that supports real-time creation and manipulation of spatial interaction objects. These objects, together with the existing information delivered through the virtual globe, form a game board that can be used for educational purposes.

  11. House Prices, Geographical Mobility, and Unemployment

    DEFF Research Database (Denmark)

    Ingholt, Marcus Mølbak

    2017-01-01

    Geographical mobility correlates positively with house prices and negatively with unemployment over the U.S. business cycle. I present a DSGE model in which declining house prices and tight credit conditions impede the mobility of indebted workers. This reduces the workers’ cross-area competition...... for jobs, causing wages and unemployment to rise. A Bayesian estimation shows that this channel more than quadruples the response of unemployment to adverse housing market shocks. The estimation also shows that adverse housing market shocks caused the decline in mobility during the Great Recession. Absent...

  12. Studying the making of geographical knowledge

    DEFF Research Database (Denmark)

    Adriansen, Hanne Kirstine; Madsen, Lene Møller

    2009-01-01

    The article addresses the issue of being a ‘double’ insider when conducting interviews. Double insider means being an insider both in relation to one's research matter - in the authors' case the making of geographical knowledge - and in relation to one's interviewees - our colleagues. The article...... is a reflection paper in the sense that we reflect upon experiences drawn from a previous research project carried out in Danish academia. It is important that the project was situated in a Scandinavian workplace culture because this has bearings for the social, cultural, and economic situation in which knowledge...

  13. Cartography and Geographic Information Science in Current Contents

    Directory of Open Access Journals (Sweden)

    Nedjeljko Frančula

    2009-12-01

    Full Text Available The Cartography and Geographic Information Science (CaGIS journal was published as The American Cartographer from 1974 to 1989, after that as Cartography and Geographic Information System, and since then has been published with its current name. It is published by the Cartography and Geographic Information Society, a member of the American Congress on Surveying and Mapping.

  14. Geographic Literacy and Moral Formation among University Students

    Science.gov (United States)

    Bascom, Jonathan

    2011-01-01

    This study extends the analysis of geographic literacy by examining the relationship of geographic knowledge with the primary goal of geographic educators: the cultivation of cultural understanding and moral sensitivity for global citizenry. The main aim is to examine contributors to moral formation during the university years based on a survey…

  15. Surveying and Mapping Geographical Information from the Perspective of Geography

    Directory of Open Access Journals (Sweden)

    LÜ Guonian

    2017-10-01

    Full Text Available This paper briefly reviews the history of geographic information content since the advent of geographic information systems. It points out that current definitions of geographic information are extensions of the basic "spatial + attributes" mapping framework, which is increasingly difficult to adapt to the analysis and application of spatio-temporal big data. From the perspective of the subject matter and content of geographic research, it systematically summarizes the content and extension of the "geographic information" that geography needs, and proposes a six-element expression model of geographic information comprising spatial location, semantic description, attribute characteristics, geometric form, evolution process, and object relationships. Guided by the laws of geography, and targeting the integrated expression, system analysis and efficient management of the spatial distribution, temporal pattern, evolution process and interaction mechanisms of geographical phenomena, it designs a unified GIS data model expressed by the six basic elements, a new GIS data structure driven by geographical rules and interactions, and key technologies for organizing and storing unstructured spatio-temporal data. It provides a theoretical basis and technical support for the shift from surveying-and-mapping geographic information to scientific geographic information, and can help improve the ability of GIS to organize, manage, analyze and express geographical laws such as geographical patterns, evolution processes, and interactions between elements.
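    One way to read the proposed six-element model is as a record type in which every geographic object carries all six elements side by side. The Python sketch below is an illustration under our own assumptions, not the paper's data model: the class names, field types, the relationship vocabulary and the example river are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (longitude, latitude), an assumed convention


@dataclass
class EvolutionStep:
    timestamp: str     # ISO 8601 date string (assumed encoding)
    description: str


@dataclass
class Relation:
    kind: str          # hypothetical vocabulary, e.g. "flows_into", "adjacent_to"
    target_id: str


@dataclass
class GeoObject:
    object_id: str
    location: Coord                                                 # 1. spatial location
    semantics: str                                                  # 2. semantic description
    attributes: Dict[str, float]                                    # 3. attribute characteristics
    geometry: List[Coord]                                           # 4. geometric form (vertex list)
    evolution: List[EvolutionStep] = field(default_factory=list)    # 5. evolution process
    relations: List[Relation] = field(default_factory=list)         # 6. object relationships

    def related(self, kind: str) -> List[str]:
        """IDs of objects linked to this one by a given relationship kind."""
        return [r.target_id for r in self.relations if r.kind == kind]


river = GeoObject(
    object_id="river_1",
    location=(116.4, 39.9),
    semantics="perennial river",
    attributes={"length_km": 42.0},
    geometry=[(116.3, 39.8), (116.4, 39.9), (116.5, 40.0)],
    evolution=[EvolutionStep("2015-06-01", "channel realigned")],
    relations=[Relation("flows_into", "lake_1")],
)
print(river.related("flows_into"))
```

    Keeping evolution and relationships as first-class fields, rather than as extra attribute columns, is what distinguishes this layout from the plain "spatial + attributes" framework the abstract criticizes.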

  16. Dynamic management of geographic data in a virtual environment

    NARCIS (Netherlands)

    Jense, G.J.; Donkers, K.

    1996-01-01

    In order to achieve true 3D user interaction with geographic information, an interface between a virtual environment system and a geographic information system has been designed and implemented. This VE/GIS interface is based on a loose coupling of the underlying geographic database and the virtual

  17. Geographically Modified PageRank Algorithms: Identifying the Spatial Concentration of Human Movement in a Geospatial Network.

    Science.gov (United States)

    Chin, Wei-Chien-Benny; Wen, Tzai-Hung

    2015-01-01

    A network approach, which simplifies geographic settings into nodes and links, emphasizes the connectivity and relationships of spatial features. Topological networks of spatial features are used to explore geographical connectivity and structures. In the geographical literature, the PageRank algorithm, a network metric, is often used to help identify important locations where people or automobiles concentrate. However, geographic considerations, including proximity and location attractiveness, are ignored in most network metrics. The objective of the present study is to propose two geographically modified PageRank algorithms, Distance-Decay PageRank (DDPR) and Geographical PageRank (GPR), that incorporate geographic considerations into PageRank to identify the spatial concentration of human movement in a geospatial network. Our findings indicate that in both intercity and within-city settings the proposed algorithms capture the locations where people reside more effectively than traditional, commonly used network metrics. In comparing location attractiveness and distance decay, we conclude that the concentration of human movement is largely determined by distance decay. This implies that geographic proximity remains a key factor in human mobility.
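    The abstract does not give formulas, so the following is only a sketch of the general idea behind a distance-decay variant of PageRank, not the authors' DDPR or GPR. It assumes that each observed movement flow between two places is down-weighted by a power-law decay `distance ** -beta` before the usual row normalization and damped power iteration; the function name, the power-law form and the parameter `beta` are our assumptions. Setting `beta = 0` recovers ordinary flow-weighted PageRank.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def distance_decay_pagerank(
    nodes: List[Hashable],
    flows: Dict[Edge, float],
    dist: Dict[Edge, float],
    beta: float = 1.0,
    damping: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[Hashable, float]:
    """PageRank on a flow network whose link weights decay with distance."""
    n = len(nodes)
    # Assumed decay form: observed flow scaled by distance ** -beta.
    weight = {e: f * dist[e] ** (-beta) for e, f in flows.items() if f > 0}
    out_total = {u: 0.0 for u in nodes}
    for (u, _), w in weight.items():
        out_total[u] += w

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if out_total[u] == 0.0)
        new = {u: (1.0 - damping + damping * dangling) / n for u in nodes}
        for (u, v), w in weight.items():
            # Move rank along each link in proportion to its decayed weight.
            new[v] += damping * rank[u] * w / out_total[u]
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


nodes = ["A", "B", "C"]
flows = {("A", "B"): 10, ("A", "C"): 10, ("B", "C"): 5, ("C", "A"): 5}
dist = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 2.0, ("C", "A"): 3.0}

flat = distance_decay_pagerank(nodes, flows, dist, beta=0.0)
decayed = distance_decay_pagerank(nodes, flows, dist, beta=1.0)
print({k: round(v, 3) for k, v in flat.items()})
print({k: round(v, 3) for k, v in decayed.items()})
```

    In this toy network A sends equal flows to B and C, but B is four times closer, so once distance decay is applied B's rank rises at the expense of C.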

  18. Geographically Modified PageRank Algorithms: Identifying the Spatial Concentration of Human Movement in a Geospatial Network.

    Directory of Open Access Journals (Sweden)

    Wei-Chien-Benny Chin

    Full Text Available A network approach, which simplifies geographic settings into nodes and links, emphasizes the connectivity and relationships of spatial features. Topological networks of spatial features are used to explore geographical connectivity and structures. In the geographical literature, the PageRank algorithm, a network metric, is often used to help identify important locations where people or automobiles concentrate. However, geographic considerations, including proximity and location attractiveness, are ignored in most network metrics. The objective of the present study is to propose two geographically modified PageRank algorithms, Distance-Decay PageRank (DDPR) and Geographical PageRank (GPR), that incorporate geographic considerations into PageRank to identify the spatial concentration of human movement in a geospatial network. Our findings indicate that in both intercity and within-city settings the proposed algorithms capture the locations where people reside more effectively than traditional, commonly used network metrics. In comparing location attractiveness and distance decay, we conclude that the concentration of human movement is largely determined by distance decay. This implies that geographic proximity remains a key factor in human mobility.

  19. Mapping the geographical distribution of lymphatic filariasis in Zambia

    DEFF Research Database (Denmark)

    Mwase, Enala T; Stensgaard, Anna-Sofie; Nsakashalo-Senkwe, Mutale

    2014-01-01

    to be an important determinant of medium-high prevalence levels. CONCLUSIONS/SIGNIFICANCE: LF was found to be surprisingly widespread in Zambia, although in most places with low prevalence. The produced maps and the identified environmental correlates of LF infection will provide useful guidance for planning...... volunteers from 108 geo-referenced survey sites across Zambia were examined for circulating filarial antigens (CFA) with rapid format ICT cards, and a map indicating the distribution of CFA prevalences in Zambia was prepared. 78% of survey sites had CFA positive cases, with prevalences ranging between 1% and 54%. Most positive survey sites had low prevalence, but six foci with more than 15% prevalence were identified. The observed geographical variation in prevalence pattern was examined in more detail using a species distribution modeling approach to explore environmental requirements for parasite...
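    The site-level summaries in the abstract (the share of survey sites with any CFA-positive case, and foci above 15% prevalence) are simple proportions. A Python sketch of that bookkeeping follows; the site names and counts are invented for illustration and are not survey data, and the species distribution modeling step is not shown.

```python
from typing import Dict, List, Tuple


def site_prevalence(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each site to CFA prevalence = positives / people examined."""
    return {site: pos / examined
            for site, (pos, examined) in counts.items() if examined > 0}


def summarize(prevalence: Dict[str, float],
              focus_threshold: float = 0.15) -> Tuple[float, List[str]]:
    """Share of sites with any positive case, and sites above the focus threshold."""
    positive = [s for s, p in prevalence.items() if p > 0]
    foci = sorted(s for s, p in prevalence.items() if p > focus_threshold)
    return len(positive) / len(prevalence), foci


# Hypothetical counts: site -> (CFA-positive, examined).
counts = {"site_1": (2, 200), "site_2": (27, 50), "site_3": (0, 100), "site_4": (20, 100)}
share, foci = summarize(site_prevalence(counts))
print(f"{share:.0%} of sites positive; foci: {foci}")
```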

  20. Geographic information modeling of Econet of Northwestern Federal District territory on graph theory basis

    Science.gov (United States)

    Kopylova, N. S.; Bykova, A. A.; Beregovoy, D. N.

    2018-05-01

    Based on the landscape-geographical approach, a structural and logical scheme for the Northwestern Federal District Econet has been developed, which can be integrated into the federal and world ecological networks in order to improve the environmental infrastructure of the region. A method of organizing the Northwestern Federal District Econet on the basis of graph theory, using the Quantum GIS geographic information system, is proposed as an effective means of preserving and restoring the unique biodiversity of landscapes and of regulating environmental protection.
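    Treating an ecological network as a graph means representing core habitat areas as nodes and ecological corridors as edges. One basic check such a model supports is connectivity: which cores are linked, directly or indirectly, and which form isolated groups that might need new corridors. The sketch below does this with a breadth-first search in plain Python; it is not the authors' QGIS workflow, and the node names and corridors are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def connected_components(nodes: List[str],
                         corridors: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group core areas that are linked by corridors, directly or indirectly."""
    # Corridors are treated as undirected edges.
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in corridors:
        adj[a].add(b)
        adj[b].add(a)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited core collects its whole component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


cores = ["A", "B", "C", "D", "E"]
corridors = [("A", "B"), ("B", "C"), ("D", "E")]
print(connected_components(cores, corridors))
```

    More than one component indicates that the network is fragmented; in this toy example, cores D and E are cut off from A, B and C.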