WorldWideScience

Sample records for evaluating geographic imputation

  1. Comprehensive evaluation of imputation performance in African Americans.

    Science.gov (United States)

    Chanda, Pritam; Yuhki, Naoya; Li, Man; Bader, Joel S; Hartz, Alex; Boerwinkle, Eric; Kao, W H Linda; Arking, Dan E

    2012-07-01

    Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and an essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three reference panels of HapMap III (ASW, YRI and CEU) and 1000 Genomes Project (pilot 1 YRI June 2010 release, EUR and AFR August 2010 and June 2011 releases) panels with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient are considerably better equipped to differentiate imputation performance compared with the traditionally used total concordance statistic, and both statistics improved with increasing MAF irrespective of the imputation method. We also find that both MACH and IMPUTE
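
    As a rough illustration of the two metrics named in this abstract, the sketch below computes total concordance and Cohen's κ for a set of masked genotypes. The genotype coding (minor-allele counts 0, 1, 2), the function names and the toy data are assumptions made for this example and are not taken from the study.

```python
from collections import Counter


def concordance(true, imputed):
    """Fraction of masked genotypes whose imputed call matches the true call."""
    if len(true) != len(imputed) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def cohens_kappa(true, imputed):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(true)
    p_observed = concordance(true, imputed)
    true_counts, imputed_counts = Counter(true), Counter(imputed)
    # Chance agreement: product of the marginal genotype frequencies.
    p_chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when only one genotype occurs
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical low-MAF SNP: eight major homozygotes, one heterozygote,
# one minor homozygote. The imputation calls every sample a major homozygote.
true_calls = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
imputed_calls = [0] * 10

total = concordance(true_calls, imputed_calls)
kappa = cohens_kappa(true_calls, imputed_calls)
```

    In this toy case the total concordance is 0.8 even though neither rare genotype is recovered, while κ is 0, which is one way to see why a chance-corrected statistic can separate methods that total concordance cannot.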

  2. Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs

    Directory of Open Access Journals (Sweden)

    Krithika S

    2012-05-01

    Background: We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step vs. two-step (phasing and imputation) approaches; (c) the effect of different INFO score thresholds on imputation performance and (d) imputation performance in common vs. rare markers. Methods: The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rates between the imputed and observed genotypes were used as a measure of imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy. Results: The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4%) when using the HapMap phase II combined panel, but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced similar concordance rates to the HapMap phase II panel (98.4% for both datasets), using the two-step strategy. However, the 1000 Genomes reference sample substantially increased the proportion of non-missing genotypes (94.7% vs. 90.1%). Rare variants (... Conclusions: The program IMPUTE had an excellent imputation performance for common alleles in an admixed sample from Mexico City, which has primarily Native American (62%) and European (33%) contributions. Genotype concordances were higher than 98.4% using all the imputation strategies, in spite of the fact that no Native American samples are present in the HapMap and 1000 Genomes reference panels. The best balance of

  3. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    Directory of Open Access Journals (Sweden)

    Kristin Meseck

    2016-05-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographics attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. GPS data lapses made up over 17% of the dataset. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset.

  4. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    DEFF Research Database (Denmark)

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographics attributes, and to determine whether imputation is an accura...

  5. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  6. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    Science.gov (United States)

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

    Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each package includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the amount of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.

  7. MVIAeval: a web tool for comprehensively evaluating the performance of a new missing value imputation algorithm.

    Science.gov (United States)

    Wu, Wei-Sheng; Jhou, Meng-Jhun

    2017-01-13

    Missing value imputation is important for microarray data analyses because microarray data with missing values would significantly degrade the performance of the downstream analyses. Although many microarray missing value imputation algorithms have been developed, an objective and comprehensive performance comparison framework is still lacking. To solve this problem, we previously proposed a framework which can perform a comprehensive performance comparison of different existing algorithms. Also the performance of a new algorithm can be evaluated by our performance comparison framework. However, constructing our framework is not an easy task for the interested researchers. To save researchers' time and efforts, here we present an easy-to-use web tool named MVIAeval (Missing Value Imputation Algorithm evaluator) which implements our performance comparison framework. MVIAeval provides a user-friendly interface allowing users to upload the R code of their new algorithm and select (i) the test datasets among 20 benchmark microarray (time series and non-time series) datasets, (ii) the compared algorithms among 12 existing algorithms, (iii) the performance indices from three existing ones, (iv) the comprehensive performance scores from two possible choices, and (v) the number of simulation runs. The comprehensive performance comparison results are then generated and shown as both figures and tables. MVIAeval is a useful tool for researchers to easily conduct a comprehensive and objective performance evaluation of their newly developed missing value imputation algorithm for microarray data or any data which can be represented as a matrix form (e.g. NGS data or proteomics data). Thus, MVIAeval will greatly expedite the progress in the research of missing value imputation algorithms.

  8. Impact of pre-imputation SNP-filtering on genotype imputation results.

    Science.gov (United States)

    Roshyara, Nab Raj; Kirsten, Holger; Horn, Katrin; Ahnert, Peter; Scholz, Markus

    2014-08-12

    Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing

  9. Evaluation of Multiple Imputation in Missing Data Analysis: An Application on Repeated Measurement Data in Animal Science

    Directory of Open Access Journals (Sweden)

    Gazel Ser

    2015-12-01

    The purpose of this study was to evaluate the performance of the multiple imputation method, within a general linear mixed model framework, when observations are missing at random or missing completely at random. The study data consisted of 77 Norduz ram lambs at 7 months of age. After slaughter, pH values measured at five time points were used as the dependent variable. Hot carcass weight, muscle glycogen level and fasting duration were included in the model as independent variables. Starting from the complete dependent variable, two missing-data structures, Missing Completely at Random (MCAR) and Missing at Random (MAR), were created by deleting observations at fixed rates (10% and 25%). Complete data sets were then obtained from the incomplete ones using multiple imputation (MI). The results of fitting the general linear mixed model to the MI-completed data sets were compared with the results from the complete data. The mixed models fitted to the complete data and to the MI data sets had the same covariance structures, and their parameter estimates and standard errors were close to those of the complete data. The study therefore concludes that MI yields reliable information from the mixed model under both missing-data structures and at both missingness rates.
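
    The abstract does not say how the MCAR and MAR deletions were generated, so the following is only a minimal sketch of one common way to do it. Under MCAR every observation is equally likely to be deleted; under MAR the chance of deletion depends on an observed covariate. The simulated values, the variable names and the weighting rule (larger covariate, larger chance of deletion) are all assumptions for illustration.

```python
import random


def delete_mcar(y, rate, rng):
    """Delete round(rate * n) values, each row being equally likely."""
    k = round(rate * len(y))
    drop = set(rng.sample(range(len(y)), k))
    return [None if i in drop else v for i, v in enumerate(y)]


def delete_mar(y, x, rate, rng):
    """Delete round(rate * n) values; rows with a larger observed x
    are more likely to lose their y value."""
    k = round(rate * len(y))
    lowest = min(x)
    weights = [xi - lowest + 1.0 for xi in x]  # strictly positive weights
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # draw u ** (1 / w) per row and keep the k largest keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    ranked = sorted(range(len(y)), key=lambda i: keys[i], reverse=True)
    drop = set(ranked[:k])
    return [None if i in drop else v for i, v in enumerate(y)]


# Hypothetical data: 77 animals, one covariate and one response each.
rng = random.Random(0)
carcass_weight = [rng.gauss(20.0, 2.0) for _ in range(77)]
ph = [rng.gauss(5.8, 0.2) for _ in range(77)]

ph_mcar = delete_mcar(ph, 0.25, rng)
ph_mar = delete_mar(ph, carcass_weight, 0.25, rng)
```

    The covariate itself is never deleted, which is what keeps the second mechanism MAR rather than missing not at random.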

  10. Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods.

    Science.gov (United States)

    Seaman, Shaun R; Bartlett, Jonathan W; White, Ian R

    2012-04-10

    Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X². In 'passive imputation' a value X* is imputed for X and then X² is imputed as (X*)². A recent proposal is to treat X² as 'just another variable' (JAV) and impute X and X² under multivariate normality. We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X²; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
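
    To make the contrast between passive imputation and JAV concrete, here is a simplified sketch that produces a single stochastic imputation with each approach. It is not the authors' simulation code: a proper multiple imputation would also draw the regression parameters from their posterior and repeat the process several times. The simulated data and all function names are assumptions.

```python
import math
import random


def ols(y, t):
    """Simple linear regression of t on y: intercept, slope, residuals."""
    n = len(y)
    mean_y, mean_t = sum(y) / n, sum(t) / n
    slope = (sum((a - mean_y) * (b - mean_t) for a, b in zip(y, t))
             / sum((a - mean_y) ** 2 for a in y))
    intercept = mean_t - slope * mean_y
    return intercept, slope, [b - intercept - slope * a for a, b in zip(y, t)]


def impute_passive(y, x, rng):
    """Impute X from its regression on Y, then set X^2 to the square of X*."""
    y_obs = [a for a, b in zip(y, x) if b is not None]
    x_obs = [b for b in x if b is not None]
    a0, b0, res = ols(y_obs, x_obs)
    sd = math.sqrt(sum(e * e for e in res) / (len(res) - 2))
    x_imp = [b if b is not None else a0 + b0 * a + rng.gauss(0.0, sd)
             for a, b in zip(y, x)]
    return x_imp, [v * v for v in x_imp]


def impute_jav(y, x, rng):
    """Treat X^2 as just another variable: draw (X, X^2) jointly from a
    bivariate normal regression on Y, ignoring that one is the square."""
    y_obs = [a for a, b in zip(y, x) if b is not None]
    x_obs = [b for b in x if b is not None]
    a1, b1, r1 = ols(y_obs, x_obs)
    a2, b2, r2 = ols(y_obs, [v * v for v in x_obs])
    df = len(y_obs) - 2
    s11 = sum(e * e for e in r1) / df
    s22 = sum(e * e for e in r2) / df
    s12 = sum(e * f for e, f in zip(r1, r2)) / df
    # Cholesky factor of the 2x2 residual covariance matrix.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(max(s22 - l21 * l21, 0.0))
    x_imp, x2_imp = [], []
    for a, b in zip(y, x):
        if b is not None:
            x_imp.append(b)
            x2_imp.append(b * b)
        else:
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x_imp.append(a1 + b1 * a + l11 * z1)
            x2_imp.append(a2 + b2 * a + l21 * z1 + l22 * z2)
    return x_imp, x2_imp


# Hypothetical data with a quadratic effect; about 30% of X deleted (MCAR).
rng = random.Random(1)
x_full = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1.0 + v + 0.5 * v * v + rng.gauss(0.0, 1.0) for v in x_full]
x_missing = [None if rng.random() < 0.3 else v for v in x_full]

x_p, x2_p = impute_passive(y, x_missing, rng)
x_j, x2_j = impute_jav(y, x_missing, rng)
```

    Passive imputation keeps the imputed square equal to the square of the imputed value by construction, whereas JAV deliberately breaks that relation (an imputed X² can even be negative) in exchange for preserving the joint moments the analysis model depends on.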

  11. 16 CFR 1115.11 - Imputed knowledge.

    Science.gov (United States)

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Imputed knowledge. 1115.11 Section 1115.11... PRODUCT HAZARD REPORTS General Interpretation § 1115.11 Imputed knowledge. (a) In evaluating whether or... care to ascertain the truth of complaints or other representations. This includes the knowledge a firm...

  12. Comparing performance of modern genotype imputation methods in different ethnicities

    Science.gov (United States)

    Roshyara, Nab Raj; Horn, Katrin; Kirsten, Holger; Ahnert, Peter; Scholz, Markus

    2016-10-01

    A variety of modern software packages are available for genotype imputation relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good quality genotypes were masked and re-imputed by different imputation frameworks: namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed-reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei’s GST ≤ 0.010). For small to medium datasets, frameworks using genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population based German samples, the disadvantage of pre-phasing decreases for larger sample sizes.

  13. Public Undertakings and Imputability

    DEFF Research Database (Denmark)

    Ølykke, Grith Skovgaard

    2013-01-01

    Oeresund tender for the provision of passenger transport by railway. From the start, the services were provided at a loss, and in the end a part of DSBFirst was wound up. In order to frame the problems illustrated by this case, the jurisprudence-based imputability requirement in the definition of State aid...... exercised by the State, imputability to the State, and the State’s fulfilment of the Market Economy Investor Principle. Furthermore, it is examined whether, in the absence of imputability, public undertakings’ market behaviour is subject to the Market Economy Investor Principle, and it is concluded...

  14. Molgenis-impute: imputation pipeline in a box.

    Science.gov (United States)

    Kanterakis, Alexandros; Deelen, Patrick; van Dijk, Freerk; Byelas, Heorhiy; Dijkstra, Martijn; Swertz, Morris A

    2015-08-19

    Genotype imputation is an important procedure in current genomic analysis such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the steps of this process, considerable effort and expertise is required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters. Here we present MOLGENIS-impute, an 'imputation in a box' solution that seamlessly and transparently automates the set up and running of all the steps of the imputation process. These steps include genome build liftover (liftovering), genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments like local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute on different locations and imputed over 30,000 samples so far using the 1,000 Genomes Project and new Genome of the Netherlands data as the imputation reference. The tests have been performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment. MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing flexibility to adapt or limiting the options of underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. It is built on the MOLGENIS compute workflow framework to enable customization with additional

  15. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity of these keywords and the data complexity on the Web, different queries may retrieve different answers to the same absent field value. To decide the most probable right answer to each absent field value, existing methods issue quite a few of the available imputation queries for each absent value and then vote to decide the most probable right answer. As a result, a large number of imputation queries have to be issued to fill all absent values in an incomplete data set, which brings a large overhead. In this paper, we work on reducing the cost of Web-based Data Imputation in two aspects: First, we propose a query execution scheme which can secure the most probable right answer to an absent field value by issuing as few imputation queries as possible. Second, we recognize and prune, a priori, queries that will probably fail to return any answers. Our extensive experimental evaluation shows that our proposed techniques substantially reduce the cost of Web-based Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.
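
    The abstract does not describe the proposed query execution scheme, so the sketch below only illustrates the general idea of voting while issuing fewer queries: stop as soon as the leading answer can no longer be overtaken by the queries that remain. This early-stopping rule, the function names and the stand-in query results are assumptions, not the authors' method.

```python
from collections import Counter


def vote_with_early_stop(queries, run_query):
    """Majority vote over query answers, stopping once the winner is certain.

    run_query is a caller-supplied function returning an answer or None.
    Returns the winning answer and the number of queries actually issued.
    """
    votes = Counter()
    issued = 0
    for query in queries:
        remaining = len(queries) - issued
        if votes:
            ranked = votes.most_common(2)
            leader_votes = ranked[0][1]
            runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
            # Even if every remaining query backed one rival, it could not
            # catch up with the leader, so further queries are wasted.
            if leader_votes - runner_up_votes > remaining:
                break
        answer = run_query(query)
        issued += 1
        if answer is not None:
            votes[answer] += 1
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, issued


# Hypothetical answers that seven imputation queries would return.
fake_web = {
    "q1": "Paris", "q2": "Paris", "q3": "Paris", "q4": "Paris",
    "q5": "Lyon", "q6": "Paris", "q7": "Lyon",
}
result = vote_with_early_stop(list(fake_web), fake_web.get)
```

    With these stand-in answers the vote is settled after four of the seven queries, and the winner is the same one a full vote would produce.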

  16. Quality evaluation of cortex berberidis from different geographical ...

    African Journals Online (AJOL)

    Purpose: To develop an effective method for evaluating the quality of Cortex berberidis from different geographical origins. Methods: A simple, precise and accurate high performance liquid chromatography (HPLC) method was first developed for simultaneous quantification of four active alkaloids (magnoflorine, jatrorrhizine ...

  17. Missing value imputation for epistatic MAPs

    LENUS (Irish Health Repository)

    Ryan, Colm

    2010-04-20

    Background: Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (Nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially
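
    The abstract does not spell out how nearest-neighbor imputation was adapted to symmetric pairwise data, so the following is only one plausible sketch. A missing score for the pair (i, j) is estimated twice, once from the genes most similar to i and once from the genes most similar to j, and the two estimates are averaged so the filled matrix stays symmetric. The distance measure, the function names and the toy matrix are assumptions.

```python
import math


def profile_distance(m, a, b, skip):
    """Root-mean-square difference between rows a and b over the columns
    observed in both, ignoring the columns listed in skip."""
    diffs = [(m[a][c] - m[b][c]) ** 2
             for c in range(len(m))
             if c not in skip and m[a][c] is not None and m[b][c] is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def neighbor_estimate(m, i, j, k):
    """Mean of m[n][j] over the k genes n whose profiles are closest to gene i."""
    candidates = [(profile_distance(m, i, n, {i, j, n}), n)
                  for n in range(len(m))
                  if n not in (i, j) and m[n][j] is not None]
    nearest = sorted(c for c in candidates if c[0] < math.inf)[:k]
    if not nearest:
        return None
    return sum(m[n][j] for _, n in nearest) / len(nearest)


def impute_symmetric(m, k=2):
    """Fill missing off-diagonal entries of a symmetric score matrix."""
    out = [row[:] for row in m]
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            if m[i][j] is None:
                estimates = [e for e in (neighbor_estimate(m, i, j, k),
                                         neighbor_estimate(m, j, i, k))
                             if e is not None]
                if estimates:
                    out[i][j] = out[j][i] = sum(estimates) / len(estimates)
    return out


# Hypothetical 4-gene score matrix; None marks the undefined diagonal and
# the one unmeasured pair (genes 0 and 1).
scores = [
    [None, None, 1.0, 2.0],
    [None, None, 1.0, 2.0],
    [1.0, 1.0, None, 3.0],
    [2.0, 2.0, 3.0, None],
]
filled = impute_symmetric(scores, k=2)
```

    Estimates are computed from the original matrix rather than the partially filled one, so the result does not depend on the order in which missing pairs are visited.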

  18. Multiple Imputation of Squared Terms

    NARCIS (Netherlands)

    Vink, G.; Buuren, S. van

    2013-01-01

    We propose a new multiple imputation technique for imputing squares. Current methods yield either unbiased regression estimates or preserve data relations. No method, however, seems to deliver both, which limits researchers in the implementation of regression analysis in the presence of missing

  19. Evaluation of more demanding geographical contents regarding all educational levels

    National Research Council Canada - National Science Library

    Tatjana Resnik Planinc

    2002-01-01

    .... Further improvement of geography teaching and assurance of high quality syllabus on every level of geographical education require a defining of more demanding geographical contents and reasons...

  20. Restrictive Imputation of Incomplete Survey Data

    NARCIS (Netherlands)

    Vink, G.|info:eu-repo/dai/nl/323348793

    2015-01-01

    This dissertation focuses on finding plausible imputations when there is some restriction posed on the imputation model. In these restrictive situations, current imputation methodology does not lead to satisfactory imputations. The restrictions, and the resulting missing data problems are real-life

  1. Evaluation of more demanding geographical contents regarding all educational levels

    Directory of Open Access Journals (Sweden)

    Tatjana Resnik Planinc

    2002-12-01

    The article deals with the didactic problem of transferring and understanding the more demanding geographical content included in the vertical of geographical education. Further improvement of geography teaching and the assurance of a high-quality syllabus at every level of geographical education require defining the more demanding geographical content and the reasons for its difficulty, justifying its incorporation into syllabuses, and assessing ways and methods of dealing with it and of mastering it. The analyses of scientifically demanding geographical content offer a basis for better organisation of content across the vertical of geographical education and for better education of future geography teachers.

  2. Multiply-Imputed Synthetic Data: Advice to the Imputer

    Directory of Open Access Journals (Sweden)

    Loong Bronwyn

    2017-12-01

    Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the confidential synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.

  3. Objective Evaluation in an Online Geographic Information System Certificate Program

    Directory of Open Access Journals (Sweden)

    Scott L. WALKER

    2005-01-01

    Asst. Prof. Dr. Scott L. Walker, Texas State University-San Marcos, San Marcos, Texas, USA. Abstract: Departmental decisions regarding distance education programs can be subject to subjective decision-making processes influenced by external factors such as strong faculty opinions or pressure to increase student enrolment. This paper outlines an evaluation of a departmental distance-education program. The evaluation used several methods that strove to inject objectivity into evaluation and subsequent decision-making. A rapid multi-modal approach included the evaluation methods of (1) considering the online psychosocial learning environment, (2) content analyses comparing the online version of classes to face-to-face versions, (3) cost comparisons in online vs. face-to-face classes, (4) student outcomes, (5) student retention, and (6) benchmarking. These approaches offer opportunities for departmental administrators and decision-making committees to make judgments informed by facts rather than being influenced by the emotions, beliefs, or opinions of organizational dynamics.

  4. GIS in Evaluation: Utilizing the Power of Geographic Information Systems to Represent Evaluation Data

    Science.gov (United States)

    Azzam, Tarek; Robinson, David

    2013-01-01

    This article provides an introduction to geographic information systems (GIS) and how the technology can be used to enhance evaluation practice. As a tool, GIS enables evaluators to incorporate contextual features (such as accessibility of program sites or community health needs) into evaluation designs and highlights the interactions between…

  5. Multiple imputation and its application

    CERN Document Server

    Carpenter, James

    2013-01-01

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...

  6. Flexible Imputation of Missing Data

    CERN Document Server

    van Buuren, Stef

    2012-01-01

    Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde

  7. Performance of genotype imputations using data from the 1000 Genomes Project.

    Science.gov (United States)

    Sung, Yun Ju; Wang, Lihua; Rankinen, Tuomo; Bouchard, Claude; Rao, D C

    2012-01-01

    Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. It also provides an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing imputation results to those using the HapMap Phase II data that have been widely used. We used three reference panels: the CEU panel consisting of 120 haplotypes from HapMap II and 1KG data (June 2010 release) and the EUR panel consisting of 566 haplotypes also from 1KG data (August 2010 release). We used Illumina 324,607 autosomal SNPs genotyped in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel. There were more than twice as many successfully imputed SNPs as there were using the HapMap II panel (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq [...] Project is still underway, we expect that later versions will provide even better imputation performance. Copyright © 2011 S. Karger AG, Basel.

  8. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.

  9. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

    Directory of Open Access Journals (Sweden)

    Rosa Aghdam

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes were selected at random under two types of missingness mechanism, ignorable and non-ignorable, with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/.

  10. RESEARCH ON THE CONSTRUCTION METHOD OF COMPREHENSIVE EVALUATION INDEX OF GEOGRAPHIC CONDITIONS

    Directory of Open Access Journals (Sweden)

    S. H. Lv

    2016-06-01

    This article proposes a construction method for a comprehensive geographic conditions evaluation index based on geographic conditions survey data combined with thematic data on society, economy, ecology and population. This article constructs a three-level evaluation framework composed of index level, object level and factor level from the perspectives of ecological coordination, urban development, regional economic potential and basic public services, studies a method of acquiring all-level factor data on geographic conditions and discusses the comprehensive evaluation factor system of geographic conditions. The components of the all-level index are selected through principal component analysis, and iterative analysis is performed by innovatively setting conditions to ensure the independence of the factors and establish an evaluation factor set for geographic conditions. The weighting for the all-level index is obtained through the analytic hierarchy process resulting in the index of geographic conditions. From the perspective of geographic conditions, this article makes a dynamic and quantitative evaluation of national conditions and strengths to provide a reference basis for regional sustainable development and governmental management decisions. By using the method, this article first obtains the index of geographic conditions of Q city with comprehensive evaluation and analysis to verify the objectivity and scientific nature of the method and expand and deepen the application of survey data on geographic conditions.

  11. Assessment of genotype imputation performance using 1000 Genomes in African American studies.

    Directory of Open Access Journals (Sweden)

    Dana B Hancock

    Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes), for all imputed SNPs. Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL
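
    The difference between raw concordance and a chance-adjusted score such as IQS, as described in the record above, can be illustrated with a small sketch. This is not the authors' code: the function name and the toy genotypes are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2), and the chance correction is assumed to take the Cohen's kappa form (observed agreement minus chance agreement, divided by one minus chance agreement).

```python
from collections import Counter


def concordance_and_chance_adjusted(true_genotypes, imputed_genotypes):
    """Return (concordance, chance-adjusted agreement) for one masked SNP.

    Genotypes are assumed to be coded 0/1/2. The chance-adjusted score uses
    the kappa form (Po - Pc) / (1 - Pc); this is an illustrative assumption.
    """
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_genotypes)
    # Observed agreement: fraction of samples whose imputed call matches.
    observed = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes)) / n
    # Chance agreement: product of the marginal genotype frequencies.
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if chance == 1.0:
        # Both call sets are monomorphic for the same genotype; undefined.
        return observed, float("nan")
    return observed, (observed - chance) / (1.0 - chance)


# Hypothetical low-MAF SNP: the imputation calls every sample homozygous major.
true_g = [0] * 18 + [1] * 2
imputed_g = [0] * 20
conc, adjusted = concordance_and_chance_adjusted(true_g, imputed_g)
```

    In this toy case the concordance is 0.9 while the chance-adjusted score is 0, which shows why a chance-corrected metric is more informative for low-MAF SNPs.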

  12. Performance of genotype imputation for rare variants identified in exons and flanking regions of genes.

    Directory of Open Access Journals (Sweden)

    Li Li

    Genotype imputation has the potential to assess human genetic variation at a lower cost than assaying the variants using laboratory techniques. The performance of imputation for rare variants has not been comprehensively studied. We utilized 8865 human samples with high depth resequencing data for the exons and flanking regions of 202 genes and Genome-Wide Association Study (GWAS) data to characterize the performance of genotype imputation for rare variants. We evaluated reference sets ranging from 100 to 3713 subjects for imputing into samples typed for the Affymetrix (500K and 6.0) and Illumina 550K GWAS panels. The proportion of variants that could be well imputed (true r(2)>0.7) with a reference panel of 3713 individuals was: 31% (Illumina 550K) or 25% (Affymetrix 500K) with MAF (Minor Allele Frequency) less than or equal to 0.001, 48% or 35% with 0.001 [...] 0.05. The performance for common SNPs (MAF>0.05) within exons and flanking regions is comparable to imputation of more uniformly distributed SNPs. The performance for rare SNPs (0.01 [...] imputation for extending the assessment of common variants identified in humans via targeted exon resequencing into additional samples with GWAS data, but imputation of very rare variants (MAF<=0.005) will require reference panels with thousands of subjects.

  13. Multiple Imputations for Linear Regression Models

    OpenAIRE

    Brownstone, David

    1991-01-01

    Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubin’s results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency of multiple imputations inference when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welsh’s theorems to give conditions where multiple imputations yield consistent inferences for bo...

  14. Military-geographic evaluation of the Julian Alps area

    Directory of Open Access Journals (Sweden)

    Zvonimir Bratun

    1999-12-01

    The Julian Alps have been of military significance since Roman times in a military geographic sense because of their valleys, mountain passes and lines of defence on mountain ridges. They became especially important in the 19th and 20th centuries. The largest mountain front in World War I was located there, and evidence of that front is still visible today. The border between Italy and Yugoslavia in the heart of the Julian Alps was clearly a line of demarcation along the Soča and Sava watersheds and was reinforced with fortifications, obstacles and trenches. During the Cold War, there was an ideological line of demarcation along the western edge of the Julian Alps as well. Military strategy in that area included the use of military geographic approaches in both westerly and easterly directions. After the geopolitical changes of 1991, the Julian Alps no longer had the same military geographic significance in terms of Slovenian national security. Today other military activities are more important: training under mountain conditions for NATO soldiers, non-commissioned and commissioned officers takes place in the Pokljuka region and on the Triglav mountain chain. Military facilities have taken on significance in terms of tourism as well.

  15. Aerosol optical depth as a measure of particulate exposure using imputed censored data, and relationship with childhood asthma hospital admissions for 2004 in Athens, Greece.

    Science.gov (United States)

    Higgs, Gary; Sterling, David A; Aryal, Subhash; Vemulapalli, Abhilash; Priftis, Kostas N; Sifakis, Nicolas I

    2015-01-01

    An understanding of human health implications from atmosphere exposure is a priority in both the geographic and the public health domains. The unique properties of geographic tools for remote sensing of the atmosphere offer a distinct ability to characterize and model aerosols in the urban atmosphere for evaluation of impacts on health. Asthma, as a manifestation of upper respiratory disease prevalence, is a good example of the potential interface of geographic and public health interests. The current study focused on Athens, Greece during the year of 2004 and (1) demonstrates a systemized process for aligning data obtained from satellite aerosol optical depth (AOD) with geographic location and time, (2) evaluates the ability to apply imputation methods to censored data, and (3) explores whether AOD data can be used satisfactorily to investigate the association between AOD and health impacts using an example of hospital admission for childhood asthma. This work demonstrates the ability to apply remote sensing data in the evaluation of health outcomes, that the alignment process for remote sensing data is readily feasible, and that missing data can be imputed with a sufficient degree of reliability to develop complete datasets. Individual variables demonstrated small but significant effect levels on hospital admission of children for AOD, nitrogen oxides (NOx), relative humidity (rH), temperature, smoke, and inversely for ozone. However, when applying a multivariable model, an association with asthma hospital admissions and air quality could not be demonstrated. This work is promising and will be expanded to include additional years.

  16. Improving accuracy of rare variant imputation with a two-step imputation approach

    DEFF Research Database (Denmark)

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

    2015-01-01

    not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, as the 1000 Genomes panel, are emerging to facilitate imputation of low frequent single-nucleotide polymorphisms (minor allele frequency (MAF) two-step......, the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (Ptwo-step approach in our setting improves imputation quality compared with traditional direct imputation noteworthy...

  17. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

    Science.gov (United States)

    Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

    2017-11-30

    A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
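
    The mask-and-compare evaluation described in the record above can be sketched as follows. This is a minimal illustration, not the 3D-MICE implementation: the function names, the toy series, and the mean-fill imputer are hypothetical stand-ins, and the normalization (dividing the root-mean-square error by the standard deviation of the observed values) is an assumption, since the abstract does not state which normalization was used.

```python
import math
import random


def masked_nrmse(values, impute, mask_fraction=0.2, seed=0):
    """Mask a random subset of observed values, impute, and score the result.

    `values` is a list of floats with None for values missing in the original
    data. `impute` is any callable mapping such a list to a fully filled list.
    The RMSE on masked points is divided by the standard deviation of the
    observed values (an assumed normalization).
    """
    rng = random.Random(seed)
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    n_mask = max(1, int(len(observed_idx) * mask_fraction))
    masked_idx = rng.sample(observed_idx, n_mask)

    # Hide the selected measured values alongside the originally missing ones.
    working = list(values)
    for i in masked_idx:
        working[i] = None
    completed = impute(working)

    # Compare predictions with the held-back measurements only.
    rmse = math.sqrt(
        sum((completed[i] - values[i]) ** 2 for i in masked_idx) / n_mask
    )
    observed = [values[i] for i in observed_idx]
    mean = sum(observed) / len(observed)
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    if sd == 0:
        raise ValueError("observed values are constant; cannot normalize")
    return rmse / sd


def mean_impute(series):
    """Baseline imputer: fill every gap with the mean of the known values."""
    known = [v for v in series if v is not None]
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in series]


# Hypothetical laboratory series with two originally missing results.
series = [4.1, 4.3, None, 4.8, 5.0, 5.2, None, 5.9, 6.1, 6.4, 6.8, 7.0]
baseline_score = masked_nrmse(series, mean_impute)
```

    Any imputation method can be passed in place of `mean_impute`, so several methods can be scored on the same masked points when the same seed is used.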

  18. Military-geographic evaluation of the Julian Alps area

    OpenAIRE

    Zvonimir Bratun

    1999-01-01

    The Julian Alps have been of military significance since Roman times in a military geographic sense because of their valleys, mountain passes and lines of defence on mountain ridges. They became especially important in the 19th and 20th centuries. The largest mountain front in World War I was located there, and evidence of that front is still visible today. The border between Italy and Yugoslavia in the heart of the Julian Alps was clearly a line of demarcation along the Soča and Sava watersheds a...

  19. Standard and Robust Methods in Regression Imputation

    Science.gov (United States)

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction of new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  20. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    Science.gov (United States)

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification

  1. Evaluating the accuracy and effectiveness of criminal geographic profiling methods: The case of Dandora, Kenya

    NARCIS (Netherlands)

    Mburu, L; Helbich, M

    2015-01-01

    Criminal geographic profiling (CGP) prioritizes offender search, extensively reducing the resources expended in criminal investigations. The utility of CGP has, however, remained unclear when variations in environmental characteristics and offense type are introduced. This study evaluates several

  2. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels.

    Science.gov (United States)

    Gao, Xiaoyi; Haritunians, Talin; Marjoram, Paul; McKean-Cowdin, Roberta; Torres, Mina; Taylor, Kent D; Rotter, Jerome I; Gauderman, William J; Varma, Rohit

    2012-01-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  3. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Directory of Open Access Journals (Sweden)

    Xiaoyi Gao

    2012-06-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  4. An Imputation Model for Dropouts in Unemployment Data

    Directory of Open Access Journals (Sweden)

    Nilsson Petra

    2016-09-01

    Incomplete unemployment data is a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service’s register as many as 20 percent. This leads to an ambiguity regarding destination states (employment, unemployment, retired, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 the employment rate has dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions. The models have relatively high predictive power.

  5. Genotype imputation for African Americans using data from HapMap phase II versus 1000 genomes projects.

    Science.gov (United States)

    Sung, Yun J; Gu, C Charles; Tiwari, Hemant K; Arnett, Donna K; Broeckel, Ulrich; Rao, Dabeeru C

    2012-07-01

    Genotype imputation provides imputation of untyped single nucleotide polymorphisms (SNPs) that are present on a reference panel such as those from the HapMap Project. It is popular for increasing statistical power and comparing results across studies using different platforms. Imputation for African American populations is challenging because their linkage disequilibrium blocks are shorter and also because no ideal reference panel is available due to admixture. In this paper, we evaluated three imputation strategies for African Americans. The intersection strategy used a combined panel consisting of SNPs polymorphic in both CEU and YRI. The union strategy used a panel consisting of SNPs polymorphic in either CEU or YRI. The merge strategy merged results from two separate imputations, one using CEU and the other using YRI. Because recent investigators are increasingly using the data from the 1000 Genomes (1KG) Project for genotype imputation, we evaluated both 1KG-based imputations and HapMap-based imputations. We used 23,707 SNPs from chromosomes 21 and 22 on Affymetrix SNP Array 6.0 genotyped for 1,075 HyperGEN African Americans. We found that 1KG-based imputations provided a substantially larger number of variants than HapMap-based imputations, about three times as many common variants and eight times as many rare and low-frequency variants. This higher yield is expected because the 1KG panel includes more SNPs. Accuracy rates using 1KG data were slightly lower than those using HapMap data before filtering, but slightly higher after filtering. The union strategy provided the highest imputation yield with next highest accuracy. The intersection strategy provided the lowest imputation yield but the highest accuracy. The merge strategy provided the lowest imputation accuracy. We observed that SNPs polymorphic only in CEU had much lower accuracy, reducing the accuracy of the union strategy. Our findings suggest that 1KG-based imputations can facilitate discovery of

  6. Accuracy of estimation of genomic breeding values in pigs using low-density genotypes and imputation.

    Science.gov (United States)

    Badke, Yvonne M; Bates, Ronald O; Ernst, Catherine W; Fix, Justin; Steibel, Juan P

    2014-04-16

    Genomic selection has the potential to increase genetic progress. Genotype imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using tagSNP in the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65-0.68). Using genotypes imputed from a large reference panel (accuracy: R(2) = 0.95) for genomic evaluation did not significantly decrease accuracy, whereas a scenario with genotypes imputed from a small reference panel (R(2) = 0.88) did show a significant decrease in accuracy. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy. On the other hand, using a very small reference panel of haplotypes to impute training animals and candidates for

  7. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy.

    Science.gov (United States)

    Bouwman, Aniek C; Veerkamp, Roel F

    2014-10-03

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds was used, because there are currently relatively many individuals from several breeds sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on imputation accuracy of variants with low minor allele frequency and breed specific variants. Imputation accuracy was assessed for chromosome 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed were added the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein improved the average imputation accuracy marginally to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein specific variants did gain imputation accuracy from the multi-breed reference population. This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference

  8. Imputation and variable selection in linear regression models with missing covariates.

    Science.gov (United States)

    Yang, Xiaowei; Belin, Thomas R; Boscardin, W John

    2005-06-01

    Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

  9. An Evaluation of Geographic Information Systems in Social Studies Lessons: Teachers' Views

    Science.gov (United States)

    Aladag, Elif

    2014-01-01

    The aim of this study is to evaluate the applicability of Geographic Information Systems (GIS), used increasingly in primary and secondary education across the world, in social studies lessons in Turkey. In line with this aim, 14 social studies teachers working in the province of Aydin, Turkey received a 6-hour training course about GIS during the…

  10. Aerosol Optical Depth as a Measure of Particulate Exposure Using Imputed Censored Data, and Relationship with Childhood Asthma Hospital Admissions for 2004 in Athens, Greece

    Directory of Open Access Journals (Sweden)

    Gary Higgs

    2015-01-01

    An understanding of human health implications from atmosphere exposure is a priority in both the geographic and the public health domains. The unique properties of geographic tools for remote sensing of the atmosphere offer a distinct ability to characterize and model aerosols in the urban atmosphere for evaluation of impacts on health. Asthma, as a manifestation of upper respiratory disease prevalence, is a good example of the potential interface of geographic and public health interests. The current study focused on Athens, Greece during the year of 2004 and (1) demonstrates a systemized process for aligning data obtained from satellite aerosol optical depth (AOD) with geographic location and time, (2) evaluates the ability to apply imputation methods to censored data, and (3) explores whether AOD data can be used satisfactorily to investigate the association between AOD and health impacts using an example of hospital admission for childhood asthma. This work demonstrates the ability to apply remote sensing data in the evaluation of health outcomes, that the alignment process for remote sensing data is readily feasible, and that missing data can be imputed with a sufficient degree of reliability to develop complete datasets. Individual variables demonstrated small but significant effect levels on hospital admission of children for AOD, nitrogen oxides (NOx), relative humidity (rH), temperature, smoke, and inversely for ozone. However, when applying a multivariable model, an association with asthma hospital admissions and air quality could not be demonstrated. This work is promising and will be expanded to include additional years.

  11. Imputation of adverse drug reactions: Causality assessment in hospitals.

    Science.gov (United States)

    Varallo, Fabiana Rossi; Planeta, Cleopatra S; Herdeiro, Maria Teresa; Mastroianni, Patricia de Carvalho

    2017-01-01

    Different algorithms have been developed to standardize the causality assessment of adverse drug reactions (ADR). Although most share common characteristics, the results of the causality assessment are variable depending on the algorithm used. Therefore, using 10 different algorithms, the study aimed to compare inter-rater and multi-rater agreement for ADR causality assessment and identify the most consistent one for hospitals. Using ten causality algorithms, four judges independently assessed the first 44 cases of ADRs reported during the first year of implementation of a risk management service in a medium complexity hospital in the state of Sao Paulo (Brazil). Owing to variations in the terminology used for causality, the equivalent imputation terms were grouped into four categories: definite, probable, possible and unlikely. Inter-rater and multi-rater agreement analysis was performed by calculating the Cohen's and Light's kappa coefficients, respectively. None of the algorithms showed 100% reproducibility in the causal imputation. Fair inter-rater and multi-rater agreement was found. Emanuele (1984) and WHO-UMC (2010) algorithms showed a fair rate of agreement between the judges (k = 0.36). Although the ADR causality assessment algorithms were poorly reproducible, our data suggest that WHO-UMC algorithm is the most consistent for imputation in hospitals, since it allows evaluating the quality of the report. However, to improve the ability of assessing the causality using algorithms, it is necessary to include criteria for the evaluation of drug-related problems, which may be related to confounding variables that underestimate the causal association.
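
    As a rough illustration of the inter-rater statistic named in this abstract, the sketch below computes Cohen's kappa for two raters from the standard definition (observed agreement corrected for chance agreement). The function name and the ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical causality ratings for six cases (illustration only).
judge_1 = ["probable", "possible", "possible", "unlikely", "definite", "possible"]
judge_2 = ["probable", "probable", "possible", "unlikely", "possible", "possible"]
kappa = cohen_kappa(judge_1, judge_2)
```

    Light's kappa for more than two raters is usually taken as the mean of the pairwise Cohen's kappa values, so the same function could be reused over all rater pairs.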

  12. Evaluation of Tropospheric and Ionospheric Effects on the Geographic Localization of Data Collection Platforms

    Directory of Open Access Journals (Sweden)

    C. C. Celestino

    2007-01-01

    The Brazilian National Institute for Space Research (INPE) is operating the Brazilian Environmental Data Collection System that currently amounts to a user community of around 100 organizations and more than 700 data collection platforms installed in Brazil. This system uses the SCD-1, SCD-2, and CBERS-2 low Earth orbit satellites to accomplish the data collection services. The main system applications are hydrology, meteorology, oceanography, water quality, and others. One of the functionalities offered by this system is the geographic localization of the data collection platforms by using Doppler shifts and a batch estimator based on least-squares technique. There is a growing demand to improve the quality of the geographical location of data collection platforms for animal tracking. This work presents an evaluation of the ionospheric and tropospheric effects on the Brazilian Environmental Data Collection System transmitter geographic location. Some models of the ionosphere and troposphere are presented to simulate their impacts and to evaluate performance of the platform location algorithm. The results of the Doppler shift measurements, using the SCD-2 satellite and the data collection platform (DCP) located in Cuiabá town, are presented and discussed.

  13. Many-to-Many Geographically-Embedded Flow Visualisation: An Evaluation.

    Science.gov (United States)

    Yang, Yalong; Dwyer, Tim; Goodwin, Sarah; Marriott, Kim

    2017-01-01

    Showing flows of people and resources between multiple geographic locations is a challenging visualisation problem. We conducted two quantitative user studies to evaluate different visual representations for such dense many-to-many flows. In our first study we compared a bundled node-link flow map representation and OD Maps [37] with a new visualisation we call MapTrix. Like OD Maps, MapTrix overcomes the clutter associated with a traditional flow map while providing geographic embedding that is missing in standard OD matrix representations. We found that OD Maps and MapTrix had similar performance while bundled node-link flow map representations did not scale at all well. Our second study compared participant performance with OD Maps and MapTrix on larger data sets. Again performance was remarkably similar.

  14. Model checking in multiple imputation: an overview and case study

    Directory of Open Access Journals (Sweden)

    Cattram D. Nguyen

    2017-08-01

    Background: Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models. Analysis: In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children. Conclusions: As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.

  15. Model checking in multiple imputation: an overview and case study.

    Science.gov (United States)

    Nguyen, Cattram D; Carlin, John B; Lee, Katherine J

    2017-01-01

    Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models. In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children. As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.

  16. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  17. Clustering with Missing Values: No Imputation Required

    Science.gov (United States)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  18. Imputation approaches for animal movement modeling

    Science.gov (United States)

    Scharf, Henry; Hooten, Mevin B.; Johnson, Devin S.

    2017-01-01

    The analysis of telemetry data is common in animal ecological studies. While the collection of telemetry data for individual animals has improved dramatically, the methods to properly account for inherent uncertainties (e.g., measurement error, dependence, barriers to movement) have lagged behind. Still, many new statistical approaches have been developed to infer unknown quantities affecting animal movement or predict movement based on telemetry data. Hierarchical statistical models are useful to account for some of the aforementioned uncertainties, as well as provide population-level inference, but they often come with an increased computational burden. For certain types of statistical models, it is straightforward to provide inference if the latent true animal trajectory is known, but challenging otherwise. In these cases, approaches related to multiple imputation have been employed to account for the uncertainty associated with our knowledge of the latent trajectory. Despite the increasing use of imputation approaches for modeling animal movement, the general sensitivity and accuracy of these methods have not been explored in detail. We provide an introduction to animal movement modeling and describe how imputation approaches may be helpful for certain types of models. We also assess the performance of imputation approaches in two simulation studies. Our simulation studies suggest that inference for model parameters directly related to the location of an individual may be more accurate than inference for parameters associated with higher-order processes such as velocity or acceleration. Finally, we apply these methods to analyze a telemetry data set involving northern fur seals (Callorhinus ursinus) in the Bering Sea. Supplementary materials accompanying this paper appear online.

  19. Genotype Imputation with Thousands of Genomes

    Science.gov (United States)

    Howie, Bryan; Marchini, Jonathan; Stephens, Matthew

    2011-01-01

    Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package. PMID:22384356
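
    The idea of choosing a custom reference panel for each study haplotype can be sketched as follows. This is only an illustration: Hamming distance at the typed sites is used here as an assumed stand-in for "local sequence similarity", which the abstract does not define, and the function names, the haplotypes, and the panel size k are hypothetical rather than taken from the IMPUTE2 software.

```python
def hamming(hap_a, hap_b, sites):
    """Number of the given sites at which two haplotypes carry different alleles."""
    return sum(hap_a[s] != hap_b[s] for s in sites)


def closest_reference_haplotypes(study_hap, reference_haps, typed_sites, k):
    """Indices of the k reference haplotypes most similar to the study haplotype.

    Similarity is measured only at typed_sites, the positions genotyped in the
    study sample. Ties keep the original panel order because sorted() is stable.
    """
    ranked = sorted(
        range(len(reference_haps)),
        key=lambda i: hamming(study_hap, reference_haps[i], typed_sites),
    )
    return ranked[:k]


# Hypothetical 0/1 haplotypes over four sites; site 2 is untyped in the study.
study = [0, 1, 0, 1]
panel = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
custom_panel = closest_reference_haplotypes(study, panel, typed_sites=[0, 1, 3], k=2)
```

    In a region-by-region scheme the same selection would be repeated per genomic window, so the custom panel can differ along the chromosome.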

  20. GIS-Based Evaluation of Spatial Interactions by Geographic Disproportionality of Industrial Diversity

    Directory of Open Access Journals (Sweden)

    Jemyung Lee

    2017-11-01

    Diversity of regional industry is regarded as a key factor for regional development, as it has a positive relationship with economic stability, which attracts population. This paper focuses on how the spatial imbalance of industrial diversity contributes to the population change caused by inter-regional migration. This paper introduces a spatial interaction model for the Geographic Information System (GIS)-based simulation of the spatial interactions to evaluate the demographic attraction force. The proposed model adopts the notions of gravity, entropy, and virtual work. An industrial classification by profit level is introduced and its diversity is quantified with the entropy of information theory. The introduced model is applied to the cases of 207 regions in South Korea. Spatial interactions are simulated with an optimized model and their resultant forces, the demographic attraction forces, are compared with observed net migration for verification. The results show that the evaluated attraction forces from industrial diversity have a very significant, positive, and moderate relationship with net migration, while other conventional factors of industry, population, economy, and the job market do not. This paper concludes that the geographical quality of industrial diversity has positive and significant effects on population change by migration.
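
    Quantifying diversity with information-theoretic entropy can be illustrated with a minimal sketch of Shannon entropy over category counts. The function names and the regional counts below are hypothetical; the abstract does not state which entropy variant or normalization the paper uses.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in nats) of a distribution given as category counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Empty categories contribute nothing, so they are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Entropy divided by its maximum log(K); 1.0 means perfectly even."""
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    return shannon_entropy(counts) / math.log(len(counts))


# Hypothetical numbers of firms per industry class in two regions.
diverse_region = [25, 25, 25, 25]
specialised_region = [97, 1, 1, 1]
h_diverse = shannon_entropy(diverse_region)
h_specialised = shannon_entropy(specialised_region)
```

    A region whose firms are spread evenly over the classes reaches the maximum entropy log(K), while a region dominated by one class scores close to zero.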

  1. Data driven estimation of imputation error-a strategy for imputation with a reject option

    DEFF Research Database (Denmark)

    Bak, Nikolaj; Hansen, Lars Kai

    2016-01-01

    to be a practical approach to help users using imputation after the informed choice to impute the missing data has been made. To do this all patterns of missing values are simulated in all complete cases, enabling calculation of the "true error" in each of these new cases. The error is then estimated for each case....... The effect of threshold can be estimated using the complete cases. The user can set an a priori relevant threshold for what is acceptable or use cross validation with the final analysis to choose the threshold. The choice can be presented along with argumentation for the choice rather than holding...

  2. The rise of multiple imputation: a review of the reporting and implementation of the method in medical research.

    Science.gov (United States)

    Hayati Rezvan, Panteha; Lee, Katherine J; Simpson, Julie A

    2015-04-07

    Missing data are common in medical research, which can lead to a loss in statistical power and potentially biased results if not handled appropriately. Multiple imputation (MI) is a statistical method, widely adopted in practice, for dealing with missing data. Many academic journals now emphasise the importance of reporting information regarding missing data and proposed guidelines for documenting the application of MI have been published. This review evaluated the reporting of missing data, the application of MI including the details provided regarding the imputation model, and the frequency of sensitivity analyses within the MI framework in medical research articles. A systematic review of articles published in the Lancet and New England Journal of Medicine between January 2008 and December 2013 in which MI was implemented was carried out. We identified 103 papers that used MI, with the number of papers increasing from 11 in 2008 to 26 in 2013. Nearly half of the papers specified the proportion of complete cases or the proportion with missing data by each variable. In the majority of the articles (86%) the imputed variables were specified. Of the 38 papers (37%) that stated the method of imputation, 20 used chained equations, 8 used multivariate normal imputation, and 10 used alternative methods. Very few articles (9%) detailed how they handled non-normally distributed variables during imputation. Thirty-nine papers (38%) stated the variables included in the imputation model. Less than half of the papers (46%) reported the number of imputations, and only two papers compared the distribution of imputed and observed data. Sixty-six papers presented the results from MI as a secondary analysis. Only three articles carried out a sensitivity analysis following MI to assess departures from the missing at random assumption, with details of the sensitivity analyses only provided by one article. This review outlined deficiencies in the documenting of missing data and the

  3. Evaluating the ecotourism potentials of Naharkhoran area in Gorgan using remote sensing and geographic information system

    Science.gov (United States)

    Oladi, Jafar; Bozorgnia, Delavar

    2010-10-01

    Ecotourism may be defined as voluntary travels to intact natural areas in order to enjoy the natural attractions as well as to get familiar with the culture of local communities. The main factor contributing to inappropriate land uses and natural resource destruction is overaggregation of ecotourists in some specific natural areas such as forests and rangelands, while other parts remain unvisited due to the lack of a proper propagation about those areas. Evaluating the ecotourism potentials of each area would lead to a wider participation of local people in natural resource conservation activities. In order to properly introduce the ecotourism potential areas, at first, we carried out land preparation practices using Geographic Information System (GIS) and Remote Sensing (RS) techniques; then, the maps of height, slope and orientation were produced using the digital elevation model (DEM) of the study area. Afterwards, we overlaid these maps and the ecotourism potential areas were identified on the map. These specified areas were classified into two land uses of mass and alternative ecotourism, with three subclasses (including class1, class2 and an inappropriate class) considered for each land use. To classify the image, the training areas determined on the ground using a GPS (Global Positioning System) device were transferred on the RS image. Subsequently, the ecotourism potential areas were determined using a hybrid method. At the final phase, these areas were compared with the areas determined on the ecotourism potential map; as a result of this comparison, the overlaid ecotourism potential areas were distinguished on the Geographic Information System.

  4. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern // Una comparación de métodos de imputación de variables categóricas con patrón univariado

    OpenAIRE

    Torres Munguía, Juan Armando

    2014-01-01

    This paper examines the sample proportions estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at rates 5% and 15% of missingness, each for MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random fores...

  5. Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression

    DEFF Research Database (Denmark)

    Bini, L. M.; Diniz-Filho, J. A. F.; Rangel, T. F. L. V. B.

    2009-01-01

    by regression coefficients, can shift depending on whether spatially explicit or non-spatial modeling is used. However, the extent to which coefficients may shift and why shifts occur are unclear. Here, we analyze the relationship between environmental predictors and the geographical distribution of species...... richness, body size, range size and abundance in 97 multi-factorial data sets. Our goal was to compare standardized partial regression coefficients of non-spatial ordinary least squares regressions (i.e. models fitted using ordinary least squares without taking autocorrelation into account; "OLS models......" hereafter) and eight spatial methods to evaluate the frequency of coefficient shifts and identify characteristics of data that might predict when shifts are likely. We generated three metrics of coefficient shifts and eight characteristics of the data sets as predictors of shifts. Typical of ecological data...

  6. On combining reference data to improve imputation accuracy.

    Directory of Open Access Journals (Sweden)

    Jun Chen

    Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation approaches are based on HapMap data, and/or based on recently available next-generation sequencing (NGS) data such as data generated by the 1000 Genomes Project. However, with different coverage and call rates for different NGS data sets, how to integrate NGS data sets of different accuracy as well as previously available reference data as references in imputation is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies on using NGS data and previously available reference data in genotype imputation for both simulated data and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses one NGS data set as a reference; strategy 2 imputes samples by using multiple individual data sets of different accuracy as independent references and then combines the imputed samples with samples based on the high accuracy reference selected when overlapping occurs; and strategy 3 combines multiple available data sets as a single reference after imputing each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) for assessing the performances of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. Particularly, strategy 2 is the best strategy across all the conditions that we have investigated, producing the best accuracy of imputation for rare variants. Our study is helpful in guiding application of imputation methods in next generation association analyses.

  7. On mining incomplete medical datasets: Ordering imputation and classification.

    Science.gov (United States)

    Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong; Hu, Ya-Han

    2015-01-01

    To collect medical datasets, it is usually the case that a number of data samples contain some missing values. Performing the data mining task over the incomplete datasets is a difficult problem. In general, missing value imputation can be approached, which aims at providing estimations for missing values by reasoning from the observed data. Consequently, the effectiveness of missing value imputation is heavily dependent on the observed data (or complete data) in the incomplete datasets. In this paper, the research objective is to perform instance selection to filter out some noisy data (or outliers) from a given (complete) dataset to see its effect on the final imputation result. Specifically, four different processes of combining instance selection and missing value imputation are proposed and compared in terms of data classification. Experiments are conducted based on 11 medical related datasets containing categorical, numerical, and mixed attribute types of data. In addition, missing values for each dataset are introduced into all attributes (the missing data rates are 10%, 20%, 30%, 40%, and 50%). For instance selection and missing value imputation, the DROP3 and k-nearest neighbor imputation methods are employed. On the other hand, the support vector machine (SVM) classifier is used to assess the final classification accuracy of the four different processes. The experimental results show that the second process by performing instance selection first and imputation second allows the SVM classifiers to outperform the other processes. For incomplete medical datasets containing some missing values, it is necessary to perform missing value imputation. In this paper, we demonstrate that instance selection can be used to filter out some noisy data or outliers before the imputation process. In other words, the observed data for missing value imputation may contain some noisy information, which can degrade the quality of the imputation result as well as the

  8. Traffic Speed Data Imputation Method Based on Tensor Completion

    Directory of Open Access Journals (Sweden)

    Bin Ran

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue with a novel tensor-based imputation approach. Specifically, a tensor pattern is adopted for modeling traffic speed data, and then High accuracy Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. The proposed method is able to recover missing entries from given entries, which may be noisy, considering the severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on the Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.
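
    The low-rank idea behind completion methods can be illustrated with a minimal sketch. This is not HaLRTC itself: it works on a matrix (for example, road segments by time slots) rather than a tensor, and simply alternates between a truncated SVD and restoring the observed entries. The function name, the fixed rank, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def lowrank_impute(X, rank=2, n_iter=200):
    """Fill NaNs in X by iterating a rank-`rank` SVD approximation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries fixed; update only the missing ones.
        filled = np.where(missing, approx, X)
    return filled

# Synthetic rank-2 "speed" matrix with about 20% of entries removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.2
observed = np.where(mask, np.nan, truth)

estimate = lowrank_impute(observed)
mean_fill = np.where(mask, np.nanmean(observed, axis=0), observed)
rmse_lowrank = np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((mean_fill[mask] - truth[mask]) ** 2))
```

    Comparing `rmse_lowrank` with `rmse_mean` gives a quick check that exploiting low-rank structure helps over a plain column-mean fill on data of this kind.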

  9. Missing value imputation in DNA microarrays based on conjugate gradient method.

    Science.gov (United States)

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE on different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
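
    A rough sketch of the two ingredients named in the abstract, neighbor selection by absolute Pearson correlation and a conjugate gradient solve, is given below. It is not the authors' CGimpute: the intermediate step that picks a "most similar" subset is skipped, the least-squares formulation via normal equations is an assumption, and all function names are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros(len(b))
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def cg_impute(G, gene, sample, k=5):
    """Estimate G[gene, sample] from the k genes most correlated with `gene`.

    Assumes every other entry in the rows involved is observed.
    """
    cols = np.arange(G.shape[1]) != sample           # samples used for fitting
    target = G[gene, cols]
    others = np.array([g for g in range(G.shape[0]) if g != gene])
    # Rank candidate genes by |Pearson correlation| with the target gene.
    corr = np.array([abs(np.corrcoef(G[g, cols], target)[0, 1]) for g in others])
    neighbors = others[np.argsort(-corr)[:k]]
    A = G[neighbors][:, cols]                         # k x (samples - 1)
    # Least squares via CG on the normal equations (A A^T) w = A target.
    w = conjugate_gradient(A @ A.T, A @ target)
    return w @ G[neighbors, sample]

# Toy expression matrix in which gene 0 is a linear mix of genes 1 and 2.
rng = np.random.default_rng(1)
G = rng.normal(size=(5, 12))
G[0] = 2.0 * G[1] - G[2]
true_value = G[0, 3]
estimate = cg_impute(G, gene=0, sample=3, k=4)
```

    In the toy data the masked entry is an exact linear combination of its neighbors, so the estimate should match the held-out value closely; real expression data will not be this clean.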

  10. Enlargement of Traffic Information Coverage Area Using Selective Imputation of Floating Car Data

    Science.gov (United States)

    Kumagai, Masatoshi; Hiruta, Tomoaki; Fushiki, Takumi; Yokota, Takayoshi

    This paper discusses a real-time imputation method for sparse floating car data (FCD). Floating cars are an effective way to collect traffic information; however, because the number of floating cars is limited, FCD contain a large amount of missing data. In an effort to address this problem, we previously proposed a new imputation method based on feature space projection. The method consists of three major processes: (i) determination of a feature space from past FCD history; (ii) feature space projection of current FCD; and (iii) estimation of missing data performed by inverse projection from the feature space. Since estimation is achieved on each feature space axis that represents a spatially correlated component of FCD, it performs an accurate imputation and enlarges the information coverage area. However, differences in correlation among multiple road-links sometimes cause a trade-off between accuracy and coverage. Therefore, we developed an additional function to filter out the road-links that have low correlation with the others. The function uses spectral factorization as a filtering index, which is suitable for evaluating the correlation on the multidimensional feature space. Combined use of the imputation method and the filtering function decreases the maximum estimation error rate from 0.39 to 0.24, keeping 60% coverage area against sparse FCD of 15% observations.
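
    Steps (i) to (iii) can be sketched as follows, assuming the feature space is obtained by principal component analysis of the FCD history. The abstract does not say how the feature space is determined, so PCA, the function names, and the synthetic data are all assumptions; the filtering function based on spectral factorization is not reproduced.

```python
import numpy as np

def fit_feature_space(history, n_components):
    """(i) Derive a feature space from a (time steps x road links) history."""
    mean = history.mean(axis=0)
    _, _, Vt = np.linalg.svd(history - mean, full_matrices=False)
    return mean, Vt[:n_components].T                 # basis: links x components

def impute_by_projection(current, mean, basis):
    """Fill NaN links in `current` using the observed links only."""
    observed = ~np.isnan(current)
    # (ii) Projection: least-squares coordinates from the observed links.
    coords, *_ = np.linalg.lstsq(
        basis[observed], current[observed] - mean[observed], rcond=None
    )
    # (iii) Inverse projection yields an estimate for every link.
    estimate = mean + basis @ coords
    return np.where(observed, current, estimate)

# Synthetic speeds on 8 links driven by 2 latent spatial patterns.
rng = np.random.default_rng(2)
loadings = rng.normal(size=(2, 8))
history = 50.0 + rng.normal(size=(40, 2)) @ loadings
truth = 50.0 + rng.normal(size=2) @ loadings
current = truth.copy()
current[[1, 4, 6]] = np.nan                          # links without floating cars

filled = impute_by_projection(current, *fit_feature_space(history, 2))
```

    Because the synthetic speeds lie exactly in a two-dimensional feature space, the missing links are recovered almost exactly; with real FCD the quality depends on how well the retained components capture the correlation among links.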

  11. How to Improve Postgenomic Knowledge Discovery Using Imputation

    Directory of Open Access Journals (Sweden)

    Coppel Ross

    2009-01-01

    While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures.

  12. How to Improve Postgenomic Knowledge Discovery Using Imputation

    Directory of Open Access Journals (Sweden)

    2009-02-01

    While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures.

  13. A Review On Missing Value Estimation Using Imputation Algorithm

    Science.gov (United States)

    Armina, Roslan; Zain, Azlan Mohd; Azizah Ali, Nor; Sallehuddin, Roselina

    2017-09-01

    The presence of missing values in a data set has always been a major problem for precise prediction. A method for imputing missing values needs to minimize the effect of incomplete data sets on the prediction model. Many algorithms have been proposed as countermeasures to the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the technique used and the use of global or local information from the data set for missing value estimation. In addition, validation methods for imputation results and ways to measure the performance of imputation algorithms are also described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that this review gives the reader a better understanding of trends in imputation methods.

  14. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx

    Science.gov (United States)

    Wang, Jiebiao; Gamazon, Eric R.; Pierce, Brandon L.; Stranger, Barbara E.; Im, Hae Kyung; Gibbons, Robert D.; Cox, Nancy J.; Nicolae, Dan L.; Chen, Lin S.

    2016-01-01

    Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies. PMID:27040689

  15. Using geographical information system for spatial evaluation of canine extruded disc herniation

    Directory of Open Access Journals (Sweden)

    Constantin Daraban

    2014-11-01

    Disc herniation is one of the most common pathologies of the vertebral column in dogs. The aim of this study was to develop a geographical information system (GIS)-based vertebral canal (VC) map useful for spatial evaluation of extruded disc herniation (EDH) in dogs. ArcGIS® was used to create two-dimensional and three-dimensional maps, in which the VC surface is divided into polygons by lines representing latitude and longitude. Actual locations and directions of the herniated disc material were assessed by a series of 142 computer tomographies of dogs collected between 2005 and 2013. Most EDHs were located on the cervical and transitional regions (thoraco-lumbar and lumbo-sacral) and shown at the level of the ventro-cranial and ventro-central polygons created. Choropleth maps, highlighting the distribution and the location/direction patterns of the EDHs throughout the VC, were produced based on the frequency of the ailment. GIS proved to be a valuable tool in analysing EDH in dogs. Further studies are required for biomechanical analysis of EDH patterns.

  16. Freedom of the Will and Legal Imputability in Schopenhauer

    Directory of Open Access Journals (Sweden)

    Renato César Cardoso

    2015-12-01

    The present article aims to analyze Arthur Schopenhauer's criticism of the postulation that freedom of the will is the condition of possibility of legal imputability. According to the philosopher, an intellectually determinable will, not an unconditioned will, is the true enabler of state imputability. In conclusion, we argue that it is with the agent's potential for change, and not with culpability, that society and the state should be concerned. This means that, according to Schopenhauer, an alternative and deterministic conception like his, contrary to what is often said, does not compromise but rather enhances imputability, which is why there is nothing to fear.

  17. Recovery of information from multiple imputation: a simulation study

    Directory of Open Access Journals (Sweden)

    Lee Katherine J

    2012-06-01

    Background: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. Methods: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. Results: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate’s effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. Conclusions: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple

  18. Multiple Imputation by Chained Equations (MICE): Implementation in Stata

    Directory of Open Access Journals (Sweden)

    Patrick Royston

    2011-12-01

    Missing data are a common occurrence in real datasets. For epidemiological and prognostic factors studies in medicine, multiple imputation is becoming the standard route to estimating models with missing covariate data under a missing-at-random assumption. We describe ice, an implementation in Stata of the MICE approach to multiple imputation. Real data from an observational study in ovarian cancer are used to illustrate the most important of the many options available with ice. We remark briefly on the new database architecture and procedures for multiple imputation introduced in releases 11 and 12 of Stata.
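
    The chained-equations idea can be sketched in a few lines of Python. This is not the Stata ice command and makes no claim about its options: it uses only linear regressions, and the optional noise step is a simplification, since a full implementation also draws the regression parameters from their posterior. The function name and toy data are hypothetical.

```python
import numpy as np

def chained_regression_impute(X, n_iter=10, rng=None):
    """Return one completed copy of X (NaN = missing) via chained regressions.

    If `rng` is given, residual noise is added to each imputed value so that
    repeated calls with different seeds give different completed datasets.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)    # initial mean fill
    for _ in range(n_iter):
        for j in np.flatnonzero(missing.any(axis=0)):
            obs = ~missing[:, j]
            # Regress column j on all other columns plus an intercept.
            Z = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            beta, *_ = np.linalg.lstsq(Z[obs], filled[obs, j], rcond=None)
            pred = Z[~obs] @ beta
            if rng is not None:
                resid_sd = np.std(filled[obs, j] - Z[obs] @ beta)
                pred = pred + rng.normal(scale=resid_sd, size=pred.shape)
            filled[~obs, j] = pred
    return filled

# Toy data: the second column is an exact linear function of the first.
x = np.arange(10, dtype=float)
data = np.column_stack([x, 3.0 * x + 1.0])
data_missing = data.copy()
data_missing[[2, 7], 1] = np.nan
completed = chained_regression_impute(data_missing)
```

    Calling the function several times with different `rng` seeds would give several completed datasets, whose analysis results are then combined in the usual multiple-imputation fashion.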

  19. Methods and Strategies to Impute Missing Genotypes for Improving Genomic Prediction

    DEFF Research Database (Denmark)

    Ma, Peipei

    for improving genomic prediction. The results indicate that IMPUTE2 and Beagle are accurate imputation methods, while Fimpute is a good alternative for routine imputation with large data sets. Genotypes of non-genotyped animals can be accurately imputed if they have genotyped progenies. A combined reference...

  20. Imputation by PLS regression for generalized linear mixed models

    OpenAIRE

    Guyon, Emilie; Pommeret, Denys

    2011-01-01

    The problem of handling missing data in generalized linear mixed models with correlated covariates is considered when the missing mechanism concerns both the response variable and the covariates. An imputation algorithm combining multiple imputation and Partial Least Squares (PLS) regression is proposed. The method relies on two steps. In a first step, using a linearization technique, the generalized linear mixed model is approximated by a linear mixed model. A latent variable is introduced a...

  1. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
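
    For reference, the complete-data partial F statistic can be written purely in terms of the coefficients of determination of the full and reduced models, which is the scalar quantity the abstract builds on. The sketch below shows only this standard identity; how the authors combine R² across imputed datasets is not stated in the abstract and is not reproduced. The function and argument names are assumptions.

```python
def partial_f_statistic(r2_full, r2_reduced, n, p_full, q):
    """F statistic for dropping q of the p_full predictors.

    n is the number of observations; p_full excludes the intercept.
    The statistic has (q, n - p_full - 1) degrees of freedom.
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator

# Hypothetical numbers: dropping one of two predictors lowers R^2 from 0.5 to 0.4.
f_value = partial_f_statistic(r2_full=0.5, r2_reduced=0.4, n=103, p_full=2, q=1)
```

    Setting `r2_reduced` to zero and `q` equal to `p_full` gives the global F-test as a special case.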

  2. Haplotype variation and genotype imputation in African populations

    Science.gov (United States)

    Huang, Lucy; Jakobsson, Mattias; Pemberton, Trevor J.; Ibrahim, Muntaser; Nyambo, Thomas; Omar, Sabah; Pritchard, Jonathan K.; Tishkoff, Sarah A.; Rosenberg, Noah A.

    2013-01-01

    Sub-Saharan Africa has been identified as the part of the world with the greatest human genetic diversity. This high level of diversity causes difficulties for genome-wide association (GWA) studies in African populations—for example, by reducing the accuracy of genotype imputation in African populations compared to non-African populations. Here, we investigate haplotype variation and imputation in Africa, using 253 unrelated individuals from 15 Sub-Saharan African populations. We identify the populations that provide the greatest potential for serving as reference panels for imputing genotypes in the remaining groups. Considering reference panels comprising samples of recent African descent in Phase 3 of the HapMap Project, we identify mixtures of reference groups that produce the maximal imputation accuracy in each of the sampled populations. We find that optimal HapMap mixtures and maximal imputation accuracies identified in detailed tests of imputation procedures can instead be predicted by using simple summary statistics that measure relationships between the pattern of genetic variation in a target population and the patterns in potential reference panels. Our results provide an empirical basis for facilitating the selection of reference panels in GWA studies of diverse human populations, especially those of African ancestry. Genet. Epidemiol. 35:766–780, 2011. PMID:22125220

  3. A web-based approach to data imputation

    KAUST Repository

    Li, Zhixu

    2013-10-24

    In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.

  4. A SPATIOTEMPORAL APPROACH FOR HIGH RESOLUTION TRAFFIC FLOW IMPUTATION

    Energy Technology Data Exchange (ETDEWEB)

    Han, Lee [University of Tennessee, Knoxville (UTK); Chin, Shih-Miao [ORNL; Hwang, Ho-Ling [ORNL

    2016-01-01

    Along with the rapid development of Intelligent Transportation Systems (ITS), traffic data collection technologies have been evolving dramatically. The emergence of innovative data collection technologies such as Remote Traffic Microwave Sensor (RTMS), Bluetooth sensor, GPS-based Floating Car method, automated license plate recognition (ALPR) (1), etc., creates an explosion of traffic data, which brings transportation engineering into the new era of Big Data. However, despite the advance of technologies, the missing data issue is still inevitable and has posed great challenges for research such as traffic forecasting, real-time incident detection and management, dynamic route guidance, and massive evacuation optimization, because the degree of success of these endeavors depends on the timely availability of relatively complete and reasonably accurate traffic data. A thorough literature review suggests most current imputation models, if not all, focus largely on the temporal nature of the traffic data and fail to consider the fact that traffic stream characteristics at a certain location are closely related to those at neighboring locations and utilize these correlations for data imputation. To this end, this paper presents a Kriging based spatiotemporal data imputation approach that is able to fully utilize the spatiotemporal information underlying in traffic data. Imputation performance of the proposed approach was tested using simulated scenarios and achieved stable imputation accuracy. Moreover, the proposed Kriging imputation model is more flexible compared to current models.
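
    To make the Kriging idea concrete, here is a minimal ordinary kriging sketch. The exponential variogram, its parameters, and the treatment of each observation as a point with (space, scaled time) coordinates are assumptions for illustration; the abstract does not describe the authors' variogram model or how space and time are combined.

```python
import numpy as np

def exp_variogram(h, sill=1.0, corr_range=1.0):
    """Exponential semivariogram without a nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / corr_range))

def ordinary_kriging(coords, values, target, variogram=exp_variogram):
    """Predict the value at `target` from observations at `coords`."""
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kriging system: variogram block bordered by the unbiasedness constraint.
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = variogram(dists)
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(K, rhs)
    weights = solution[:n]                     # last entry is the multiplier
    return weights @ values, weights

# Hypothetical detectors: columns are (position, scaled time); values are flows.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.5]])
values = np.array([60.0, 55.0, 48.0, 52.0])
prediction, weights = ordinary_kriging(coords, values, np.array([0.5, 0.5]))
at_station, _ = ordinary_kriging(coords, values, coords[2])
```

    The weights sum to one by construction, and without a nugget the predictor reproduces the observed value when the target coincides with an observation.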

  5. The Relative Impacts of Design Effects and Multiple Imputation on Variance Estimates: A Case Study with the 2008 National Ambulatory Medical Care Survey

    Directory of Open Access Journals (Sweden)

    Lewis Taylor

    2014-03-01

    The National Ambulatory Medical Care Survey collects data on office-based physician care from a nationally representative, multistage sampling scheme where the ultimate unit of analysis is a patient-doctor encounter. Patient race, a commonly analyzed demographic, has been subject to a steadily increasing item nonresponse rate. In 1999, race was missing for 17 percent of cases; by 2008, that figure had risen to 33 percent. Over this entire period, single imputation has been the compensation method employed. Recent research at the National Center for Health Statistics evaluated multiply imputing race to better represent the missing-data uncertainty. Given item nonresponse rates of 30 percent or greater, we were surprised to find many estimates’ ratios of multiple-imputation to single-imputation estimated standard errors close to 1. A likely explanation is that the design effects attributable to the complex sample design largely outweigh any increase in variance attributable to missing-data uncertainty.

  6. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2014-10-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.
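
    As an illustration of the IDW baseline mentioned above, the sketch below estimates a missing reading as a distance-weighted average of neighboring stations, together with a plain RMSE helper. The power parameter, the function names, and the station layout are assumptions; the abstract does not state how its percentage RMSE is normalized, so no percentage is computed here.

```python
import numpy as np

def idw_estimate(station_xy, station_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at `target_xy`."""
    d = np.linalg.norm(station_xy - target_xy, axis=1)
    if np.any(d == 0):
        # Target coincides with a station: return that station's value.
        return float(station_values[np.argmin(d)])
    w = 1.0 / d ** power
    return float(w @ station_values / w.sum())

def rmse(pred, truth):
    """Root mean square error between two equally long sequences."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Two hypothetical stations equidistant from the sensor with the missing value.
stations = np.array([[0.0, 0.0], [2.0, 0.0]])
readings = np.array([100.0, 300.0])
estimate = idw_estimate(stations, readings, np.array([1.0, 0.0]))
```

    With equidistant stations the estimate reduces to the simple average of their readings; stations closer to the target would otherwise dominate.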

  7. [Medical expert reports in chest disease; the question of imputability of death].

    Science.gov (United States)

    Martinet, Y

    2011-05-01

    In the course of an investigation, judicial or not, the expert opinion encompasses several questions of a different nature, including the following one « did the patient die of a disease he/she was supposed to suffer from at the time of death? » Based on a personal experience over one year in 2008, the goal of this paper is to tackle this question of imputability, which was asked in respect of 12 investigations, including ten of occupational diseases, one of nosocomial infection and one iatrogenic accident. Only two autopsies were carried out; one autopsy refusal was reported. In five out of 12 cases, the imputability of death related to an occupational disease or an iatrogenic accident was considered by the expert to be certain in one case, very probable in two cases, and possible in two cases; in seven out of 12 cases, imputability of death was unlikely, since the cause of death was unknown in two cases, or was not the suggested cause in five cases. The discussion considers several arguments that can help answer this question: evaluation of the vital prognosis of the disease, the importance of the quality of medical records, the contributions and limits of autopsy findings, deaths that result from multiple causes, and the concept of aggravating circumstances. Copyright © 2011 SPLF. Published by Elsevier Masson SAS. All rights reserved.

  8. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    Directory of Open Access Journals (Sweden)

    Lotz Meredith J

    2008-01-01

    Background: Gene expression data frequently contain missing values; however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remains largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results: We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrixes and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion: Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA

  9. The Utility of Nonparametric Transformations for Imputation of Survey Data

    Directory of Open Access Journals (Sweden)

    Robbins Michael W.

    2014-12-01

    Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations) tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques), the proposed methodology is applied here with an algorithm for imputation in the USDA’s Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient) is recommended over the other transformation procedures in practice.
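
    One way to picture an empirical-distribution transformation is the rank-based sketch below: observed values are mapped to normal scores before a Gaussian imputation model is applied, and imputed scores are mapped back through the empirical quantiles. The rank/(n + 1) convention, the use of `np.quantile` for the back-transformation, and the function names are assumptions, not the procedure used in the paper.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def to_normal_scores(x):
    """Map observed values to normal scores via their empirical CDF."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1            # 1..n, ties broken by order
    u = ranks / (len(x) + 1.0)                       # strictly inside (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(float(p)) for p in u])

def from_normal_scores(z, observed):
    """Map (imputed) normal scores back to the scale of the observed data."""
    u = np.array([_STD_NORMAL.cdf(float(v)) for v in np.atleast_1d(z)])
    return np.quantile(np.asarray(observed, dtype=float), u)

# A heavily skewed toy variable, sorted for readability.
skewed = np.array([0.1, 0.5, 0.7, 2.0, 15.0, 120.0])
scores = to_normal_scores(skewed)
back = from_normal_scores([-10.0, 0.0, 10.0], skewed)
```

    Because the back-transformation interpolates between observed values, imputations produced this way cannot fall outside the observed range, which is one reason such transformations preserve skewed marginal distributions better than a direct Gaussian draw.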

  10. Imputation of genotypes in Danish two-way crossbred pigs using low density panels

    DEFF Research Database (Denmark)

    Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

    of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation on crossbreds performs equally well as in purebreds, when parental breeds are used as the reference panel. When the size of reference is considerably large......, it is redundant to use a combined reference to impute the purebred because a within breed reference can already ensure an outstanding imputation accuracy, but in crossbreds, using a combined reference increased the imputation accuracy greatly. Highly accurate imputed 60K crossbred genotypes were achieved from 7K...

  11. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu

    2016-06-25

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to non-quantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing ones from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall, but on the other hand, issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  12. The Crash Intensity Evaluation Using General Centrality Criterions and a Geographically Weighted Regression

    Science.gov (United States)

    Ghadiriyan Arani, M.; Pahlavani, P.; Effati, M.; Noori Alamooti, F.

    2017-09-01

    Today, one of the social problems affecting the lives of many people is road traffic crashes, especially those on highways. In this regard, this paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. Geographically weighted regression and general centrality criteria are the aspects of traffic used for this article. In the first step, in order to estimate crash intensity, the dual graph must be extracted from the street and highway network so that general centrality criteria can be used. With the help of the graph produced, the criteria are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. Then, the criteria and crash intensities were normalized, and the correlation between them was calculated to determine the criteria that are not dependent on each other. The proposed hybrid approach is well suited to regression problems because these effective measures lead to a more desirable output. The R2 value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling the crash intensity.
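
    The two computations described in this abstract, crash intensity as a share of all crashes and geographically weighted regression with a Gaussian kernel, can be sketched as below. The bandwidth, the single synthetic predictor standing in for a centrality criterion, and the function names are assumptions; the second kernel reported in the abstract is not implemented.

```python
import numpy as np

def crash_intensity(crash_counts):
    """Share of all crashes that occurred on each highway."""
    counts = np.asarray(crash_counts, dtype=float)
    return counts / counts.sum()

def gwr_coefficients(coords, X, y, bandwidth):
    """Local regression coefficients (intercept first) at every location."""
    Z = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), Z.shape[1]))
    for i, c in enumerate(coords):
        d = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)       # Gaussian kernel weights
        sw = np.sqrt(w)
        # Weighted least squares: scale rows by sqrt(w), then solve.
        betas[i], *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return betas

intensity = crash_intensity([2, 3, 5])

# Synthetic check: a relationship that is the same everywhere.
rng = np.random.default_rng(3)
coords = rng.random((15, 2))
centrality = rng.normal(size=15)                      # stand-in predictor
response = 1.0 + 2.0 * centrality
betas = gwr_coefficients(coords, centrality, response, bandwidth=0.5)
```

    When the underlying relationship does not vary over space, every row of `betas` should be close to the global coefficients; spatial variation in the rows is what the method is designed to reveal.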

  13. THE CRASH INTENSITY EVALUATION USING GENERAL CENTRALITY CRITERIONS AND A GEOGRAPHICALLY WEIGHTED REGRESSION

    Directory of Open Access Journals (Sweden)

    M. Ghadiriyan Arani

    2017-09-01

    Road traffic crashes, especially on highways, are a social problem that affects the lives of many people. This paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. Geographically weighted regression and general centrality criteria are the traffic-related tools used in this article. In the first step, to estimate crash intensity, the dual graph is extracted from the street and highway network so that general centrality criteria can be applied. The criteria computed on this graph are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. The criteria and crash intensities were then normalized, and the correlation between them was calculated to determine which criteria are not dependent on each other. The proposed hybrid approach suits regression problems because these effective measures lead to a more desirable output. The R² value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling crash intensity.

  14. Utilizing genotype imputation for the augmentation of sequence data.

    Directory of Open Access Journals (Sweden)

    Brooke L Fridley

    2010-06-01

    In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have resulted in the implication of over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci. A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project. Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines, which take advantage of the recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many "anchor" markers as possible.

  15. Geographic Names

    Data.gov (United States)

    Minnesota Department of Natural Resources — The Geographic Names Information System (GNIS), developed by the United States Geological Survey in cooperation with the U.S. Board of Geographic Names, provides...

  16. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    Science.gov (United States)

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…

  17. Comparing methodologies for imputing ethnicity in an urban ophthalmology clinic.

    Science.gov (United States)

    Storey, Philip; Murchison, Ann P; Dai, Yang; Hark, Lisa; Pizzi, Laura T; Leiby, Benjamin E; Haller, Julia A

    2014-04-01

    To compare methodologies for imputing ethnicity in an urban ophthalmology clinic. Using data from 19,165 patients with self-reported ethnicity, surname, and home address, we compared the accuracy of three methodologies for imputing ethnicity: (1) a surname method based on tabulation from the 2000 US Census; (2) a geocoding method based on tract data from the 2010 US Census; and (3) a combined surname geocoding method using Bayes' theorem. The combined surname geocoding model had the highest accuracy of the three methodologies, imputing black ethnicity with a sensitivity of 84% and positive predictive value (PPV) of 94%, white ethnicity with a sensitivity of 92% and PPV of 82%, Hispanic ethnicity with a sensitivity of 77% and PPV of 71%, and Asian ethnicity with a sensitivity of 83% and PPV of 79%. Overall agreement of imputed and self-reported ethnicity was fair for the surname method (κ 0.23), moderate for the geocoding method (κ 0.58), and strong for the combined method (κ 0.76). A methodology combining surname analysis and Census tract data using Bayes' theorem to determine ethnicity is superior to other methods tested and is ideally suited for research purposes of clinical and administrative data.
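
    One common way to combine a surname-based estimate with a geocoding-based estimate through Bayes' theorem is sketched below. It assumes that surname and Census tract are conditionally independent given ethnicity, so that P(e | surname, tract) is proportional to P(e | surname) · P(e | tract) / P(e). The abstract does not state the exact formulation the authors used, and all probabilities here are made up for illustration.

```python
def combine_posteriors(p_given_surname, p_given_tract, prior):
    """Combine two single-source posteriors under conditional independence.

    Each argument maps an ethnicity label to a probability. The result is
    P(e | surname, tract), normalized to sum to 1.
    """
    unnormalized = {
        e: p_given_surname[e] * p_given_tract[e] / prior[e] for e in prior
    }
    total = sum(unnormalized.values())
    return {e: w / total for e, w in unnormalized.items()}

# Hypothetical inputs: a flat prior and two single-source estimates.
prior = {"black": 0.25, "white": 0.25, "Hispanic": 0.25, "Asian": 0.25}
p_surname = {"black": 0.40, "white": 0.45, "Hispanic": 0.10, "Asian": 0.05}
p_tract = {"black": 0.70, "white": 0.20, "Hispanic": 0.05, "Asian": 0.05}

posterior = combine_posteriors(p_surname, p_tract, prior)
best = max(posterior, key=posterior.get)  # label with the highest probability
```

    In this made-up case the surname alone slightly favors one label, while the combined estimate follows the much stronger tract signal, which is the kind of gain the combined method is meant to capture.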

  18. Multiple imputation for cure rate quantile regression with censored data.

    Science.gov (United States)

    Wu, Yuanshan; Yin, Guosheng

    2017-03-01

    The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.

  19. A Distribution-Based Multiple Imputation Method for Handling Bivariate Pesticide Data with Values below the Limit of Detection

    Science.gov (United States)

    Chen, Haiying; Quandt, Sara A.; Grzywacz, Joseph G.; Arcury, Thomas A.

    2011-01-01

    Background: Environmental and biomedical researchers frequently encounter laboratory data constrained by a lower limit of detection (LOD). Commonly used methods to address these left-censored data, such as simple substitution of a constant for all values < LOD, may bias parameter estimation. In contrast, multiple imputation (MI) methods yield valid and robust parameter estimates and explicit imputed values for variables that can be analyzed as outcomes or predictors. Objective: In this article we expand distribution-based MI methods for left-censored data to a bivariate setting, specifically, a longitudinal study with biological measures at two points in time. Methods: We have presented the likelihood function for a bivariate normal distribution taking into account values < LOD as well as missing data assumed missing at random, and we use the estimated distributional parameters to impute values < LOD and to generate multiple plausible data sets for analysis by standard statistical methods. We conducted a simulation study to evaluate the sampling properties of the estimators, and we illustrate a practical application using data from the Community Participatory Approach to Measuring Farmworker Pesticide Exposure (PACE3) study to estimate associations between urinary acephate (APE) concentrations (indicating pesticide exposure) at two points in time and self-reported symptoms. Results: Simulation study results demonstrated that imputed and observed values together were consistent with the assumed and estimated underlying distribution. Our analysis of PACE3 data using MI to impute APE values < LOD showed that urinary APE concentration was significantly associated with potential pesticide poisoning symptoms. Results based on simple substitution methods were substantially different from those based on the MI method. Conclusions: The distribution-based MI method is a valid and feasible approach to analyze bivariate data with values < LOD, especially when explicit values for the
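
    The core idea of distribution-based imputation below a detection limit can be sketched in a simplified univariate form. This is not the authors' bivariate procedure: the mean and standard deviation are simply supplied as assumed values (the paper estimates them from a censored likelihood), the data are made up, and censored values are drawn from the assumed normal distribution truncated at the LOD by inverse-CDF sampling.

```python
import random
from statistics import NormalDist

def draw_below_lod(dist, lod, rng):
    """Draw one value from `dist` restricted to values below `lod`."""
    p_lod = dist.cdf(lod)                  # probability mass below the LOD
    u = max(rng.random() * p_lod, 1e-12)   # uniform on (0, p_lod)
    return dist.inv_cdf(u)

def multiply_impute(values, lod, mu, sigma, m, seed=0):
    """Return m completed copies of `values`; None marks a value below the LOD."""
    rng = random.Random(seed)
    dist = NormalDist(mu, sigma)
    return [
        [v if v is not None else draw_below_lod(dist, lod, rng) for v in values]
        for _ in range(m)
    ]

# Made-up log-concentrations; None marks a measurement below the LOD.
log_conc = [1.2, None, 0.4, None, 2.1]
LOD = 0.0
completed = multiply_impute(log_conc, LOD, mu=0.8, sigma=1.0, m=5)
```

    Each completed copy would then be analyzed with standard methods and the results pooled. A fuller implementation would also propagate the uncertainty in the estimated distribution parameters rather than fixing them.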

  20. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

    Directory of Open Access Journals (Sweden)

    Aureliano Crameri

    2015-07-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials. One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random (MNAR) data in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g. lack of normality). Moreover, our analysis showed that ratings from the patient’s and therapist’s version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and nonrandomized effectiveness studies in the field of outpatient psychotherapy.

  1. Sequence imputation of HPV16 genomes for genetic association studies.

    Directory of Open Access Journals (Sweden)

    Benjamin Smith

    Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer, and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy-confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, determining the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16

  2. IMPROVEMENT EVALUATION ON CERAMIC ROOF EXTRACTION USING WORLDVIEW-2 IMAGERY AND GEOGRAPHIC DATA MINING APPROACH

    Directory of Open Access Journals (Sweden)

    V. S. Brum-Bastos

    2016-06-01

    Advances in geotechnologies and in remote sensing have improved analysis of urban environments. The new sensors are increasingly suited to urban studies, due to the enhancement in spatial, spectral and radiometric resolutions. Urban environments present high heterogeneity, which cannot be tackled using pixel-based approaches on high resolution images. Geographic Object-Based Image Analysis (GEOBIA) has been consolidated as a methodology for urban land use and cover monitoring; however, classification of high resolution images is still troublesome. This study aims to assess the improvement in ceramic roof classification using WorldView-2 images due to the addition of 4 new bands besides the standard “Blue-Green-Red-Near Infrared” bands. Our methodology combines GEOBIA, the C4.5 classification tree algorithm, Monte Carlo simulation and statistical tests for classification accuracy. Two sample groups were considered: (1) eight multispectral and panchromatic bands, and (2) four multispectral and panchromatic bands, representing previous high-resolution sensors. The C4.5 algorithm generates a decision tree that can be used for classification; smaller decision trees are closer to the semantic networks produced by experts on GEOBIA, while bigger trees are not straightforward to implement manually but are more accurate. The choice of a big or small tree relies on the user’s skills to implement it. This study aims to determine for what kind of user the addition of the 4 new bands might be beneficial: (1) the common user (smaller trees) or (2) a more skilled user with coding and/or data mining abilities (bigger trees). Overall, the classification was improved by the addition of the four new bands for both types of users.

  3. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

    Science.gov (United States)

    Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

    2017-06-01

    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥ 5% and low-frequency variants (0.5 ≤ MAF [...] WGS) based reference panel, comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.

  4. Evaluation of the 3d Urban Modelling Capabilities in Geographical Information Systems

    Science.gov (United States)

    Dogru, A. O.; Seker, D. Z.

    2010-12-01

    Geographical Information System (GIS) technology, which provides successful solutions to basic spatial problems, is now widely used, with its developing visualization tools, in 3-dimensional (3D) modeling of physical reality. The modeling of large and complicated phenomena is a challenging problem for the computer graphics currently in use. However, it is possible to visualize such phenomena in 3D by using computer systems. 3D models are used in developing computer games, military training, urban planning, tourism, etc. The use of 3D models for planning and management of urban areas is a very popular topic among city administrations. In this context, 3D city models are produced and used for various purposes. However, the requirements of the models vary depending on the type and scope of the application. While a high-level visualization, where photorealistic visualization techniques are widely used, is required for tourism and recreational purposes, an abstract visualization of the physical reality is generally sufficient for the communication of thematic information. The visual variables, which are the principal components of cartographic visualization, such as color, shape, pattern, orientation, size, position, and saturation, are used for communicating the thematic information. These kinds of 3D city models are called abstract models. Standardization of technologies used for 3D modeling is now available through CityGML. CityGML implements several novel concepts to support interoperability, consistency and functionality. For example, it supports different Levels-of-Detail (LoD), which may arise from independent data collection processes and are used for efficient visualization and efficient data analysis. In one CityGML data set, the same object may be represented in different LoD simultaneously, enabling the analysis and visualization of the same object with regard to different degrees of resolution. Furthermore, two CityGML data sets

  5. Evaluation of trap capture in a geographically closed population of brown treesnakes on Guam

    Science.gov (United States)

    Tyrrell, C.L.; Christy, M.T.; Rodda, G.H.; Yackel Adams, A.A.; Ellingson, A.R.; Savidge, J.A.; Dean-Bradley, K.; Bischof, R.

    2009-01-01

    1. Open population mark-recapture analysis of unbounded populations accommodates some types of closure violations (e.g. emigration, immigration). In contrast, closed population analysis of such populations readily allows estimation of capture heterogeneity and behavioural response, but requires crucial assumptions about closure (e.g. no permanent emigration) that are suspect and rarely tested empirically. 2. In 2003, we erected a double-sided barrier to prevent movement of snakes in or out of a 5-ha semi-forested study site in northern Guam. This geographically closed population of >100 snakes was monitored using a series of transects for visual searches and a 13 × 13 trapping array, with the aim of marking all snakes within the site. Forty-five marked snakes were also supplemented into the resident population to quantify the efficacy of our sampling methods. We used the program MARK to analyse trap captures (101 occasions), referenced to census data from visual surveys, and quantified heterogeneity, behavioural response, and size bias in trappability. Analytical inclusion of untrapped individuals greatly improved precision in the estimation of some covariate effects. 3. A novel discovery was that trap captures for individual snakes consisted of asynchronous bouts of high capture probability lasting about 7 days (ephemeral behavioural effect). There was modest behavioural response (trap happiness) and significant latent (unexplained) heterogeneity, with small influences on capture success of date, gender, residency status (translocated or not), and body condition. 4. Trapping was shown to be an effective tool for eradicating large brown treesnakes Boiga irregularis (>900 mm snout-vent length, SVL). 5. Synthesis and applications. Mark-recapture modelling is commonly used by ecological managers to estimate populations. However, existing models involve making assumptions about either closure violations or response to capture. Physical closure of our population on a

  6. Application of imputation methods to genomic selection in Chinese Holstein cattle

    Directory of Open Access Journals (Sweden)

    Weng Ziqing

    2012-02-01

    Missing genotypes are a common feature of high density SNP datasets obtained using SNP chip technology, and this is likely to decrease the accuracy of genomic selection. This problem can be circumvented by imputing the missing genotypes with estimated genotypes. When implementing imputation, the criteria used for SNP data quality control and whether to perform imputation before or after data quality control need to be considered. In this paper, we compared six strategies of imputation and quality control using different imputation methods, different quality control criteria and by changing the order of imputation and quality control, against a real dataset of milk production traits in Chinese Holstein cattle. The results demonstrated that, no matter what imputation method and quality control criteria were used, strategies with imputation before quality control performed better than strategies with imputation after quality control in terms of accuracy of genomic selection. The different imputation methods and quality control criteria did not significantly influence the accuracy of genomic selection. We concluded that performing imputation before quality control could increase the accuracy of genomic selection, especially when the rate of missing genotypes is high and the reference population is small.

  7. Evaluating the effects of urban congestion pricing: Geographical accessibility versus social surplus

    NARCIS (Netherlands)

    Tillema, T.; Verhoef, E.T.; van Wee, G.P.; van Amelsfort, D.

    2011-01-01

    In urbanised areas around the world, road pricing policies are considered more and more frequently, the aim often being to alleviate (some of the) external traffic-related costs. To assess the effects of a proposed road pricing measure, several evaluation measures can be used, coming from different

  8. Evaluating the effects of urban congestion pricing : geographical accessibility versus social surplus

    NARCIS (Netherlands)

    Tillema, Taede; Verhoef, Erik; van Wee, Bert; van Amelsfort, Dirk; van Wee, G.P

    2011-01-01

    In urbanised areas around the world, road pricing policies are considered more and more frequently, the aim often being to alleviate (some of the) external traffic-related costs. To assess the effects of a proposed road pricing measure, several evaluation measures can be used, coming from different

  9. Evaluation Models for E-Learning Platform in Riyadh City Universities (RCU) with Applied of Geographical Information System (GIS)

    Directory of Open Access Journals (Sweden)

    Abdulaziz I. Alharrah

    2014-12-01

    E-learning, which integrates digital knowledge content, network and information technology, has become an emerging learning method. The e-learning platform approach is becoming an important tool for providing the flexibility and quality that this kind of learning process requires. Organizations therefore face a new kind of problem: selecting the most suitable e-learning platform. This paper proposes an evaluation model for e-learning platforms in Riyadh City universities (RCU) using an applied Geographic Information System (GIS). E-learning platform selection is a multiple criteria decision-making problem that needs to be addressed objectively, taking into consideration the relative weights of the criteria for any organization. We formulate this multi-criteria problem as a decision hierarchy to be solved using GIS. A GIS-based evaluation index system and a web-based evaluating platform were established. In this paper we show the general evaluation strategy and some results obtained using our model to evaluate some existing commercial platforms. The results of the evaluation model are as follows: the total weight of the proposed framework is 20.25/25 for the management feature, 9.2/10 for the collaborative feature, 6.8/10 for the adaption learning path and 5/5 for the interactive learning object. The total weight of all features is 41.25/50. In this study the evaluation model was applied to Riyadh City universities such as KSU, IMAMU, NAUSS, YU and FU, and the results were compared with each other. The total weight of KSU was 41, while the total weights of FU, IMAMU, YU and NAUSS were 40, 37, 36 and 32, respectively. The evaluation process shows that the proposed framework satisfied the objectives with applied GIS.

  10. Untangling human and environmental effects on geographical gradients of mammal species richness: a global and regional evaluation.

    Science.gov (United States)

    Torres-Romero, Erik Joaquín; Olalla-Tárraga, Miguel Á

    2015-05-01

    Different hypotheses (geographical, ecological, evolutionary or a combination of them) have been suggested to account for the spatial variation in species richness. However, the relative importance of environment and human impacts in explaining these patterns, either globally or at the biogeographical region level, remains largely unexplored. Here, we jointly evaluate how current environmental conditions and human impacts shape global and regional gradients of species richness in terrestrial mammals. We processed IUCN global distributional data for 3939 mammal species and a set of seven environmental and two human impact variables at a spatial resolution of 96.5 × 96.5 km. We used simple, multiple and partial regression techniques to evaluate environmental and human effects on species richness. Actual evapotranspiration (AET) is the main driver of mammal species richness globally. Together with our results at the biogeographical realm level, this lends strong support for the water-energy hypothesis (i.e. global diversity gradients are best explained by the interaction of water and energy, with a latitudinal shift in the relative importance of ambient energy vs. water availability as we move from the poles to the equator). While human effects on species richness are not easily detected at a global scale due to the large proportion of shared variance with the environment, these effects significantly emerge at the regional level. In the Nearctic, Palearctic and Oriental regions, the independent contribution of human impacts is almost as important as current environmental conditions in explaining richness patterns. The intersection of human impacts with climate drives the geographical variation in mammal species richness in the Palearctic, Nearctic and Oriental regions. Using a human accessibility variable, we show, for the first time, that the zones most accessible to humans are often those where we find lower mammal species richness. © 2014 The Authors. Journal of

  11. Missing Data and Multiple Imputation: An Unbiased Approach

    Science.gov (United States)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI) which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number, usually less than 20, of imputed datasets are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study will be demonstrated to show the results of using the default complete case analysis, and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
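
    The final combining step can be sketched with Rubin's rules, the standard way to pool results across imputed datasets. The abstract does not name the combining rule it used, so treat this as an assumption. The pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The numbers are made up.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their squared standard errors."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Made-up regression coefficients and squared standard errors
# from m = 5 imputed datasets.
coefs = [0.70, 0.68, 0.72, 0.69, 0.71]
sq_errors = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_coef, total_var = pool_rubin(coefs, sq_errors)
```

    The between-imputation term is what distinguishes this from analyzing a single filled-in dataset: it widens the standard error to reflect uncertainty about the missing values.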

  12. A framework for multiple imputation in cluster analysis.

    Science.gov (United States)

    Basagaña, Xavier; Barrera-Gómez, Jose; Benet, Marta; Antó, Josep M; Garcia-Aymerich, Judith

    2013-04-01

    Multiple imputation is a common technique for dealing with missing values and is mostly applied in regression settings. Its application in cluster analysis problems, where the main objective is to classify individuals into homogeneous groups, involves several difficulties which are not well characterized in the current literature. In this paper, we propose a framework for applying multiple imputation to cluster analysis when the original data contain missing values. The proposed framework incorporates the selection of the final number of clusters and a variable reduction procedure, which may be needed in data sets where the ratio of the number of persons to the number of variables is small. We suggest some ways to report how the uncertainty due to multiple imputation of missing data affects the cluster analysis outcomes, namely the final number of clusters, the results of a variable selection procedure (if applied), and the assignment of individuals to clusters. The proposed framework is illustrated with data from the Phenotype and Course of Chronic Obstructive Pulmonary Disease (PAC-COPD) Study (Spain, 2004-2008), which aimed to classify patients with chronic obstructive pulmonary disease into different disease subtypes.

  13. Landscape-scale parameterization of a tree-level forest growth model: a k-nearest neighbor imputation approach incorporating LiDAR data

    Science.gov (United States)

    Michael J. Falkowski; Andrew T. Hudak; Nicholas L. Crookston; Paul E. Gessler; Edward H. Uebler; Alistair M. S. Smith

    2010-01-01

    Sustainable forest management requires timely, detailed forest inventory data across large areas, which is difficult to obtain via traditional forest inventory techniques. This study evaluated k-nearest neighbor imputation models incorporating LiDAR data to predict tree-level inventory data (individual tree height, diameter at breast height, and...

  14. Fitting additive hazards models for case-cohort studies: a multiple imputation approach.

    Science.gov (United States)

    Jung, Jinhyouk; Harel, Ofer; Kang, Sangwook

    2016-07-30

    In this paper, we consider fitting semiparametric additive hazards models for case-cohort studies using a multiple imputation approach. In a case-cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on a rejection sampling is developed. A simple imputation modeling that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.

  15. Multiple imputation of continuous data via a semiparametric probability integral transformation.

    Science.gov (United States)

    Helenowski, Irene B; Demirtas, Hakan

    2014-01-01

    We propose a semiparametric approach incorporating principles of multiple imputation under the normality assumption, multivariate number generation, and computation of empirical cumulative distribution function (eCDF) values to impute continuous data with variables following any marginal distribution. This method involves mapping the data to normally distributed values, imputing these values, and back-transforming the data onto the scale of the original data. The transformations associated with eCDF computations constitute the nonparametric portion of our algorithm, while imputation under the normality assumption constitutes the parametric portion. Application of this method to simulated and real data leads to promising results.
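
    The transform and back-transform steps of such a method can be sketched in Python as follows. This is only an illustration, not the authors' algorithm: the rank/(n + 1) rescaling of the eCDF, the nearest-quantile back-transform, and the function names are assumptions, and the imputation step on the normal scale is omitted.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def to_normal_scores(observed):
    """Map observed values to standard-normal scores through the empirical CDF."""
    n = len(observed)
    scores = []
    for x in observed:
        rank = sum(1 for v in observed if v <= x)  # n * eCDF(x)
        # Dividing by n + 1 keeps the probability strictly inside (0, 1).
        scores.append(STD_NORMAL.inv_cdf(rank / (n + 1)))
    return scores


def back_transform(z, observed):
    """Map a normal-scale value back to the nearest observed quantile."""
    ranked = sorted(observed)
    n = len(ranked)
    idx = round(STD_NORMAL.cdf(z) * (n + 1)) - 1
    return ranked[min(max(idx, 0), n - 1)]


# Hypothetical right-skewed sample (illustrative numbers only).
skewed = [0.2, 0.5, 0.9, 1.4, 2.3, 4.1, 9.8]
z_scores = to_normal_scores(skewed)
restored = [back_transform(z, skewed) for z in z_scores]
```

    In a full procedure, values imputed on the normal scale would be passed through `back_transform` so that they land on the scale, and within the range, of the original data.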

  16. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. The running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increased, the Radial Basis Function (RBF) required a long imputation time. The tested methods can be an alternative for imputation of un-typed SNPs in data with a low missing rate. However, it is recommended that other machine learning methods be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Evaluation of the Systematic Status of Geographical Variations in Arcuphantes hibanus (Arachnida: Araneae: Linyphiidae), with Descriptions of Two New Species.

    Science.gov (United States)

    Nakano, Takafumi; Ihara, Yoh; Kumasaki, Yusuke; Baba, Yuki G; Tomikawa, Ko

    2017-08-01

    The systematic status of geographical variants of Arcuphantes hibanus Saito, 1992 belonging to the A. longiscapus species group, indigenous to western Honshu and Shikoku, Japan, was evaluated using morphological and molecular data. Two species, A. enmusubi Ihara, Nakano and Tomikawa, sp. nov. and A. occidentalis Ihara, Nakano and Tomikawa, sp. nov., are described, and A. hibanus is redescribed with redefinition of its taxonomic status. These three species are diagnosed by the characteristics of paracymbium, pseudolamella, and epigynal basal part. Phylogenetic trees obtained with mitochondrial cytochrome c oxidase subunit I and 16S rRNA markers showed that the variants are mutually genetically highly diverged. However, the mtDNA phylogenies failed to recover the monophyly of A. hibanus redefined herein. Contrary to the mtDNA phylogenetic analyses, a neighbor-network analysis of nuclear internal transcribed spacer 1 sequences of A. hibanus, A. enmusubi and A. occidentalis spiders showed that each of them forms a cluster. The results of mitochondrial and nuclear DNA analyses in each of the three species are briefly discussed, along with their taxonomic identities.

  18. Rapid discrimination of geographical origin and evaluation of antioxidant activity of Salvia miltiorrhiza var. alba by Fourier transform near infrared spectroscopy

    Science.gov (United States)

    Duan, Xiaoju; Zhang, Danlu; Nie, Lei; Zang, Hengchang

    2014-03-01

    Radix Salvia miltiorrhiza Bge. var. alba C.Y. Wu and H.W. Li and Radix S. miltiorrhiza belong to the same genus. S. miltiorrhiza var. alba has a unique effectiveness for thromboangiitis in addition to the therapeutic efficacy of S. miltiorrhiza. It exhibits antioxidant activity (AA), while its quality and efficacy also vary with geographic location. Therefore, a rapid and nondestructive method based on Fourier transform near infrared spectroscopy (FT-NIRS) was developed for discrimination of geographical origin and evaluation of AA of S. miltiorrhiza var. alba. The discrimination of geographical origin was achieved by using discriminant analysis and the accuracy was 100%. Partial least squares (PLS) regression was employed to establish the model for evaluation of AA by NIRS. The spectral regions were selected by the interval PLS (i-PLS) method. Different pre-treatment methods were compared for the spectral pre-processing. The final optimal results of the PLS model showed that the correlation coefficients in the calibration set (Rc) and the prediction set (Rp), root mean square error of prediction (RMSEP) and residual prediction deviation (RPD) were 0.974, 0.950, 0.163 mg mL⁻¹ and 2.66, respectively. The results demonstrated that NIRS combined with chemometric methods could be a rapid and nondestructive tool to discriminate geographical origin and evaluate AA of S. miltiorrhiza var. alba. The developed NIRS method might have a potential application to high-throughput screening of a great number of raw S. miltiorrhiza var. alba samples for AA.

  19. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Science.gov (United States)

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...

  20. Consequences of Splitting Sequencing Effort over Multiple Breeds on Imputation Accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    Imputation from a high-density SNP panel (777k) to whole-genome sequence with a reference population of 20 Holstein resulted in an average imputation accuracy of 0.70, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the

  1. Mapping gradients of community composition with nearest-neighbour imputation: extending plot data for landscape analysis

    Science.gov (United States)

    Janet L. Ohmann; Matthew J. Gregory; Emilie B. Henderson; Heather M. Roberts

    2011-01-01

    Question: How can nearest-neighbour (NN) imputation be used to develop maps of multiple species and plant communities? Location: Western and central Oregon, USA, but methods are applicable anywhere. Methods: We demonstrate NN imputation by mapping woody plant communities for >100 000 km2 of diverse forests and woodlands. Species abundances on...

  2. Multiple imputation of discrete and continuous data by fully conditional specification

    NARCIS (Netherlands)

    Buuren, S. van

    2010-01-01

    The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated

  3. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model.

    Science.gov (United States)

    Bartlett, Jonathan W; Seaman, Shaun R; White, Ian R; Carpenter, James R

    2015-08-01

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. © The Author(s) 2014.

  4. An imputed forest composition map for New England screened by species range boundaries

    Science.gov (United States)

    Matthew J. Duveneck; Jonathan R. Thompson; B. Tyler. Wilson

    2015-01-01

    Initializing forest landscape models (FLMs) to simulate changes in tree species composition requires accurate fine-scale forest attribute information mapped continuously over large areas. Nearest-neighbor imputation maps, maps developed from multivariate imputation of field plots, have high potential for use as the initial condition within FLMs, but the tendency for...

  5. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  6. Geographical Indications

    OpenAIRE

    Anechitoae Constantin; Grigoruț Lavinia-Maria

    2011-01-01

    “The denomination of origin” may be the name of a region, a specific place or country used to describe an agricultural or food product. "The geographical indication" may be the name of a region, a specific place or a country, used to describe an agricultural or food product. The indication of provenance and the denomination of origin serve to identify the source and origin of goods or services.

  7. An imputation-based genome-wide association study on traits related to male reproduction in a White Duroc × Erhualian F2 population.

    Science.gov (United States)

    Zhao, Xueyan; Zhao, Kewei; Ren, Jun; Zhang, Feng; Jiang, Chao; Hong, Yuan; Jiang, Kai; Yang, Qiang; Wang, Chengbin; Ding, Nengshui; Huang, Lusheng; Zhang, Zhiyan; Xing, Yuyun

    2016-05-01

    Boar reproductive traits are economically important for the pig industry. Here we conducted a genome-wide association study (GWAS) for 13 reproductive traits measured on 205 F2 boars at day 300 using 60 K single nucleotide polymorphism (SNP) data imputed from a reference panel of 1200 pigs in a White Duroc × Erhualian F2 intercross population. We identified 10 significant loci for seven traits on eight pig chromosomes (SSC). Two loci surpassed the genome-wide significance level, including one for epididymal weight around 60.25 Mb on SSC7 and one for semen temperature around 43.69 Mb on SSC4. Four of the 10 significant loci that we identified were consistent with previously reported quantitative trait loci for boar reproduction traits. We highlighted several interesting candidate genes at these loci, including APN, TEP1, PARP2, SPINK1 and PDE1C. To evaluate the imputation accuracy, we further genotyped nine GWAS top SNPs using PCR restriction fragment length polymorphism or Sanger sequencing. We found an average genotype concordance of 91.44%, an allelic concordance of 95.36% and an r² correlation of 0.85 between imputed and real genotype data. This indicates that our GWAS mapping results based on imputed SNP data are reliable, providing insights into the genetic basis of boar reproductive traits. © 2015 Japanese Society of Animal Science.
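
    The three accuracy measures reported here can be illustrated with a short Python sketch. The abstract does not give its formulas, so the definitions below are common conventions taken as assumptions: genotypes coded as allele counts 0/1/2, allelic concordance as the share of matching alleles, and r² as the squared Pearson correlation. The genotype vectors are hypothetical.

```python
def genotype_concordance(truth, imputed):
    """Share of genotypes that match exactly."""
    return sum(t == i for t, i in zip(truth, imputed)) / len(truth)


def allelic_concordance(truth, imputed):
    """Share of alleles that match, with two alleles per genotype."""
    wrong_alleles = sum(abs(t - i) for t, i in zip(truth, imputed))
    return 1 - wrong_alleles / (2 * len(truth))


def r_squared(truth, imputed):
    """Squared Pearson correlation between true and imputed allele counts."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    ss_t = sum((t - mean_t) ** 2 for t in truth)
    ss_i = sum((i - mean_i) ** 2 for i in imputed)
    return cov * cov / (ss_t * ss_i)


# Hypothetical genotypes for one SNP in eight animals (allele counts 0/1/2).
true_gt = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_gt = [0, 1, 2, 2, 0, 2, 1, 1]

gc = genotype_concordance(true_gt, imputed_gt)
ac = allelic_concordance(true_gt, imputed_gt)
r2 = r_squared(true_gt, imputed_gt)
```

    Under these definitions a heterozygote imputed as a homozygote counts as a full genotype error but only one wrong allele out of two, which is one reason allelic concordance is usually higher than genotype concordance.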

  8. Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

    DEFF Research Database (Denmark)

    Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

    2013-01-01

    This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence of these factors on the accuracy of imputation. Data set 1 consisted of 2,931 reference bulls and 971 test bulls, and was used for validation of imputation from 3,000 markers (3K) to 54,000 markers (54K). Data set 2 contained 341 bulls in the reference set and 117 in the test set, and was used for validation of imputation from 54K to high density [777,000 markers (777K)]. Both test sets were divided into 4 groups according to their relationship to the reference population. Five imputation methods (Beagle, IMPUTE2, findhap, AlphaImpute, and FImpute) were used in this study. Imputation accuracy was measured...

  9. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    Science.gov (United States)

    Hernández, Gilma; Moriña, David; Navarro, Albert

    The presence of missing data in collected variables is common in health surveys, but the subsequent imputation thereof at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still little understood by many non-statisticians, who view this process as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths. All of this is presented in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example in which missing data are handled by means of simple and multiple imputation is introduced. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  10. Genotype imputation in a coalescent model with infinitely-many-sites mutation.

    Science.gov (United States)

    Huang, Lucy; Buzbas, Erkan O; Rosenberg, Noah A

    2013-08-01

    Empirical studies have identified population-genetic factors as important determinants of the properties of genotype-imputation accuracy in imputation-based disease association studies. Here, we develop a simple coalescent model of three sequences that we use to explore the theoretical basis for the influence of these factors on genotype-imputation accuracy, under the assumption of infinitely-many-sites mutation. Employing a demographic model in which two populations diverged at a given time in the past, we derive the approximate expectation and variance of imputation accuracy in a study sequence sampled from one of the two populations, choosing between two reference sequences, one sampled from the same population as the study sequence and the other sampled from the other population. We show that, under this model, imputation accuracy (as measured by the proportion of polymorphic sites that are imputed correctly in the study sequence) increases in expectation with the mutation rate, the proportion of the markers in a chromosomal region that are genotyped, and the time to divergence between the study and reference populations. Each of these effects derives largely from an increase in information available for determining the reference sequence that is genetically most similar to the sequence targeted for imputation. We analyze as a function of divergence time the expected gain in imputation accuracy in the target using a reference sequence from the same population as the target rather than from the other population. Together with a growing body of empirical investigations of genotype imputation in diverse human populations, our modeling framework lays a foundation for extending imputation techniques to novel populations that have not yet been extensively examined. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. A new approach for efficient genotype imputation using information from relatives.

    Science.gov (United States)

    Sargolzaei, Mehdi; Chesnais, Jacques P; Schenkel, Flavio S

    2014-06-17

    Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships. The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy compared to the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals. The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

  12. Geographical Tattoos

    Directory of Open Access Journals (Sweden)

    Valéria Cazetta

    2014-08-01

    The article deals with maps tattooed on bodies. My interest in studying corporeality is part of a broader project entitled Geographies and (in) Bodies. There are several published studies on tattoos, but none specifically about tattooed maps. However, some of these works interested me because they present important contemporary discussions about body modification, which helped me locate body modifications within culture more than within nature. At this stage, I looked at pictures of geographical tattoos available on several internet sites.

  13. High-density marker imputation accuracy in sixteen French cattle breeds.

    Science.gov (United States)

    Hozé, Chris; Fouilloux, Marie-Noëlle; Venot, Eric; Guillaume, François; Dassonneville, Romain; Fritz, Sébastien; Ducrocq, Vincent; Phocas, Florence; Boichard, Didier; Croiseau, Pascal

    2013-09-03

    Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777,609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip. Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the youngest 20% of animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software. Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend discarding these in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high-density. No

  14. Differential network analysis with multiply imputed lipidomic data.

    Directory of Open Access Journals (Sweden)

    Maiju Kujala

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, yet not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe to conduct a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during a two-year follow-up.

  15. Vojaškogeografsko vrednotenje območja Julijskih Alp = Military-geographic evaluation of the Julian Alps area

    Directory of Open Access Journals (Sweden)

    Zvonimir Bratun

    1998-01-01

    The Julian Alps have been of military significance since Roman times, in a military-geographic sense, because of their valleys, mountain passes and lines of defence on mountain ridges. They became especially important in the 19th and 20th centuries. The largest mountain front in World War I was located there, and evidence of that front is still visible today. The border between Italy and Yugoslavia in the heart of the Julian Alps was clearly a line of demarcation along the Soča and Sava watersheds and was reinforced with fortifications, obstacles and trenches. During the Cold War, there was an ideological line of demarcation along the western edge of the Julian Alps as well. Military strategy in that area included the use of military-geographic approaches in both westerly and easterly directions. After the geopolitical changes of 1991, the Julian Alps no longer had the same military-geographic significance in terms of Slovenian national security. Today other military activities are more important: training under mountain conditions for NATO soldiers, non-commissioned and commissioned officers takes place in the Pokljuka region and on the Triglav mountain chain. Military facilities have taken on significance in terms of tourism as well.

  16. Botanical and geographical characterization of green coffee (Coffea arabica and Coffea canephora): chemometric evaluation of phenolic and methylxanthine contents.

    Science.gov (United States)

    Alonso-Salces, Rosa M; Serra, Francesca; Reniero, Fabiano; Héberger, Károly

    2009-05-27

    Green coffee beans of the two main commercial coffee varieties, Coffea arabica (Arabica) and Coffea canephora (Robusta), from the major growing regions of America, Africa, Asia, and Oceania were studied. The contents of chlorogenic acids, cinnamoyl amides, cinnamoyl glycosides, free phenolic acids, and methylxanthines of green coffee beans were analyzed by liquid chromatography coupled with UV spectrophotometry to determine their botanical and geographical origins. The analysis of caffeic acid, 3-feruloylquinic acid, 5-feruloylquinic acid, 4-feruloylquinic acid, 3,4-dicaffeoylquinic acid, 3-caffeoyl-5-feruloylquinic acid, 3-caffeoyl-4-feruloylquinic acid, 3-p-coumaroyl-4-caffeoylquinic acid, 3-caffeoyl-4-dimethoxycinnamoylquinic acid, 3-caffeoyl-5-dimethoxycinnamoylquinic acid, p-coumaroyl-N-tryptophan, feruloyl-N-tryptophan, caffeoyl-N-tryptophan, and caffeine enabled the unequivocal botanical characterization of green coffee beans. Moreover, some free phenolic acids and cinnamate conjugates of green coffee beans showed great potential as means for the geographical characterization of coffee. Thus, p-coumaroyl-N-tyrosine, caffeoyl-N-phenylalanine, caffeoyl-N-tyrosine, 3-dimethoxycinnamoyl-5-feruloylquinic acid, and dimethoxycinnamic acid were found to be characteristic markers for Ugandan Robusta green coffee beans. Multivariate data analysis of the phenolic and methylxanthine profiles provided preliminary results that allowed showing their potential for the determination of the geographical origin of green coffees. Linear discriminant analysis (LDA) and partial least-squares discriminant analysis (PLS-DA) provided classification models that correctly identified all authentic Robusta green coffee beans from Cameroon and Vietnam and 94% of those from Indonesia. Moreover, PLS-DA afforded independent models for Robusta samples from these three countries with sensitivities and specificities of classifications close to 100% and for Arabica samples from America and

  17. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Ding, X

    2015-01-01

    This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows g... in Chinese Holstein were assessed. The allele correct rate increased around 2.7 and 1.7% in imputation from the 54K to the HD marker data for Chinese Holstein bulls and cows, respectively, when the Nordic HD-genotyped bulls were included in the reference data for imputation. However, the prediction accuracy... to increase reference population rather than increasing marker density...

  18. Missing value imputation for microarray gene expression data using histone acetylation information

    Directory of Open Access Journals (Sweden)

    Feng Jihua

    2008-05-01

    Background: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. Results: The paper explores the feasibility of doing missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates the histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, which is one of the existing related methods that use functional similarity as the external information. Conclusion: We demonstrated that the use of histone acetylation information could greatly improve the performance of the imputation, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, with more knowledge accumulated on gene regulatory mechanisms in addition to histone acetylation, the performance of our approach can be further improved and verified.
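
    For orientation, the conventional KNN baseline and the NRMSE score mentioned in this abstract can be sketched in Python as follows. This is not the HAIimpute method: no histone acetylation information is used, and the distance measure, the unweighted neighbour mean, and the NRMSE normalisation by the standard deviation of the true values are all assumptions. The expression matrix is hypothetical, with rows as genes and `None` marking a missing value.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows observing that column."""
    filled = [row[:] for row in matrix]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for s, other in enumerate(matrix):
                if s == r or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Mean squared difference over the columns both rows observe.
                dist = sum((a - b) ** 2 for a, b in shared) / len(shared)
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[r][c] = sum(nearest) / len(nearest)
    return filled


def nrmse(imputed, truth):
    """Root mean squared error divided by the standard deviation of the truth."""
    n = len(truth)
    mse = sum((i - t) ** 2 for i, t in zip(imputed, truth)) / n
    mean_t = sum(truth) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth) / n)
    return math.sqrt(mse) / sd_t


# Hypothetical expression matrix: four genes measured on three arrays.
expr = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
completed = knn_impute(expr, k=2)
```

    In an evaluation like the one described, known values would be masked at a chosen missing percentage, imputed, and then compared with the held-back truth using `nrmse`, where lower values indicate better imputation.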

  19. Predictors of clinical outcome in pediatric oligodendroglioma: meta-analysis of individual patient data and multiple imputation.

    Science.gov (United States)

    Wang, Kevin Yuqi; Vankov, Emilian R; Lin, Doris Da May

    2017-12-01

    OBJECTIVE Oligodendroglioma is a rare primary CNS neoplasm in the pediatric population, and only a limited number of studies in the literature have characterized this entity. Existing studies are limited by small sample sizes and discrepant interstudy findings in identified prognostic factors. In the present study, the authors aimed to increase the statistical power in evaluating for potential prognostic factors of pediatric oligodendrogliomas and sought to reconcile the discrepant findings present among existing studies by performing an individual-patient-data (IPD) meta-analysis and using multiple imputation to address data not directly available from existing studies. METHODS A systematic search was performed, and all studies found to be related to pediatric oligodendrogliomas and associated outcomes were screened for inclusion. Each study was searched for specific demographic and clinical characteristics of each patient and the duration of event-free survival (EFS) and overall survival (OS). Given that certain demographic and clinical information of each patient was not available within all studies, a multivariable imputation via chained equations model was used to impute missing data after the mechanism of missing data was determined. The primary end points of interest were hazard ratios for EFS and OS, as calculated by the Cox proportional-hazards model. Both univariate and multivariate analyses were performed. The multivariate model was adjusted for age, sex, tumor grade, mixed pathologies, extent of resection, chemotherapy, radiation therapy, tumor location, and initial presentation. A p value of less than 0.05 was considered statistically significant. RESULTS A systematic search identified 24 studies with both time-to-event and IPD characteristics available, and a total of 237 individual cases were available for analysis. A median of 19.4% of the values among clinical, demographic, and outcome variables in the compiled 237 cases were missing. Multivariate

  20. Imputation-based strategies for clinical trial longitudinal data with nonignorable missing values

    Science.gov (United States)

    Yang, Xiaowei; Li, Jinhui; Shoptaw, Steven

    2011-01-01

    SUMMARY Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial. PMID:18205247

  1. First Use of Multiple Imputation with the National Tuberculosis Surveillance System

    Directory of Open Access Journals (Sweden)

    Christopher Vinnard

    2013-01-01

    Full Text Available Aims. The purpose of this study was to compare methods for handling missing data in analysis of the National Tuberculosis Surveillance System of the Centers for Disease Control and Prevention. Because of the high rate of missing human immunodeficiency virus (HIV) infection status in this dataset, we used multiple imputation methods to minimize the bias that may result from less sophisticated methods. Methods. We compared analysis based on multiple imputation methods with analysis based on deleting subjects with missing covariate data from regression analysis (case exclusion), and determined whether the use of increasing numbers of imputed datasets would lead to changes in the estimated association between isoniazid resistance and death. Results. Following multiple imputation, the odds ratio for initial isoniazid resistance and death was 2.07 (95% CI 1.30, 3.29); with case exclusion, this odds ratio decreased to 1.53 (95% CI 0.83, 2.83). The use of more than 5 imputed datasets did not substantively change the results. Conclusions. Our experience with the National Tuberculosis Surveillance System dataset supports the use of multiple imputation methods in epidemiologic analysis, but also demonstrates that close attention should be paid to the potential impact of missing covariates at each step of the analysis.
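
    How an odds ratio is combined across several imputed datasets can be sketched with Rubin's rules. This is a minimal sketch under stated assumptions: the per-dataset log odds ratios and standard errors below are hypothetical (they are not values from the study), the function names are illustrative, and the interval uses a normal approximation rather than the t reference distribution often used with few imputations.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m >= 2 per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def pooled_odds_ratio(log_odds_ratios, standard_errors, z=1.96):
    """Return (OR, lower, upper) using a normal-approximation interval."""
    pooled, total = pool_rubin(log_odds_ratios,
                               [se ** 2 for se in standard_errors])
    se = math.sqrt(total)
    return (math.exp(pooled),
            math.exp(pooled - z * se),
            math.exp(pooled + z * se))


# Hypothetical log odds ratios and standard errors from 5 imputed datasets.
log_ors = [0.70, 0.75, 0.72, 0.68, 0.77]
ses = [0.24, 0.23, 0.25, 0.24, 0.23]
odds_ratio, lower, upper = pooled_odds_ratio(log_ors, ses)
print(f"pooled OR {odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

    Re-running the pooling with increasing numbers of imputed datasets is one simple way to check, as the abstract describes, whether adding imputations changes the estimate.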

  2. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Directory of Open Access Journals (Sweden)

    Xiaobo Yan

    2015-01-01

    Full Text Available This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation, logistics, and healthcare. However, missing values are very common in the IoT for a variety of reasons, leaving the experimental data incomplete. As a result, some work that depends on IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data itself and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on a Gaussian mixture model (MGI). Experimental results showed that the three models can greatly improve the accuracy, reliability, and stability of missing value imputation.

  3. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    Science.gov (United States)

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third variable selection strategy combines resampling techniques such as bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.

  4. Imputation of genotypes in Danish purebred and two-way crossbred pigs using low-density panels

    DEFF Research Database (Denmark)

    Xiang, Tao; Ma, Peipei; Ostersen, Tage

    2015-01-01

    that could explain the differences observed. Results Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the size of the reference population is very large, it is not necessary to use a reference population...... that combines the two breeds to impute the genotypes of purebred animals because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to ensure that similar imputation accuracies are obtained for crossbred animals......, a reference population that combines both parental purebred animals is required. Imputation accuracies are higher when a larger proportion of haplotypes are shared between the reference population and the validation (imputed) populations. Conclusions The results from both real data and pedigree...

  5. Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: application to a smoking cessation trial.

    Science.gov (United States)

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M; Hedeker, Donald

    2014-07-30

    The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. Copyright © 2014 John Wiley & Sons, Ltd.

  6. Development of a Geographic Information System-Based Decision Support Tool for Evaluating Windfarm Sitings in Great Lakes Aquatic Habitats

    Energy Technology Data Exchange (ETDEWEB)

    Wehrly, Kevin E. [Michigan Dept. Natural Resources and Environment, Lansing, MI (United States)]; Rutherford, Edward S. [Great Lakes Environmental Research Lab., Ann Arbor, MI (United States)]; Wang, Lizhu [Michigan Dept. Natural Resources and Environment, Lansing, MI (United States)]; Breck, Jason [Univ. of Michigan, Ann Arbor, MI (United States). School of Natural Resources and Environment (UM-SNRE)]; Mason, Lacey [Univ. of Michigan, Ann Arbor, MI (United States). School of Natural Resources and Environment (UM-SNRE)]; Nelson, Scott [USGS Great Lakes Science Center, Ann Arbor, MI (United States)]

    2011-07-31

    As an outcome of our research project, we developed software and data for the Lakebed Alteration Decision Support Tool (LADST), a web-based decision support program to assist resource managers in making siting decisions for offshore wind farms (as well as other lakebed-altering projects) in the United States' waters of the Great Lakes. Users of the LADST can create their own offshore wind farm suitability maps, based upon suitability criteria of their own choosing by visiting a public web site. The LADST can be used to represent the different priorities or values of different Great Lakes stakeholders for wind farm siting, as well as the different suitability requirements of wind farms (or different types of development projects) in a single suitability analysis system. The LADST makes this type of customized suitability analysis easily accessible to users who have no specialized software or experience with geographic information systems (GIS). It also may increase the transparency of the siting and permitting process for offshore wind farms, as it makes the suitability analysis equally accessible to resource managers, wind farm developers, and concerned citizens.

  7. Examining spatially varying relationships between land use and water quality using geographically weighted regression I: model design and evaluation.

    Science.gov (United States)

    Tu, Jun; Xia, Zong-Guo

    2008-12-15

    Traditional regression techniques such as ordinary least squares (OLS) can hide important local variations in the model parameters, and are not able to deal with spatial autocorrelations existing in the variables. A recently developed technique, geographically weighted regression (GWR), is used to examine the relationships between land use and water quality in eastern Massachusetts, USA. GWR models substantially improve model performance over OLS models, as shown by F-tests and by comparisons of model R² and corrected Akaike Information Criterion (AICc) values from both GWR and OLS. GWR models also improve the reliability of the relationships by reducing spatial autocorrelations. The application of GWR models finds that the relationships between land use and water quality are not constant over space but show great spatial non-stationarity. GWR models are able to reveal information on the local causes of water pollution that was previously ignored by OLS models, and so improve the model's ability to explain the local situation of water quality. The results of this study suggest that the GWR technique has the potential to serve as a useful tool for environmental research and management at watershed, regional, national and even global scales.
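
    The contrast between a single global OLS fit and location-specific GWR fits can be sketched as below. This is an illustrative sketch only, assuming a Gaussian distance kernel, one predictor, and a fixed bandwidth; the synthetic points, the bandwidth value, and the function names are hypothetical and are not taken from the study.

```python
import math


def gwr_local_fit(points, target, bandwidth):
    """Weighted least-squares line at `target`, with Gaussian distance weights.

    points: list of (x_coord, y_coord, predictor, response) tuples.
    Returns (slope, intercept) of the local regression.
    """
    weights = []
    for px, py, _, _ in points:
        squared_distance = (px - target[0]) ** 2 + (py - target[1]) ** 2
        weights.append(math.exp(-0.5 * squared_distance / bandwidth ** 2))
    total = sum(weights)
    mean_u = sum(w * p[2] for w, p in zip(weights, points)) / total
    mean_v = sum(w * p[3] for w, p in zip(weights, points)) / total
    suu = sum(w * (p[2] - mean_u) ** 2 for w, p in zip(weights, points))
    suv = sum(w * (p[2] - mean_u) * (p[3] - mean_v)
              for w, p in zip(weights, points))
    slope = suv / suu
    return slope, mean_v - slope * mean_u


def ols_fit(points):
    """Global fit: the same formula with every weight equal to 1."""
    return gwr_local_fit(points, (0.0, 0.0), bandwidth=float("inf"))


# Synthetic example: the predictor-response slope is 1 in the west, 3 in the east.
points = []
for i in range(5):
    points.append((float(i), 0.0, i + 1.0, 1.0 * (i + 1.0)))       # west sites
    points.append((10.0 + i, 0.0, i + 1.0, 3.0 * (i + 1.0)))       # east sites

global_slope, _ = ols_fit(points)
west_slope, _ = gwr_local_fit(points, (0.0, 0.0), bandwidth=2.0)
east_slope, _ = gwr_local_fit(points, (14.0, 0.0), bandwidth=2.0)
print(global_slope, west_slope, east_slope)
```

    In this toy setting the global fit averages the two regimes, while the local fits recover a different slope at each end of the study area, which is the kind of spatial non-stationarity the abstract refers to.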

  8. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    Science.gov (United States)

    Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio

    2016-10-03

    In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches i.e., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches i.e., confidence ellipses and convex hulls, to visualize and assess the uncertainty due to missing values were also described. We showed how the areas of ellipses and convex hulls increased with the number of missing individuals. A free and easy-to-use code was proposed to implement the MI-MFA method in the R statistical environment. We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. MI-MFA configurations were close to the true

  9. GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing.

    Science.gov (United States)

    Spiliopoulou, Athina; Colombo, Marco; Orchard, Peter; Agakov, Felix; McKeigue, Paul

    2017-05-01

    We address the task of genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing as inputs. In this setting, the data have a high-level of missingness or uncertainty, and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited for this situation, as they rely on prephasing for computational efficiency, and, without definite genotype calls, the prephasing task becomes computationally expensive. We describe GeneImp, a program for genotype imputation that does not require prephasing and is computationally tractable for whole-genome imputation. GeneImp does not explicitly model recombination, instead it capitalizes on the existence of large reference panels-comprising thousands of reference haplotypes-and assumes that the reference haplotypes can adequately represent the target haplotypes over short regions unaltered. We validate GeneImp based on data from ultralow coverage sequencing (0.5×), and compare its performance to the most recent version of BEAGLE that can perform this task. We show that GeneImp achieves imputation quality very close to that of BEAGLE, using one to two orders of magnitude less time, without an increase in memory complexity. Therefore, GeneImp is the first practical choice for whole-genome imputation to a dense reference panel when prephasing cannot be applied, for instance, in datasets produced via ultralow coverage sequencing. A related future application for GeneImp is whole-genome imputation based on the off-target reads from deep whole-exome sequencing. Copyright © 2017 by the Genetics Society of America.

  10. Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices.

    Science.gov (United States)

    Taylor, Sandra L; Ruhaak, L Renee; Kelly, Karen; Weiss, Robert H; Kim, Kyoungmi

    2017-03-01

    With expanded access to, and decreased costs of, mass spectrometry, investigators are collecting and analyzing multiple biological matrices from the same subject such as serum, plasma, tissue and urine to enhance biomarker discoveries, understanding of disease processes and identification of therapeutic targets. Commonly, each biological matrix is analyzed separately, but multivariate methods such as MANOVAs that combine information from multiple biological matrices are potentially more powerful. However, mass spectrometric data typically contain large amounts of missing values, and imputation is often used to create complete data sets for analysis. The effects of imputation on multiple biological matrix analyses have not been studied. We investigated the effects of seven imputation methods (half minimum substitution, mean substitution, k-nearest neighbors, local least squares regression, Bayesian principal components analysis, singular value decomposition and random forest), on the within-subject correlation of compounds between biological matrices and its consequences on MANOVA results. Through analysis of three real omics data sets and simulation studies, we found the amount of missing data and imputation method to substantially change the between-matrix correlation structure. The magnitude of the correlations was generally reduced in imputed data sets, and this effect increased with the amount of missing data. Significant results from MANOVA testing also were substantially affected. In particular, the number of false positives increased with the level of missing data for all imputation methods. No one imputation method was universally the best, but the simple substitution methods (Half Minimum and Mean) consistently performed poorly. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  11. Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets

    OpenAIRE

    Jostins, Luke; Morley, Katherine I.; Barrett, Jeffrey C.

    2011-01-01

    Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing th...

  12. Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Guldbrandtsen, Bernt; Sahana, Goutam

    2014-01-01

    autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. Results A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE...... with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red respectively) but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only lead to a minor decrease in the imputation...

  13. Multiple regression based imputation for individualizing template human model from a small number of measured dimensions.

    Science.gov (United States)

    Nohara, Ryuki; Endo, Yui; Murai, Akihiko; Takemura, Hiroshi; Kouchi, Makiko; Tada, Mitsunori

    2016-08-01

    Individual human models are usually created by direct 3D scanning or deforming a template model according to the measured dimensions. In this paper, we propose a method to estimate all the necessary dimensions (full set) for the human model individualization from a small number of measured dimensions (subset) and human dimension database. For this purpose, we solved multiple regression equation from the dimension database given full set dimensions as the objective variable and subset dimensions as the explanatory variables. Thus, the full set dimensions are obtained by simply multiplying the subset dimensions by the coefficient matrix of the regression equation. We verified the accuracy of our method by imputing hand, foot, and whole body dimensions from their dimension database. The leave-one-out cross validation is employed in this evaluation. The mean absolute errors (MAE) between the measured dimensions and the dimensions estimated from 4 dimensions (hand length, breadth, middle finger breadth at proximal, and middle finger depth at proximal) in the hand, 2 dimensions (foot length, breadth, and lateral malleolus height) in the foot, and 1 dimension (height) and weight in the whole body were computed. The average MAE of non-measured dimensions were 4.58% in the hand, 4.42% in the foot, and 3.54% in the whole body, while that of measured dimensions were 0.00%.
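
    The regression-then-multiply idea and the leave-one-out evaluation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a single explanatory variable (height) instead of a multi-variable coefficient matrix, the database values and dimension names are hypothetical, and the error is expressed as a mean absolute percentage, which is an assumption about how the reported percentages were formed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_model(database, measured, targets):
    """Fit one regression per non-measured dimension."""
    xs = [person[measured] for person in database]
    return {t: fit_line(xs, [person[t] for person in database]) for t in targets}


def impute_dimensions(model, measured_value):
    """Estimate every target dimension from the single measured value."""
    return {t: slope * measured_value + intercept
            for t, (slope, intercept) in model.items()}


def leave_one_out_mae_percent(database, measured, targets):
    """Hold each person out in turn, refit, and average the percentage errors."""
    errors = []
    for i, held_out in enumerate(database):
        training = database[:i] + database[i + 1:]
        model = fit_model(training, measured, targets)
        predicted = impute_dimensions(model, held_out[measured])
        for t in targets:
            errors.append(abs(predicted[t] - held_out[t]) / held_out[t] * 100.0)
    return sum(errors) / len(errors)


# Hypothetical dimension database (all values in cm).
database = [
    {"height": 152.0, "arm_length": 66.1, "leg_length": 70.8},
    {"height": 160.0, "arm_length": 69.0, "leg_length": 75.5},
    {"height": 168.0, "arm_length": 73.2, "leg_length": 79.1},
    {"height": 175.0, "arm_length": 75.9, "leg_length": 83.4},
    {"height": 183.0, "arm_length": 79.8, "leg_length": 87.0},
]
targets = ["arm_length", "leg_length"]
loo_error = leave_one_out_mae_percent(database, "height", targets)
print(f"leave-one-out MAE: {loo_error:.2f}%")
```

    Refitting inside each fold keeps the held-out person out of the regression, so the reported error reflects dimensions that were genuinely not measured.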

  14. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

    Directory of Open Access Journals (Sweden)

    Stuart Heather

    2006-12-01

    Full Text Available Abstract Background Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). Methods 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20 question scale that respondents complete by circling a value of 1 to 4 for each question. The sum of the responses is calculated and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Additionally, a missing at random and missing not at random simulation were completed. Six imputation methods were then considered: (1) multiple imputation, (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80 indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percent of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression method produced Kappas in the 'substantial agreement' range
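
    The individual mean method and the Kappa comparison described in this abstract can be sketched as follows. This is a minimal sketch: the cutoff of 40 and the 1 to 4 response range come from the abstract, while the respondents, the function names, and the choice to leave the imputed value unrounded are illustrative assumptions.

```python
def sds_total(responses):
    """Sum a respondent's answers, replacing each missing answer (None)
    with the mean of that respondent's answered questions."""
    answered = [r for r in responses if r is not None]
    person_mean = sum(answered) / len(answered)
    return sum(r if r is not None else person_mean for r in responses)


def has_depressive_symptoms(total, cutoff=40):
    """Classify as exhibiting symptoms when the total score is over the cutoff."""
    return total > cutoff


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two classifications."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical respondents: complete 20-item answers, then the same answers
# with some items deleted to simulate missing data.
complete = [
    [2] * 20,
    [3] * 10 + [2] * 10,
    [1] * 20,
    [3] * 20,
]
with_missing = [
    [2] * 18 + [None, None],
    [3] * 10 + [2] * 8 + [None, None],
    [None] + [1] * 19,
    [3] * 17 + [None] * 3,
]

true_labels = [has_depressive_symptoms(sum(r)) for r in complete]
imputed_labels = [has_depressive_symptoms(sds_total(r)) for r in with_missing]
kappa = cohen_kappa(true_labels, imputed_labels)
print(f"kappa between true and imputed classifications: {kappa:.2f}")
```

    Comparing classifications from the complete data with those from the imputed data is what lets agreement be summarized by a single Kappa value per imputation method.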

  15. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    Science.gov (United States)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, while advances in genotyping technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistical methods including Beagle and fastPhase.

  16. Evaluation of genetic and geographical diversity of garlic (Allium sativum L.) ecotypes of Iran using ISSR and M13 molecular markers

    Directory of Open Access Journals (Sweden)

    M. Fakhrfeshani

    2016-05-01

    Full Text Available Garlic (Allium sativum L.), one of the most valuable industrial and pharmaceutical plants, has been studied from many aspects because of its importance. However, there is no sufficient and reliable information about its distribution and classification, so its types are categorized according to traditional, local or geographical names or some visual traits. The most important reason is the sterility of garlic and its inability to flower. This study, the first report of using ISSR and M13 markers on Iranian garlic ecotypes, was performed to evaluate genetic diversity and relationships and to distinguish repeated clones among populations from Iran. According to our results, the 26 studied clones were categorized as 24 different genotypes that could be classified into four groups coinciding with their geographical collection zones. Group one contains ecotypes from the north and northwest of Hamadan province, and group two contains clones from the west and southwest of Hamadan province and the central, east and southeast of Iran. The sample from Ahvaz was the only member of group three, and ecotypes from the north and northeast of Iran formed group four.

  17. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2015-12-01

    Full Text Available Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, and current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm.

  18. An Improved Generalized-Trend-Diffusion-Based Data Imputation for Steel Industry

    Directory of Open Access Journals (Sweden)

    Ying Liu

    2013-01-01

    Full Text Available Integrality and validity of industrial data are the fundamental factors in the domain of data-driven modeling. Aiming at the data missing problem of gas flow in the steel industry, an improved Generalized-Trend-Diffusion (iGTD) algorithm is proposed in this study, which in particular considers problems where the data are consecutively missing and the samples are small. The imputation accuracy can be greatly increased by the proposed Gaussian membership-based GTD, which expands the useful knowledge of data samples. In addition, the imputation order is further discussed to enhance the sequential forecasting accuracy of gas flow. To verify the effectiveness of the proposed method, a series of experiments covering three categories of data features in the gas system is presented, and the results indicate that this method is comprehensively better for the imputation of the periodical-like data and the time-series-like data.

  19. Multiple imputation for item scores when test data are factorially complex.

    Science.gov (United States)

    van Ginkel, Joost R; van der Ark, L Andries; Sijtsma, Klaas

    2007-11-01

    Multiple imputation under a two-way model with error is a simple and effective method that has been used to handle missing item scores in unidimensional test and questionnaire data. Extensions of this method to multidimensional data are proposed. A simulation study is used to investigate whether these extensions produce biased estimates of important statistics in multidimensional data, and to compare them with lower benchmark listwise deletion, two-way with error and multivariate normal imputation. The new methods produce smaller bias in several psychometrically interesting statistics than the existing methods of two-way with error and multivariate normal imputation. One of these new methods clearly is preferable for handling missing item scores in multidimensional test data.

  20. A multi breed reference improves genotype imputation accuracy in Nordic Red cattle

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Ma, Peipei; Lund, Mogens Sandø

    2012-01-01

    The objective of this study was to investigate if a multi breed reference would improve genotype imputation accuracy from 50K to high density (HD) single nucleotide polymorphism (SNP) marker data in Nordic Red Dairy Cattle, compared to using only a single breed reference, and to check the subsequent effect of the imputed HD data on the reliability of genomic prediction. HD genotype data was available for 247 Danish, 210 Swedish and 249 Finnish Red bulls, and for 546 Holstein bulls. A subset of 50 bulls from each of the Nordic Red populations was selected for validation. After quality control, 612,615 SNPs on chromosome 1-29 remained for analysis. Validation was done by masking markers in true HD data and imputing them using Beagle v. 3.3 and a reference group of either national Red, combined Red or combined Red and Holstein bulls. Results show a decrease in allele error rate from 2.64, 1...

  2. On multivariate imputation and forecasting of decadal wind speed missing data.

    Science.gov (United States)

    Wesonga, Ronald

    2015-01-01

    This paper demonstrates the application of multiple imputation by chained equations and time series forecasting to wind speed data. The study was motivated by the high prevalence of missing historic wind speed data. The fully conditional specification under multiple imputation by chained equations provided reliable imputations of the missing wind speed data. Further, the forecasting model shows a smoothing parameter, alpha (0.014), close to zero, confirming that recent past observations are more suitable for use to forecast wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimation confirms the dynamic tendencies of wind speed at the airport under study.

  3. A suggested approach for imputation of missing dietary data for young children in daycare

    Science.gov (United States)

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children. PMID:26689313
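
    The ratio step described above can be illustrated with a small sketch. It is a simplification under stated assumptions: it skips the mixed models and the weekend ratio, takes the "mean ratio" to be the mean of per-recall lunch/(B+D+ES) ratios in a reference group, and uses made-up kcal values. The names `mean_lunch_ratio` and `impute_lunch` and the dictionary keys are hypothetical.

```python
def mean_lunch_ratio(reference_recalls):
    """Mean of lunch / (B+D+ES) across reference recalls.

    Each recall is a dict with hypothetical keys "lunch" and "bdes"
    (breakfast + dinner + evening snacks), both in kcal.
    """
    ratios = [r["lunch"] / r["bdes"] for r in reference_recalls]
    return sum(ratios) / len(ratios)


def impute_lunch(bdes_kcal, ratio):
    """Impute a missing lunch as the child's own B+D+ES intake times the ratio."""
    return bdes_kcal * ratio


# Illustrative values only, not data from the study.
reference = [
    {"lunch": 300.0, "bdes": 600.0},
    {"lunch": 200.0, "bdes": 800.0},
]
ratio = mean_lunch_ratio(reference)
imputed = impute_lunch(800.0, ratio)
```

    Scaling by the child's own reported meals keeps the imputed lunch proportional to that child's overall intake instead of assigning a single group mean to everyone.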

  4. A suggested approach for imputation of missing dietary data for young children in daycare

    Directory of Open Access Journals (Sweden)

    June Stevens

    2015-12-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results: The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion: This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  5. A suggested approach for imputation of missing dietary data for young children in daycare.

    Science.gov (United States)

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P; Zeng, Donglin; Vaughn, Amber E; Pratt, Charlotte; Ward, Dianne S

    2015-01-01

    Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  6. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Directory of Open Access Journals (Sweden)

    Ward Judson A

    2013-01-01

    Background: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

  7. FISH: fast and accurate diploid genotype imputation via segmental hidden Markov model.

    Science.gov (United States)

    Zhang, Lei; Pei, Yu-Fang; Fu, Xiaoying; Lin, Yong; Wang, Yu-Ping; Deng, Hong-Wen

    2014-07-01

    Fast and accurate genotype imputation is necessary for facilitating gene-mapping studies, especially with the ever increasing numbers of both common and rare variants generated by high-throughput-sequencing experiments. However, most of the existing imputation approaches suffer from either inaccurate results or heavy computational demand. In this article, we propose a novel, fast and accurate method to impute diploid genotypes. Specifically, we extend a hidden Markov model that is widely used to describe haplotype structures, but we model hidden states onto single reference haplotypes rather than onto pairs of haplotypes. Consequently, the computational complexity is linear in the size of the reference haplotypes. We further develop an algorithm, 'merge-and-recover (MAR)', to speed up the calculation. Working on a compact representation of segmental reference haplotypes, the MAR algorithm always calculates an exact form of the transition probabilities regardless of the partition of segments. Both simulation studies and real-data analyses demonstrated that our proposed method was comparable to most of the existing popular methods in terms of imputation accuracy, but was much more efficient in terms of computation. The MAR algorithm can further speed up the calculation severalfold without loss of accuracy. The proposed method will be useful in large-scale imputation studies with a large number of reference subjects. The implemented multi-threading software FISH is freely available for academic use at https://sites.google.com/site/lzhanghomepage/FISH. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. USING A GEOGRAPHIC INFORMATION SYSTEM (GIS) IN EVALUATING THE ACCESSIBILITY OF HEALTH FACILITIES FOR BREAST CANCER PATIENTS IN PENANG STATE, MALAYSIA

    Directory of Open Access Journals (Sweden)

    Narimah Samat

    2010-01-01

    Geographical Information Systems (GIS) have been used widely in many developed countries to map health-related events, and the results are used for planning of health services (such as locating screening centres) and in assessing clusters of cases to help identify possible aetiological factors. In the United States in particular, cancer cases notified to registries are routinely entered into GIS, and trends are monitored over time. The Penang Cancer Registry (PCR) is a population-based registry that collects and collates data of all cancer cases diagnosed in Penang as well as cancer cases diagnosed elsewhere of Penang residents. Cancer case reports are generated by providing counts of cases based on Penang home addresses. Mapping of cases using information from the PCR gives a fairly complete picture of cancer cases from the Penang state, and any clusters of cases can be readily identified. This study demonstrates the application of GIS in the mapping and evaluation of the accessibility to health facilities of breast cancer cases in the Penang state. The ArcGIS 9.3 software was used to map and evaluate the spatial clustering of cancer cases, and the Network Analysis function was used to determine the distance between breast cancer cases and health facilities. Although the Penang state is considered one of the more developed states in Malaysia with good health facilities, in some parts of the state, health facilities are quite inaccessible. The study results suggest that new health facilities with screening and treatment services should be built in the south of the Seberang Perai and Balik Pulau areas. In addition, this study provides the opportunity to include geographical factors in examining cancer data, which gives a fresh outlook on the issue.

  9. A reference panel of 64,976 haplotypes for genotype imputation

    Science.gov (United States)

    McCarthy, Shane; Das, Sayantan; Kretzschmar, Warren; Delaneau, Olivier; Wood, Andrew R.; Teumer, Alexander; Kang, Hyun Min; Fuchsberger, Christian; Danecek, Petr; Sharp, Kevin; Luo, Yang; Sidore, Carlo; Kwong, Alan; Timpson, Nicholas; Koskinen, Seppo; Vrieze, Scott; Scott, Laura J.; Zhang, He; Mahajan, Anubha; Veldink, Jan; Peters, Ulrike; Pato, Carlos; van Duijn, Cornelia M.; Gillies, Christopher E.; Gandin, Ilaria; Mezzavilla, Massimo; Gilly, Arthur; Cocca, Massimiliano; Traglia, Michela; Angius, Andrea; Barrett, Jeffrey; Boomsma, Dorret I.; Branham, Kari; Breen, Gerome; Brummet, Chad; Busonero, Fabio; Campbell, Harry; Chan, Andrew; Chen, Sai; Chew, Emily; Collins, Francis S.; Corbin, Laura; Davey Smith, George; Dedoussis, George; Dorr, Marcus; Farmaki, Aliki-Eleni; Ferrucci, Luigi; Forer, Lukas; Fraser, Ross M.; Gabriel, Stacey; Levy, Shawn; Groop, Leif; Harrison, Tabitha; Hattersley, Andrew; Holmen, Oddgeir L.; Hveem, Kristian; Kretzler, Matthias; Lee, James; McGue, Matt; Meitinger, Thomas; Melzer, David; Min, Josine; Mohlke, Karen L.; Vincent, John; Nauck, Matthias; Nickerson, Deborah; Palotie, Aarno; Pato, Michele; Pirastu, Nicola; McInnis, Melvin; Richards, Brent; Sala, Cinzia; Salomaa, Veikko; Schlessinger, David; Schoenheer, Sebastian; Slagboom, P Eline; Small, Kerrin; Spector, Timothy; Stambolian, Dwight; Tuke, Marcus; Tuomilehto, Jaakko; Van den Berg, Leonard; Van Rheenen, Wouter; Volker, Uwe; Wijmenga, Cisca; Toniolo, Daniela; Zeggini, Eleftheria; Gasparini, Paolo; Sampson, Matthew G.; Wilson, James F.; Frayling, Timothy; de Bakker, Paul; Swertz, Morris A.; McCarroll, Steven; Kooperberg, Charles; Dekker, Annelot; Altshuler, David; Willer, Cristen; Iacono, William; Ripatti, Samuli; Soranzo, Nicole; Walter, Klaudia; Swaroop, Anand; Cucca, Francesco; Anderson, Carl; Boehnke, Michael; McCarthy, Mark I.; Durbin, Richard; Abecasis, Gonçalo; Marchini, Jonathan

    2017-01-01

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1%, a large increase in the number of SNPs tested in association studies and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently. PMID:27548312

  10. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Bryan N Howie

    2009-06-01

    Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

  11. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels

  12. Influence of Imputation and EM Methods on Factor Analysis When Item Nonresponse in Questionnaire Data Is Nonignorable.

    Science.gov (United States)

    Bernaards, Coen A.; Sijtsma, Klaas

    2000-01-01

    Using simulation, studied the influence of each of 12 imputation methods and 2 methods using the EM algorithm on the results of maximum likelihood factor analysis as compared with results from the complete data factor analysis (no missing scores). Discusses why EM methods recovered complete data factor loadings better than imputation methods. (SLD)

  13. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

    DEFF Research Database (Denmark)

    Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

    2011-01-01

    The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data co...

  14. Land suitability evaluation for irrigating wheat by Geopedological approach and Geographic Information System: A case study of Qazvin plain, Iran

    Directory of Open Access Journals (Sweden)

    Sayed Roholla Mousavi

    2017-07-01

    Land evaluation, using a scientific method, is essential to recognize the potential and limitations of a given land for a specific use in terms of its suitability, and to ensure its sustainable use. Soil is a resource whose renewal takes a long time, so effective use of soil and land resources requires a thorough understanding of the morphological processes of soil formation in different regions. The current study identified the soils available in the area through interpretation of aerial photographs and a Geopedological approach. After mapping the geoforms of the area, 61 profiles were drilled in the designated area and all diagnostic horizons were sampled. The samples were then transported to the laboratory for physico-chemical analysis. At the end of the profile classification process, which was based on the Soil Survey Staff (2014), the soil map was prepared by integrating the soil data and the geoform map in ArcGIS software. There are several limiting factors for wheat in Qazvin plain, namely electric conductivity (EC), gypsum, coarse fragments, soil depth, soil organic carbon (SOC), texture, calcium carbonate and climate. The map of the land units was prepared, and the land requirements for the utilization type were calculated. Land suitability evaluation was performed according to FAO. The results showed that land units 17 and 18 were unsuitable (N1) for irrigated wheat because of limiting factors such as high levels of EC and gypsum in the studied profiles. Moreover, land units 10, 20, and 23 are suitable (S1) for wheat production and have the highest predicted yield.

  15. Land use planning of paddy field using geographic information system and land evaluation in West Lombok, Indonesia

    Directory of Open Access Journals (Sweden)

    Widiatmaka .

    2014-06-01

    Planning analysis to increase rice production, either through intensification of the existing paddy field area or extensification in potential land area, was conducted in West Lombok Regency, West Nusa Tenggara Province, Indonesia. Existing paddy field was delineated using high-resolution data from IKONOS imagery of 2012. Land use and land cover outside existing paddy field were interpreted using SPOT-5 imagery of 2012. The Automated Land Evaluation System (ALES) was used for land suitability analysis for paddy. The results are interpreted in terms of the potential of paddy field intensification in the existing paddy field area and the potential of extensification in land potentially used for paddy field. The result of the analysis showed that in West Lombok Regency it is still possible to carry out intensification and extensification of paddy field to increase rice production in order to improve regional food security.

  16. Evaluation of heavy metals level (arsenic, nickel, mercury and lead) effecting on health in drinking water resource of Kohgiluyeh county using geographic information system (GIS)

    Directory of Open Access Journals (Sweden)

    Abdolazim Alinejad

    2016-08-01

    This study was conducted to determine the amount of heavy metals (Arsenic, Nickel, Mercury, and Lead) in the drinking water resources of Kohgiluyeh County using a Geographic Information System (GIS). This cross-sectional study was conducted on the drinking water resources of Kohgiluyeh County (33 water supplies and 4 heavy metals) in 2013. In total, 264 samples were analyzed. The experiments were performed at the laboratory of the Water and Wastewater Company based on the Standard Method. Atomic absorption was used to evaluate the amount of heavy metals. After processing of the parameters, the results were mapped with Geographic Information System software (GIS 9.3). Finally, the data were analyzed with SPSS 16 and Excel 2007. The maximum amount of each heavy metal and its source were as follows: Nickel (Ni): source w12, 124 ppb; Arsenic (As): w33, 42 ppb; Mercury (Hg): w22 and w30, 96 ppb; Lead (Pb): w21, 1553 ppb. The GIS maps also showed that Lead was very high in the central region, Mercury and Arsenic were high in the northern region, and Nickel was high in the eastern and western regions. The Kriging method and Gauss model were identified as the best method for interpolation of these metals. Since the concentration of these heavy metals was higher than standard levels in most drinking water supplies in Kohgiluyeh County, and such high levels can have adverse effects on human health, environmental and geological studies are necessary to identify the pollution sources and to remove the heavy metals

  17. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

    Directory of Open Access Journals (Sweden)

    McElwee Joshua

    2009-06-01

    Background: Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion. Methods: 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release22 SNPs, and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500 K array plus a custom 164 K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. Results: MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms Affx500K or Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations in this population. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximate 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis
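
    The masking-based accuracy check described in the Methods can be sketched as below. This is an illustrative outline only: it assumes genotypes coded 0/1/2 in a samples-by-SNPs array, masks whole SNP columns at random, and scores plain genotype concordance on the masked columns. The function names `mask_snps` and `concordance` are hypothetical, and the imputation step itself (MACH or BEAGLE in the study) is left to external software.

```python
import numpy as np

def mask_snps(genotypes, fraction, rng):
    """Hide a random subset of SNP columns; return masked copy and column indices."""
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    n_masked = max(1, int(round(fraction * n_snps)))
    masked_cols = np.sort(rng.choice(n_snps, size=n_masked, replace=False))
    masked = g.copy()
    masked[:, masked_cols] = np.nan  # these genotypes are now treated as untyped
    return masked, masked_cols


def concordance(true_genotypes, imputed_genotypes, masked_cols):
    """Fraction of masked genotypes that the imputation recovered exactly."""
    t = np.asarray(true_genotypes)[:, masked_cols]
    i = np.asarray(imputed_genotypes)[:, masked_cols]
    return float(np.mean(t == i))
```

    In practice the masked array would be passed to the imputation program and its output compared against the original genotypes with `concordance`.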

  18. Multiple imputation for model checking: completed-data plots with missing and latent data.

    Science.gov (United States)

    Gelman, Andrew; Van Mechelen, Iven; Verbeke, Geert; Heitjan, Daniel F; Meulders, Michel

    2005-03-01

    In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset--corresponding to the observed data and imputed unobserved data--using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of the use of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider the graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require to model the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.

  19. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    NARCIS (Netherlands)

    Artigas, Maria Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikainen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bosse, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Glaeser, Sven; Gonzalez, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliovaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kahonen, Nina; Ingelsson, Erik; Johansson, Asa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melen, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Vinuela, Ana; Voelzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimaki, Terho; Vitart, Veronique; Kahonen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed

  20. Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Bink, M.C.A.M.; Calus, M.P.L.; Eeuwijk, van F.A.; Hayes, B.J.; Hulsegge, B.; Veerkamp, R.F.

    2014-01-01

    Background The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina

  1. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken

  2. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    Science.gov (United States)

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2017-02-27

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  3. The search for stable prognostic models in multiple imputed data sets

    NARCIS (Netherlands)

    Vergouw, D.; Heijmans, M.W.; Peat, G.M.; Kuijpers, T.; Croft, P.R.; de Vet, H.C.W.; van der Horst, H.E.; van der Windt, D.A.W.M.

    2010-01-01

    Background: In prognostic studies, model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and multiple imputation (MI). The authors examined the influence of these methods on model composition. Methods: Models were constructed

  4. Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

    Directory of Open Access Journals (Sweden)

    Minkyung Kim

    2017-10-01

    This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the pattern that appears in the collected data. Here, in order to capture the patterns from past power data, we newly model a feature vector by using past data and its variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, which are significant hyperparameters of the proposed method, by utilizing intentional missing data. Based on a weighted distance between feature vectors representing a missing situation and past situation, missing power data are estimated by referring to the k most similar past situations in the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects a method between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, from a simulation under various energy consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system, compared to the existing imputation methods.
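
    The idea of estimating a gap from the k most similar past situations can be illustrated with a minimal sketch. It is not the authors' LAI: it assumes a plain (unweighted) Euclidean distance, a fixed window length, and a simple average of the neighbours, and the function name `knn_impute` and its parameters are hypothetical.

```python
import math

def knn_impute(series, gap_index, window=3, k=2):
    """Estimate series[gap_index] from the k most similar past situations.

    A "situation" is the `window` values that precede a time step.
    Assumption: unweighted Euclidean distance and a plain mean of neighbours.
    """
    if gap_index <= window:
        raise ValueError("need more than `window` observed values before the gap")
    query = series[gap_index - window:gap_index]
    candidates = []
    # Compare the query with every earlier window whose following value is known.
    for t in range(window, gap_index):
        past = series[t - window:t]
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, past)))
        candidates.append((distance, series[t]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)

# A strictly periodic load profile with the last reading missing.
readings = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, None]
estimate = knn_impute(readings, gap_index=11, window=3, k=2)
```

    In the abstract, the window length and the history length are learned from intentionally masked data; here they are simply fixed arguments.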

  5. A reference panel of 64,976 haplotypes for genotype imputation

    NARCIS (Netherlands)

    McCarthy, Shane; Das, Sayantan; Kretzschmar, Warren; Delaneau, Olivier; Wood, Andrew R.; Teumer, Alexander; Kang, Hyun Min; Fuchsberger, Christian; Danecek, Petr; Sharp, Kevin; Luo, Yang; Sidorel, Carlo; Kwong, Alan; Timpson, Nicholas; Koskinen, Seppo; Vrieze, Scott; Scott, Laura J.; Zhang, He; Mahajan, Anubha; Veldink, Jan; Peters, Ulrike; Pato, Carlos; van Duijn, Cornelia M.; Gillies, Christopher E.; Gandin, Ilaria; Mezzavilla, Massimo; Gilly, Arthur; Cocca, Massimiliano; Traglia, Michela; Angius, Andrea; Barrett, Jeffrey C.; Boomsma, Dorrett; Branham, Kari; Breen, Gerome; Brummett, Chad M.; Busonero, Fabio; Campbell, Harry; Chan, Andrew; Che, Sai; Chew, Emily; Collins, Francis S.; Corbin, Laura J.; Smith, George Davey; Dedoussis, George; Dorr, Marcus; Farmaki, Aliki-Eleni; Ferrucci, Luigi; Forer, Lukas; Fraser, Ross M.; Gabriel, Stacey; Levy, Shawn; Groop, Leif; Harrison, Tabitha; Hattersley, Andrew; Holmen, Oddgeir L.; Hveem, Kristian; Kretzler, Matthias; Lee, James C.; McGue, Matt; Meitinger, Thomas; Melzer, David; Min, Josine L.; Mohlke, Karen L.; Vincent, John B.; Nauck, Matthias; Nickerson, Deborah; Palotie, Aarno; Pato, Michele; Pirastu, Nicola; McInnis, Melvin; Richards, J. Brent; Sala, Cinzia; Salomaa, Veikko; Schlessinger, David; Schoenherr, Sebastian; Slagboom, P. Eline; Small, Kerrin; Spector, Timothy; Stambolian, Dwight; Tuke, Marcus; Tuomilehto, Jaakko; van den Berg, Leonard H.; Van Rheenen, Wouter; Volker, Uwe; Wijmenga, Cisca; Toniolo, Daniela; Zeggini, Eleftheria; Gasparini, Paolo; Sampson, Matthew G.; Wilson, James F.; Frayling, Timothy; de Bakker, Paul I. W.; Swertz, Morris A.; McCarroll, Steven; Kooperberg, Charles; Dekker, Annelot; Altshuler, David; Willer, Cristen; Iacono, William; Ripatti, Samuli; Soranzo, Nicole; Walter, Klaudia; Swaroop, Anand; Cucca, Francesco; Anderson, Carl A.; Myers, Richard M.; Boehnke, Michael; McCarthy, Mark I.; Durbin, Richard; Abecasis, Goncalo; Marchini, Jonathan

    2016-01-01

    We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in

  6. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    Science.gov (United States)

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  7. Evaluation of location and number of aid post for sustainable humanitarian relief using agent based modeling (ABM) and geographic information system (GIS)

    Science.gov (United States)

    Khair, Fauzi; Sopha, Bertha Maya

    2017-12-01

    One of the crucial phases in disaster management is the emergency response phase, which requires a sustainable and well-integrated management system. Any error in the system during this phase can significantly increase the number of victims as well as the material damage caused. Policies on the location of aid posts are therefore important decisions. Experience shows that many failures in providing assistance to refugees are due to a lack of preparation and poor determination of facilities and aid post locations. This study therefore aims to evaluate the number and location of aid posts during the Merapi eruption in 2010. It integrates Agent Based Modeling (ABM) and a Geographic Information System (GIS) to evaluate the number and location of aid posts under several scenarios. The ABM approach describes the behaviour of the agents (refugees and volunteers) in the event of a disaster, each with their respective characteristics, while the GIS spatial data describe the real condition of the roads in Sleman regency. The simulation results show that alternative scenarios combining the DERU UGM post, Maguwoharjo Stadium, the Tagana post and the Pakem main post perform better in handling and distributing aid to the evacuation barracks than the initial scenario, with less unmet demand.

  8. Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

    Directory of Open Access Journals (Sweden)

    Jörgen Wallerman

    2013-04-01

    Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error or RMSE 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%) with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.
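
    The evaluation scheme described above (neighbour-based imputation scored with leave-one-out cross-validation and a percentage RMSE) can be sketched as follows. This is an illustration only: it assumes plain Euclidean distance rather than the canonical-correlation weighting used in k-MSN, and it assumes the percentage RMSE is taken relative to the observed mean, which the abstract does not state. All names and the toy data are hypothetical.

```python
import math

def loo_knn_predictions(features, targets, k=1):
    """Predict each plot's target from its k nearest other plots (leave-one-out)."""
    predictions = []
    for i, query in enumerate(features):
        distances = []
        for j, other in enumerate(features):
            if j == i:
                continue  # hold out the plot being predicted
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, other)))
            distances.append((d, targets[j]))
        distances.sort(key=lambda pair: pair[0])
        nearest = distances[:k]
        predictions.append(sum(value for _, value in nearest) / len(nearest))
    return predictions

def relative_rmse_percent(observed, predicted):
    """RMSE expressed as a percentage of the observed mean (an assumption)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)

# Toy plots: one ALS-derived feature per plot, stem volume as the target.
plot_features = [[0.0], [1.0], [10.0], [11.0]]
stem_volume = [100.0, 120.0, 300.0, 320.0]
predicted = loo_knn_predictions(plot_features, stem_volume, k=1)
error_percent = relative_rmse_percent(stem_volume, predicted)
```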

  9. Geographical information systems

    DEFF Research Database (Denmark)

    Möller, Bernd

    2004-01-01

    The chapter gives an introduction to Geographical Information Systems (GIS) with particular focus on their application within environmental management.

  10. Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data.

    Science.gov (United States)

    He, Dan; Han, Buhm; Eskin, Eleazar

    2013-02-01

    Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Traditionally, haplotypes are inferred from genotype data obtained from microarrays using information on population haplotype frequencies inferred from either a large sample of genotyped individuals or a reference dataset such as the HapMap. Since the availability of large reference datasets, modern approaches for haplotype phasing along these lines are closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in such a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and assembly method that combines information from a reference dataset with the information from the reads using a likelihood framework. Our method applies a dynamic programming algorithm to identify the predicted haplotype, which maximizes the joint likelihood of the haplotype with respect to the reference dataset and the haplotype with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy compared to the state-of-the-art imputation methods.

  11. Geographic Media Literacy

    Science.gov (United States)

    Lukinbeal, Chris

    2014-01-01

    While the use of media permeates geographic research and pedagogic practice, the underlying literacies that link geography and media remain uncharted. This article argues that geographic media literacy incorporates visual literacy, information technology literacy, information literacy, and media literacy. Geographic media literacy is the ability…

  12. Multi-task Gaussian process for imputing missing data in multi-trait and multi-environment trials.

    Science.gov (United States)

    Hori, Tomoaki; Montcho, David; Agbangla, Clement; Ebana, Kaworu; Futakuchi, Koichi; Iwata, Hiroyoshi

    2016-11-01

    A method based on a multi-task Gaussian process using self-measuring similarity gave increased accuracy for imputing missing phenotypic data in multi-trait and multi-environment trials. Multi-environmental trial (MET) data often encounter the problem of missing data. Accurate imputation of missing data makes subsequent analysis more effective and the results easier to understand. Moreover, accurate imputation may help to reduce the cost of phenotyping for thinned-out lines tested in METs. METs are generally performed for multiple traits that are correlated to each other. Correlation among traits can be useful information for imputation, but single-trait-based methods cannot utilize information shared by traits that are correlated. In this paper, we propose imputation methods based on a multi-task Gaussian process (MTGP) using self-measuring similarity kernels reflecting relationships among traits, genotypes, and environments. This framework allows us to use genetic correlation among multi-trait multi-environment data and also to combine MET data and marker genotype data. We compared the accuracy of three MTGP methods and iterative regularized PCA using rice MET data. Two scenarios for the generation of missing data at various missing rates were considered. The MTGP achieved better imputation accuracy than regularized PCA, especially at high missing rates. Under the 'uniform' scenario, in which missing data arise randomly, inclusion of marker genotype data in the imputation increased the imputation accuracy at high missing rates. Under the 'fiber' scenario, in which missing data arise in all traits for some combinations between genotypes and environments, the inclusion of marker genotype data decreased the imputation accuracy for most traits while increasing the accuracy in a few traits remarkably. The proposed methods will be useful for solving the missing data problem in MET data.
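
    The two missing-data scenarios can be made concrete with a small sketch that generates the set of masked cells in a genotype x environment x trait array. It only illustrates the definitions given in the abstract; the function names, the seeding, and the rounding of the missing rate are assumptions.

```python
import random

def uniform_mask(n_geno, n_env, n_trait, rate, seed=0):
    """'Uniform' scenario: individual (genotype, environment, trait) cells go missing at random."""
    rng = random.Random(seed)
    cells = [(g, e, t)
             for g in range(n_geno)
             for e in range(n_env)
             for t in range(n_trait)]
    n_missing = round(rate * len(cells))
    return set(rng.sample(cells, n_missing))

def fiber_mask(n_geno, n_env, n_trait, rate, seed=0):
    """'Fiber' scenario: all traits go missing for randomly chosen genotype-environment pairs."""
    rng = random.Random(seed)
    pairs = [(g, e) for g in range(n_geno) for e in range(n_env)]
    n_missing = round(rate * len(pairs))
    chosen = rng.sample(pairs, n_missing)
    return {(g, e, t) for g, e in chosen for t in range(n_trait)}

uniform_cells = uniform_mask(4, 3, 2, rate=0.5)
fiber_cells = fiber_mask(4, 3, 2, rate=0.5)
```

    Both masks hide the same number of cells here, but in the fiber case no trait of a masked genotype-environment pair remains observed, so correlated traits cannot help for that pair.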

  13. Geographic information systems as a tool for environmental evaluation of hydropower potential; Sistemas de informacoes geograficas como ferramenta para avaliacao ambiental de potenciais hidreletricos

    Energy Technology Data Exchange (ETDEWEB)

    Dzedzej, Maira; Correa, Fabio; Malta, Joao [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Flauzino, Barbara Karoline [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil); Santos, Afonso Henriques Moreira [MS Consultoria Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil)

    2010-07-01

    Hydropower plants are responsible for much of the energy generated in the country, and Brazilian rivers also hold a large hydro potential. This form of power generation is considered renewable and fits the concept of sustainable development; however, the social and environmental impacts of implementing hydropower projects are well known and widely discussed, especially for large plants. In this context, the environmental analysis of hydropower potential has been incorporated at various stages of the implementation studies in order to identify environmental factors that will restrict or impede construction, obtain the best option for the environment, evaluate the social and environmental impacts, contribute to improving the design and functionality of the enterprises so as to reduce overall costs, minimize conflicts, and assist in preserving the environment. To fulfill these functions at a satisfactory and reliable level, such studies have increasingly used the techniques, tools and applications of Geographic Information Systems in the environmental assessment process, since they support the acquisition, integration, visualization and analysis of data on natural resources, their uses and protection, offering greater security and speed in decision making. This paper presents some applications of GIS in environmental assessment processes, developed mainly in the steps of estimating hydropower potential, hydropower inventory, basic design and environmental licensing. (author)

  14. Developing an analytical tool for evaluating EMS system design changes and their impact on cardiac arrest outcomes: combining geographic information systems with register data on survival rates

    Directory of Open Access Journals (Sweden)

    Sund Björn

    2013-02-01

    Background: Out-of-hospital cardiac arrest (OHCA) is a frequent and acute medical condition that requires immediate care. We estimate survival rates from OHCA in the area of Stockholm, through developing an analytical tool for evaluating Emergency Medical Services (EMS) system design changes. The study also is an attempt to validate the proposed model used to generate the outcome measures for the study. Methods and results: This was done by combining a geographic information systems (GIS) simulation of driving times with register data on survival rates. The emergency resources comprised ambulance alone and ambulance plus fire services. The simulation model predicted a baseline survival rate of 3.9 per cent, and reducing the ambulance response time by one minute increased survival to 4.6 per cent. Adding the fire services as first responders (dual dispatch) increased survival to 6.2 per cent from the baseline level. The model predictions were validated using empirical data. Conclusion: We have presented an analytical tool that easily can be generalized to other regions or countries. The model can be used to predict outcomes of cardiac arrest prior to investment in EMS design changes that affect the alarm process, e.g. (1) static changes such as trimming the emergency call handling time or (2) dynamic changes such as location of emergency resources or which resources should carry a defibrillator.

  15. Archipelago colonization by ecologically dissimilar amphibians: evaluating the expectation of common evolutionary history of geographical diffusion in co-distributed rainforest tree frogs in islands of Southeast Asia.

    Science.gov (United States)

    Gonzalez, Paulette; Su, Yong-Chao; Siler, Cameron D; Barley, Anthony J; Sanguila, Marites B; Diesmos, Arvin C; Brown, Rafe M

    2014-03-01

    Widespread, co-distributed species with limited relative dispersal abilities represent compelling focal taxa for comparative phylogeography. Forest vertebrates in island archipelagos often exhibit pronounced population structure resulting from limited dispersal abilities or capacity to overcome marine barriers to dispersal. The exceptionally diverse Old World tree frogs of the family Rhacophoridae have colonized the forested island archipelagos of Southeast Asia on multiple occasions, entering the islands of Indonesia and the Philippines via a "stepping stone" mode of dispersal along elongate island chains, separated by a series of marine channels. Here we evaluate the prediction that two tightly co-distributed Philippine rhacophorids colonized the archipelago during concomitant timescales and in the same, linear, "island-hopping" progression. We use a new multilocus dataset, utilize dense genetic sampling from the eastern arc of the Philippines, and we take a model-based phylogeographic approach to examining the two species for similar topological patterns of diversification, genetic structure, and timescales of diversification. Our results support some common mechanistic predictions (a general south-to-north polarity of colonization) but not others (timescale for colonization and manner and degree of lineage diversification), suggesting differing biogeographic scenarios of geographical diffusion through the archipelago and unique and idiosyncratic ecological capacities and evolutionary histories of each species. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Developing an analytical tool for evaluating EMS system design changes and their impact on cardiac arrest outcomes: combining geographic information systems with register data on survival rates.

    Science.gov (United States)

    Sund, Björn

    2013-02-15

    Out-of-hospital cardiac arrest (OHCA) is a frequent and acute medical condition that requires immediate care. We estimate survival rates from OHCA in the area of Stockholm, through developing an analytical tool for evaluating Emergency Medical Services (EMS) system design changes. The study also is an attempt to validate the proposed model used to generate the outcome measures for the study. This was done by combining a geographic information systems (GIS) simulation of driving times with register data on survival rates. The emergency resources comprised ambulance alone and ambulance plus fire services. The simulation model predicted a baseline survival rate of 3.9 per cent, and reducing the ambulance response time by one minute increased survival to 4.6 per cent. Adding the fire services as first responders (dual dispatch) increased survival to 6.2 per cent from the baseline level. The model predictions were validated using empirical data. We have presented an analytical tool that easily can be generalized to other regions or countries. The model can be used to predict outcomes of cardiac arrest prior to investment in EMS design changes that affect the alarm process, e.g. (1) static changes such as trimming the emergency call handling time or (2) dynamic changes such as location of emergency resources or which resources should carry a defibrillator.

  17. A geographical information system-based multicriteria evaluation to map areas at risk for Rift Valley fever vector-borne transmission in Italy.

    Science.gov (United States)

    Tran, A; Ippoliti, C; Balenghien, T; Conte, A; Gely, M; Calistri, P; Goffredo, M; Baldet, T; Chevalier, V

    2013-11-01

    Rift Valley fever (RVF) is a severe mosquito-borne disease that is caused by a Phlebovirus (Bunyaviridae) and affects domestic ruminants and humans. Recently, its distribution widened, threatening Europe. The probability of the introduction and large-scale spread of Rift Valley fever virus (RVFV) in Europe is low, but localized RVF outbreaks may occur in areas where populations of ruminants and potential vectors are present. In this study, we assumed the introduction of the virus into Italy and focused on the risk of vector-borne transmission of RVFV to three main European potential hosts (cattle, sheep and goats). Five main potential mosquito vectors belonging to the Culex and Aedes genera that are present in Italy were identified in a literature review. We first modelled the geographical distribution of these five species based on expert knowledge and using land cover as a proxy of mosquito presence. The mosquito distribution maps were compared with field mosquito collections from Italy to validate the model. Next, the risk of RVFV transmission was modelled using a multicriteria evaluation (MCE) approach, integrating expert knowledge and the results of a literature review on host sensitivity and vector competence, feeding behaviour and abundance. A sensitivity analysis was performed to assess the robustness of the results with respect to expert choices. The resulting maps include (i) five maps of the vector distribution, (ii) a map of suitable areas for vector-borne transmission of RVFV and (iii) a map of the risk of RVFV vector-borne transmission to sensitive hosts given a viral introduction. Good agreement was found between the modelled presence probability and the observed presence or absence of each vector species. The resulting RVF risk map highlighted strong spatial heterogeneity and could be used to target surveillance. In conclusion, the geographical information system (GIS)-based MCE served as a valuable framework and a flexible tool for mapping the
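
    A multicriteria evaluation of this kind is often implemented as a weighted linear combination of standardized criterion layers. The abstract does not say which combination rule the authors used, so the sketch below is an assumption, and the layer names, weights, and cell values are invented for illustration.

```python
def weighted_linear_combination(criteria, weights):
    """Combine standardized criterion layers (values in [0, 1]) into one score per cell.

    `criteria` maps a layer name to a list of cell values; `weights` maps the
    same names to expert weights. Weights are normalized to sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_cells = len(next(iter(criteria.values())))
    scores = []
    for cell in range(n_cells):
        score = sum(weights[name] * criteria[name][cell] for name in weights) / total
        scores.append(score)
    return scores

# Hypothetical layers for three map cells and hypothetical expert weights.
layers = {
    "vector_abundance": [0.0, 0.5, 1.0],
    "host_density": [1.0, 0.5, 0.0],
}
expert_weights = {"vector_abundance": 3, "host_density": 1}
risk = weighted_linear_combination(layers, expert_weights)
```

    Rerunning the combination with perturbed weights is one simple way to carry out the kind of sensitivity analysis with respect to expert choices that the abstract mentions.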

  18. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage.

    Science.gov (United States)

    Wilson, Barry Tyler; Woodall, Christopher W; Griffith, Douglas M

    2013-01-11

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest Degradation projects) while

  19. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    Directory of Open Access Journals (Sweden)

    Wilson Barry Tyler

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest

  20. Assessment of Consequences of Replacement of System of the Uniform Tax on Imputed Income Patent System of the Taxation

    National Research Council Canada - National Science Library

    Galina A. Manokhina; Raisa S. Vidishcheva

    2012-01-01

    The article highlights the main questions concerning the possible consequences of replacing the currently operating system, a single tax on imputed income, with the patent system of taxation...

  1. A new genotype imputation method with tolerance to high missing rate and rare variants.

    Directory of Open Access Journals (Sweden)

    Yumei Yang

    We report a novel algorithm, iBLUP, to impute missing genotypes by simultaneously and comprehensively using identity by descent and linkage disequilibrium information. The simulation studies showed that the algorithm exhibited drastically greater tolerance to high missing rates, especially for rare variants, than other common imputation methods, e.g. BEAGLE and fastPHASE. At a missing rate of 70%, the accuracy of BEAGLE and fastPHASE dropped to 0.82 and 0.74 respectively while iBLUP retained an accuracy of 0.95. For minor alleles, the accuracy of BEAGLE and fastPHASE decreased to -0.1 and 0.03, while iBLUP still had an accuracy of 0.61. We implemented the algorithm in a publicly available software package also named iBLUP. The application of iBLUP for processing real sequencing data in an outbred pig population was demonstrated.

  2. Integration and imputation of survey data in R: the StatMatch package

    Directory of Open Access Journals (Sweden)

    Marcello D’Orazio

    2015-06-01

    Statistical matching methods make it possible to integrate two or more data sources with the purpose of investigating the relationship between variables not jointly observed. Recently these methods have received much attention as a valid alternative for producing new statistical outputs. The paper provides an overview of the statistical matching methods implemented in the package StatMatch for the R environment, focusing on the most widespread methods and how they were improved. Particular attention is devoted to hot deck matching methods, which are strictly related to the ones developed for the imputation of missing values. The corresponding functions in StatMatch are very powerful and are flexible enough to be applied for imputing missing values in a survey. The paper also tackles the problem of matching data from complex sample surveys, a very important topic in National Statistical Institutes. Finally, it describes the concept of uncertainty characterizing the statistical matching framework and how this alternative approach can be exploited for different purposes.
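
    As a rough illustration of the hot deck idea (a missing value is filled with an observed value borrowed from a similar donor record), the sketch below draws a random donor within a donation class. It is written in Python for illustration and does not reproduce the StatMatch R functions; the function name, record layout, and variable names are hypothetical.

```python
import random

def random_hot_deck(recipients, donors, class_var, target_var, seed=0):
    """Fill `target_var` in each recipient with a value drawn at random
    from donors that share the same value of `class_var`."""
    rng = random.Random(seed)
    # Group the observed donor values by donation class.
    pools = {}
    for donor in donors:
        pools.setdefault(donor[class_var], []).append(donor[target_var])
    completed = []
    for record in recipients:
        pool = pools.get(record[class_var])
        filled = dict(record)
        # Leave the value as None when no donor exists in the class.
        filled[target_var] = rng.choice(pool) if pool else None
        completed.append(filled)
    return completed

donor_file = [
    {"region": "A", "income": 10},
    {"region": "A", "income": 12},
    {"region": "B", "income": 50},
]
recipient_file = [{"region": "A"}, {"region": "B"}, {"region": "C"}]
matched = random_hot_deck(recipient_file, donor_file, "region", "income")
```

    Distance-based hot deck variants replace the random draw with the closest donor on a set of matching variables.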

  3. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population.

    Science.gov (United States)

    Jattawa, Danai; Elzo, Mauricio A; Koonawootrittriron, Skorn; Suwanasopee, Thanathip

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
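
    The accuracy measure described in this record (correctly imputed SNP markers divided by all imputed SNP markers) can be illustrated with a minimal Python sketch. The function name and the toy genotype lists are hypothetical and are not taken from the study.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of imputed SNP genotypes that match the true genotypes."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("no imputed markers to evaluate")
    # Count markers where the imputed genotype equals the masked true genotype.
    correct = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return correct / len(true_genotypes)


# Hypothetical genotypes coded as copies of one allele (0, 1 or 2).
true_g = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_g = [0, 1, 2, 0, 0, 2, 1, 1]
accuracy = imputation_accuracy(true_g, imputed_g)
print(f"{accuracy:.2%}")
```

    Applying the same function to the markers of one chromosome at a time would give per-chromosome accuracies of the kind the abstract compares.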

  4. Airports Geographic Information System -

    Data.gov (United States)

    Department of Transportation — The Airports Geographic Information System maintains the airport and aeronautical data required to meet the demands of the Next Generation National Airspace System....

  5. Evaluation of strontium isotope abundance ratios in combination with multi-elemental analysis as a possible tool to study the geographical origin of ciders

    Energy Technology Data Exchange (ETDEWEB)

    Garcia-Ruiz, Silvia [Department of Physical and Analytical Chemistry, University of Oviedo, Julian Claveria 8, 33006 Oviedo (Spain); Moldovan, Mariella [Department of Physical and Analytical Chemistry, University of Oviedo, Julian Claveria 8, 33006 Oviedo (Spain); Fortunato, Giuseppino [Swiss Federal Laboratories for Materials Testing and Research EMPA, 9014 St. Gallen (Switzerland); Wunderli, Samuel [Swiss Federal Laboratories for Materials Testing and Research EMPA, 9014 St. Gallen (Switzerland); Garcia Alonso, J. Ignacio [Department of Physical and Analytical Chemistry, University of Oviedo, Julian Claveria 8, 33006 Oviedo (Spain)]. E-mail: jiga@uniovi.es

    2007-05-02

    In order to evaluate alternative analytical methodologies to study the geographical origin of ciders, both multi-elemental analysis and Sr isotope abundance ratios in combination with multivariate statistical analysis were estimated in 67 samples from England, Switzerland, France and two Spanish regions (Asturias and the Basque Country). A methodology for the precise and accurate determination of the ⁸⁷Sr/⁸⁶Sr isotope abundance ratio in ciders by multicollector inductively coupled plasma mass spectrometry (MC-ICP-MS) was developed. Major elements (Na, K, Ca and Mg) were measured by ICP-AES and minor and trace elements (Li, Be, B, Al, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Se, Rb, Sr, Y, Mo, Cd, Sn, Sb, Cs, Ba, La, Ce, W, Tl, Pb, Bi, Th and U) were measured by ICP-MS using a collision cell instrument operated in multitune mode. An analysis of variance (ANOVA test) indicated that group means for B, Cr, Fe, Ni, Cu, Se, Cd, Cs, Ce, W, Pb, Bi and U did not show any significant differences at the 95% confidence level, so these elements were rejected for further statistical analysis. Another group of elements (Li, Be, Sc, Co, Ga, Y, Sn, Sb, La, Tl, Th) was removed from the data set because concentrations were close to the limits of detection for many samples. Therefore, the remaining elements (Na, Mg, Al, K, Ca, Ti, V, Mn, Zn, As, Rb, Sr, Mo, Ba) together with ⁸⁷Sr/⁸⁶Sr isotope abundance ratio were considered for principal component analysis (PCA) and linear discriminant analysis (LDA). Finally, LDA was able to classify correctly 100% of cider samples coming from different Spanish regions, France, England and Switzerland when considering Na, Mg, Al, K, Ca, Ti, V, Mn, Zn, As, Rb, Sr, Mo, Ba and ⁸⁷Sr/⁸⁶Sr isotope abundance ratio as original variables.

  6. Effects of height and live crown ratio imputation strategies on stand biomass estimation

    Science.gov (United States)

    Elijah J. Allensworth; Temesgen Hailemariam

    2015-01-01

    The effects of subsample design and imputation of total height (ht) and live crown ratio (cr) on the accuracy of stand-level estimates of component and total aboveground biomass are not well investigated in the current body of literature. To assess this gap in research, this study uses a data set of 3,454 Douglas-fir trees obtained from 102 stands in southwestern...

  7. Cost-effective and accurate method of measuring fetal fraction using SNP imputation.

    Science.gov (United States)

    Kim, Minjeong; Kim, Jai-Hoon; Kim, Kangseok; Kim, Sunshin

    2017-11-08

    With the discovery of cell-free fetal DNA in maternal blood, the demand for non-invasive prenatal testing (NIPT) has been increasing. To obtain reliable NIPT results, it is important to accurately estimate the fetal fraction. In this study, we propose an accurate and cost-effective method for measuring fetal fractions using single-nucleotide polymorphisms (SNPs). A total of 84 samples were sequenced via semiconductor sequencing using a 0.3x sequencing coverage. SNPs were genotyped to estimate the fetal fraction. Approximately 900,000 SNPs were genotyped, and 250,000 of these SNPs matched the semiconductor sequencing results. We performed SNP imputation (1000Genome phase3 and HRC v1.1 reference panel) to increase the number of SNPs. The correlation coefficients (R²) of the fetal fraction estimated using the ratio of non-maternal alleles when coverage was reduced to 0.01 following SNP imputation were 0.93 (HRC v1.1 reference panel) and 0.90 (1000GP3 reference panel). An R² of 0.72 was found at 0.01x sequencing coverage with no imputation performed. We developed an accurate method to measure fetal fraction using SNP imputation, showing cost-effectiveness by using different commercially available SNP chips and lowering the coverage. We also showed that semiconductor sequencing, which is an inexpensive option, was useful for measuring fetal fraction. Python source code and guidelines can be found at https://github.com/KMJ403/fetalfraction-SNPimpute. kangskim@ajou.ac.kr, sunshinkim3@gmail.com. Supplementary data are available at Bioinformatics online.

  8. Interaction association analysis of imputed SNPs in case-control and follow-up studies.

    Science.gov (United States)

    Subirana, Isaac; González, Juan R

    2015-03-01

    A new method is described to assess the interactions of imputed SNPs (single nucleotide polymorphisms) in case-control and follow-up studies, properly incorporating SNP imputation uncertainty in the likelihood model. Using simulation studies and analysis of real data obtained from the Framingham study cohort, we compare the performance of this new method to DOSAGE and NAIVE (also known as Best-Guess) methods, developed and commonly used in the context of single SNP and extended to SNP-by-SNP interaction. The results show that only our new method is unbiased under all examined scenarios regarding allele frequencies, imputation uncertainty degree, and interaction effect size. In addition, our method achieves at least as much power as the other two, and exceeds their statistical power in certain follow-up analysis situations. This method is fast enough to perform Genome Wide Interaction Studies (GWIS) with hundreds of thousands of interactions. An exhaustive simulation study allows us to provide recommendations for selecting the most appropriate method depending on MAF, interaction effect size, and uncertainty degree. In general, DOSAGE and our proposed method are recommended in most situations, with our method being more powerful and accurate when uncertainty and effect size increase. © 2015 Wiley Periodicals, Inc.
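
    For orientation, the two comparison approaches named in this record are commonly understood as follows: a dosage is the expected allele count under the imputed genotype probabilities, and a best-guess genotype is the most probable one. The Python sketch below illustrates only these two summaries with hypothetical probabilities; it does not reproduce the likelihood-based method the abstract proposes.

```python
def dosage(probs):
    """Expected allele count from genotype probabilities (P(0), P(1), P(2))."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def best_guess(probs):
    """Most probable genotype (0, 1 or 2 copies of the allele)."""
    return max(range(3), key=lambda g: probs[g])


# Hypothetical imputation output for one individual at two SNPs.
snp_a = (0.05, 0.55, 0.40)
snp_b = (0.10, 0.30, 0.60)

for name, probs in (("snp_a", snp_a), ("snp_b", snp_b)):
    print(name, "dosage:", round(dosage(probs), 3), "best guess:", best_guess(probs))
```

    The best-guess call discards the uncertainty that the dosage partly retains, which is why the two summaries can behave differently when imputation is uncertain.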

  9. A Hybrid Algorithm for Missing Data Imputation and Its Application to Electrical Data Loggers

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2016-09-01

    The storage of data is a key process in the study of electrical power networks related to the search for harmonics and the finding of a lack of balance among phases. The presence of missing data in any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) affects any time series study in a negative way that has to be addressed. When this occurs, missing data imputation algorithms are required. These algorithms are able to substitute estimated values for the data that are missing. This research presents a new missing data imputation algorithm based on Self-Organized Maps Neural Networks and Mahalanobis distances and compares it not only with a well-known technique called Multivariate Imputation by Chained Equations (MICE) but also with an algorithm previously proposed by the authors called Adaptive Assignation Algorithm (AAA). The results obtained demonstrate how the proposed method outperforms both algorithms.

  10. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    Science.gov (United States)

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  11. Exact Inference for Hardy-Weinberg Proportions with Missing Genotypes: Single and Multiple Imputation.

    Science.gov (United States)

    Graffelman, Jan; Nelson, S; Gogarten, S M; Weir, B S

    2015-09-15

    This paper addresses the issue of exact-test based statistical inference for Hardy-Weinberg equilibrium in the presence of missing genotype data. Missing genotypes often are discarded when markers are tested for Hardy-Weinberg equilibrium, which can lead to bias in the statistical inference about equilibrium. Single and multiple imputation can improve inference on equilibrium. We develop tests for equilibrium in the presence of missingness by using both inbreeding coefficients (or, equivalently, χ² statistics) and exact p-values. The analysis of a set of markers with a high missing rate from the GENEVA project on prematurity shows that exact inference on equilibrium can be altered considerably when missingness is taken into account. For markers with a high missing rate (>5%), we found that both single and multiple imputation tend to diminish evidence for Hardy-Weinberg disequilibrium. Depending on the imputation method used, 6-13% of the test results changed qualitatively at the 5% level. Copyright © 2015 Graffelman et al.
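
    The equivalence between the inbreeding coefficient and the χ² statistic mentioned in this record can be checked numerically for a biallelic marker with complete data, where the Pearson χ² statistic equals n·f². The Python sketch below uses hypothetical genotype counts and a hypothetical function name; it does not implement the exact tests or the imputation steps the paper develops.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Return (inbreeding coefficient f, chi-square statistic) for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no observed genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE statistics are undefined")
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # f compares observed heterozygotes with the number expected under HWE.
    f = 1.0 - n_ab / expected[1]
    return f, chi2


# Hypothetical counts: 30 AA, 40 AB, 30 BB (heterozygote deficit).
f_hat, chi2 = hwe_summary(30, 40, 30)
n_total = 100
print(f_hat, chi2, n_total * f_hat ** 2)
```

    With these counts, f is about 0.2 and both χ² and n·f² are about 4, which is the identity the abstract alludes to.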

  12. Bootstrap imputation with a disease probability model minimized bias from misclassification due to administrative database codes.

    Science.gov (United States)

    van Walraven, Carl

    2017-04-01

    Diagnostic codes used in administrative databases cause bias due to misclassification of patient disease status. It is unclear which methods minimize this bias. Serum creatinine measures were used to determine severe renal failure status in 50,074 hospitalized patients. The true prevalence of severe renal failure and its association with covariates were measured. These were compared to results for which renal failure status was determined using surrogate measures including the following: (1) diagnostic codes; (2) categorization of probability estimates of renal failure determined from a previously validated model; or (3) bootstrap methods imputation of disease status using model-derived probability estimates. Bias in estimates of severe renal failure prevalence and its association with covariates were minimal when bootstrap methods were used to impute renal failure status from model-based probability estimates. In contrast, biases were extensive when renal failure status was determined using codes or methods in which model-based condition probability was categorized. Bias due to misclassification from inaccurate diagnostic codes can be minimized using bootstrap methods to impute condition status using multivariable model-derived probability estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
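
    One possible reading of the contrast drawn in this record is sketched below in Python: disease status is either assigned by thresholding each patient's model-based probability, or drawn at random from that probability within bootstrap resamples and then averaged. The probabilities, the 0.5 threshold, the number of replicates and the function names are all hypothetical, and the paper's actual procedure may differ in its details.

```python
import random


def bootstrap_prevalence(probabilities, n_replicates=2000, seed=0):
    """Average prevalence over bootstrap resamples with randomly imputed status."""
    rng = random.Random(seed)
    n = len(probabilities)
    estimates = []
    for _ in range(n_replicates):
        # Resample patients with replacement, then draw each status from its probability.
        sample = [rng.choice(probabilities) for _ in range(n)]
        cases = sum(rng.random() < p for p in sample)
        estimates.append(cases / n)
    return sum(estimates) / n_replicates


def categorized_prevalence(probabilities, threshold=0.5):
    """Prevalence when each probability is dichotomized at a fixed threshold."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)


# Hypothetical model-based probabilities of severe renal failure.
probs = [0.90, 0.80, 0.10, 0.05, 0.30, 0.60, 0.02, 0.40]
boot = bootstrap_prevalence(probs)
cat = categorized_prevalence(probs)
print(round(boot, 3), cat)
```

    In this toy example the bootstrap estimate stays close to the mean of the probabilities, while the thresholded estimate depends on how many probabilities happen to fall above the cut-off.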

  13. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

    Science.gov (United States)

    Allen, Genevera I.; Tibshirani, Robert

    2015-01-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. PMID:26877823

  14. Imputation of microsatellite alleles from dense SNP genotypes for parental verification.

    Science.gov (United States)

    McClure, Matthew; Sonstegard, Tad; Wiggans, George; Van Tassell, Curtis P

    2012-01-01

    Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing four dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification.

  15. Imputation of microsatellite alleles from dense SNP genotypes for parental verification

    Directory of Open Access Journals (Sweden)

    Matthew McClure

    2012-08-01

    Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing 4 dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics (ISAG) recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification.

  16. Data Editing and Imputation in Business Surveys Using “R”

    Directory of Open Access Journals (Sweden)

    Elena Romascanu

    2014-06-01

    Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison between two statistical software packages, R and SPSS, in order to take full advantage of the existing automated methods for the data editing process and imputation in business surveys (with a proper design of consistency rules) as a partial alternative to the manual editing of data. Approach – The paper compares different methods for editing survey data in R with the ‘editrules’ and ‘survey’ packages, which contain transformations commonly used in official statistics, together with visualization of missing value patterns using the ‘Amelia’ and ‘VIM’ packages and imputation approaches for longitudinal data using ‘VIMGUI’, and compares the performance of another statistical software package, SPSS, on the same features. Findings – Data on business statistics received by NIS (National Institute of Statistics) are not ready to be used for direct analysis because of in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages offer the ability to locate the erroneous fields in edit-violating records and to verify the results after the imputation of missing values, providing users with a flexible, less time-consuming approach in which automation is easier to perform in R than with SPSS macro syntax, even in situations where macros are very handy.

  17. Quick, “Imputation-free” meta-analysis with proxy-SNPs

    Directory of Open Access Journals (Sweden)

    Meesters Christian

    2012-09-01

    Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which eventually are time-consuming. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy

  18. Quick, "imputation-free" meta-analysis with proxy-SNPs.

    Science.gov (United States)

    Meesters, Christian; Leber, Markus; Herold, Christine; Angisch, Marina; Mattheisen, Manuel; Drichel, Dmitriy; Lacour, André; Becker, Tim

    2012-09-12

    Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which eventually are time-consuming. Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. 
    YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy-SNPs for missing markers to avoid unnecessary power loss
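
    As background for "combining association effect estimates", a standard fixed-effect, inverse-variance-weighted meta-analysis can be sketched in Python as below. This is the textbook estimator with hypothetical inputs and a hypothetical function name; it is not the YAMAS implementation and does not model how proxy-SNP estimates are selected or aligned.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted pooled effect, its standard error and z-score."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical per-study effect estimates for one SNP (or its proxy).
pooled_beta, pooled_se, z = fixed_effect_meta([0.10, 0.20], [0.05, 0.10])
print(round(pooled_beta, 4), round(pooled_se, 4), round(z, 3))
```

    The more precise study dominates the pooled estimate here, so the pooled effect lies closer to 0.10 than to 0.20.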

  19. Genomic predictions for economically important traits in Brazilian Braford and Hereford beef cattle using true and imputed genotypes.

    Science.gov (United States)

    Piccoli, Mario L; Brito, Luiz F; Braccini, José; Cardoso, Fernando F; Sargolzaei, Mehdi; Schenkel, Flávio S

    2017-01-18

    Genomic selection (GS) has played an important role in cattle breeding programs. However, genotyping prices are still a challenge for implementation of GS in beef cattle and there is still a lack of information about the use of low-density Single Nucleotide Polymorphisms (SNP) chip panels for genomic predictions in breeds such as Brazilian Braford and Hereford. Therefore, this study investigated the effect of using imputed genotypes on the accuracy of genomic predictions for twenty economically important traits in Brazilian Braford and Hereford beef cattle. Various scenarios composed of different percentages of animals with imputed genotypes and different sizes of the training population were compared. De-regressed EBVs (estimated breeding values) were used as pseudo-phenotypes in a Genomic Best Linear Unbiased Prediction (GBLUP) model using two different mimicked panels derived from the 50 K (8 K and 15 K SNP panels), which were subsequently imputed to the 50 K panel. In addition, genomic prediction accuracies generated from a 777 K SNP (imputed from the 50 K SNP) were presented as another alternate scenario. The accuracy of genomic breeding values averaged over the twenty traits ranged from 0.38 to 0.40 across the different scenarios. The average losses in expected genomic estimated breeding values (GEBV) accuracy (accuracy obtained from the inverse of the mixed model equations) relative to the true 50 K genotypes ranged from -0.0007 to -0.0012 and from -0.0002 to -0.0005 when using the 50 K imputed from the 8 K or 15 K, respectively. When using the imputed 777 K panel the average loss in expected GEBV accuracy was -0.0021. The average gain in expected EBV accuracy by including genomic information when compared to simple BLUP was between 0.02 and 0.03 across scenarios and traits. The percentage of animals with imputed genotypes in the training population did not significantly influence the validation accuracy. However, the size of the training

  20. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    Science.gov (United States)

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2017-12-14

    SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.

  1. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

    Directory of Open Access Journals (Sweden)

    Katya L Masconi

    Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.

  2. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    Science.gov (United States)

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of PHapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10(-12)). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations.

  3. Geographic information abstractions: conceptual clarity for geographic modeling

    OpenAIRE

    T L Nyerges

    1991-01-01

    Just as we abstract our reality to make life intellectually manageable, we must create abstractions when we build models of geographic structure and process. Geographic information abstractions with aspects of theme, time, and space can be used to provide a comprehensive description of geographic reality in a geographic information system (GIS). In the context of geographic modeling a geographic information abstraction is defined as a simultaneous focus on important characteristics of geograp...

  4. Providing an imputation algorithm for missing values of longitudinal data using Cuckoo search algorithm: A case study on cervical dystonia.

    Science.gov (United States)

    Golabpour, Amin; Etminani, Kobra; Doosti, Hassan; Miri, Hamid Heidarian; Ghanbari, Reza

    2017-06-01

    Missing values in data are found in a large number of studies in the field of medical sciences, especially longitudinal ones, in which repeated measurements are taken from each person during the study. In this regard, several statistical endeavors have been performed on the concepts, issues, and theoretical methods during the past few decades. Herein, we focused on the missing data related to patients excluded from longitudinal studies. To this end, two statistical parameters of similarity and correlation coefficient were employed. In addition, metaheuristic algorithms were applied to achieve an optimal solution. The selected metaheuristic algorithm, which has a great search functionality, was the Cuckoo search algorithm. Profiles of subjects with cervical dystonia (CD) were used to evaluate the proposed model after applying missingness. It was concluded that the algorithm used in this study had a higher accuracy (98.48%), compared with similar approaches. Concomitant use of similar parameters and correlation coefficients led to a significant increase in accuracy of missing data imputation.

  5. Missing Value Imputation Improves Mortality Risk Prediction Following Cardiac Surgery: An Investigation of an Australian Patient Cohort.

    Science.gov (United States)

    Karim, Md Nazmul; Reid, Christopher M; Tran, Lavinia; Cochrane, Andrew; Billah, Baki

    2017-03-01

    The aim of this study was to evaluate the impact of missing values on the prediction performance of the model predicting 30-day mortality following cardiac surgery as an example. Information from 83,309 eligible patients, who underwent cardiac surgery, recorded in the Australia and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) database registry between 2001 and 2014, was used. An existing 30-day mortality risk prediction model developed from ANZSCTS database was re-estimated using the complete cases (CC) analysis and using multiple imputation (MI) analysis. Agreement between the risks generated by the CC and MI analysis approaches was assessed by the Bland-Altman method. Performances of the two models were compared. One or more missing predictor variables were present in 15.8% of the patients in the dataset. The Bland-Altman plot demonstrated significant disagreement between the risk scores (prisk of mortality. Compared to CC analysis, MI analysis resulted in an average of 8.5% decrease in standard error, a measure of uncertainty. The MI model provided better prediction of mortality risk (observed: 2.69%; MI: 2.63% versus CC: 2.37%, Pvalues improved the 30-day mortality risk prediction following cardiac surgery. Copyright © 2016 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.

  6. Evaluation of genetic and geographical diversity of garlic (Allium sativum L.) ecotypes of Iran using ISSR and M13 molecular markers

    OpenAIRE

    M. Fakhrfeshani; F. Shahriari

    2016-01-01

    Garlic (Allium sativum L.) as one of the most valuable industrial and pharmaceutical plants has been studied from many aspects because of its importance. But there is not any sufficient and reliable information about its distribution and classification. So its types are categorized according to traditional, local or geographical names or some visual traits. The most important reason is the sterility of garlic and its flowering inability. This study, as the first report of using ISSR and M13 m...

  7. Evaluating outcome-correlated recruitment and geographic recruitment bias in a respondent-driven sample of people who inject drugs in Tijuana, Mexico.

    Science.gov (United States)

    Rudolph, Abby E; Gaines, Tommi L; Lozada, Remedios; Vera, Alicia; Brouwer, Kimberly C

    2014-12-01

    Respondent-driven sampling's (RDS) widespread use and reliance on untested assumptions suggests a need for new exploratory/diagnostic tests. We assessed geographic recruitment bias and outcome-correlated recruitment among 1,048 RDS-recruited people who inject drugs (Tijuana, Mexico). Surveys gathered demographics, drug/sex behaviors, activity locations, and recruiter-recruit pairs. Simulations assessed geographic and network clustering of active syphilis (RPR titers ≥1:8). Gender-specific predicted probabilities were estimated using logistic regression with GEE and robust standard errors. Active syphilis prevalence was 7 % (crude: men = 5.7 % and women = 16.6 %; RDS-adjusted: men = 6.7 % and women = 7.6 %). Syphilis clustered in the Zona Norte, a neighborhood known for drug and sex markets. Network simulations revealed geographic recruitment bias and non-random recruitment by syphilis status. Gender-specific prevalence estimates accounting for clustering were highest among those living/working/injecting/buying drugs in the Zona Norte and directly/indirectly connected to syphilis cases (men: 15.9 %, women: 25.6 %) and lowest among those with neither exposure (men: 3.0 %, women: 6.1 %). Future RDS analyses should assess/account for network and spatial dependencies.

  8. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Science.gov (United States)

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.

  9. Guided Bayesian imputation to adjust for confounding when combining heterogeneous data sources in comparative effectiveness research.

    Science.gov (United States)

    Antonelli, Joseph; Zigler, Corwin; Dominici, Francesca

    2017-07-01

    In comparative effectiveness research, we are often interested in the estimation of an average causal effect from large observational data (the main study). Often this data does not measure all the necessary confounders. On many occasions, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate due to the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the average causal effect in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to the missing data imputation and confounder selection when estimating the average causal effect (ACE) in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10 396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare. The Author 2017. Published by Oxford University Press. All rights reserved.

  10. Imputing HIV treatment start dates from routine laboratory data in South Africa: a validation study.

    Science.gov (United States)

    Maskew, Mhairi; Bor, Jacob; Hendrickson, Cheryl; MacLeod, William; Bärnighausen, Till; Pillay, Deenan; Sanne, Ian; Carmona, Sergio; Stevens, Wendy; Fox, Matthew P

    2017-01-17

    Poor clinical record keeping hinders health systems monitoring and patient care in many low resource settings. We develop and validate a novel method to impute dates of antiretroviral treatment (ART) initiation from routine laboratory data in South Africa's public sector HIV program. This method will enable monitoring of the national ART program using real-time laboratory data, avoiding the error potential of chart review. We developed an algorithm to impute ART start dates based on the date of a patient's "ART workup", i.e. the laboratory tests used to determine treatment readiness in national guidelines, and the time from ART workup to initiation based on clinical protocols (21 days). To validate the algorithm, we analyzed data from two large clinical HIV cohorts: Hlabisa HIV Treatment and Care Programme in rural KwaZulu-Natal; and Right to Care Cohort in urban Gauteng. Both cohorts contain known ART initiation dates and laboratory results imported directly from the National Health Laboratory Service. We assessed median time from ART workup to ART initiation and calculated sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV) of our imputed start date vs. the true start date within a 6 month window. Heterogeneity was assessed across individual clinics and over time. We analyzed data from over 80,000 HIV-positive adults. Among patients who had a workup and initiated ART, median time to initiation was 16 days (IQR 7,31) in Hlabisa and 21 (IQR 8,43) in RTC cohort. Among patients with known ART start dates, SE of the imputed start date was 83% in Hlabisa and 88% in RTC, indicating this method accurately predicts ART start dates for about 85% of all ART initiators. In Hlabisa, PPV was 95%, indicating that for patients with a lab workup, true start dates were predicted with high accuracy. SP (100%) and NPV (92%) were also very high. Routine laboratory data can be used to infer ART initiation dates in South Africa
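
    The imputation rule described in this record (workup date plus a protocol-based lag of 21 days, checked against known start dates within a 6 month window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the 183-day reading of "6 month window", the decision to count an imputed date outside the window as a false positive, and all function names are hypothetical.

```python
from datetime import date, timedelta

WORKUP_TO_START_DAYS = 21  # lag from ART workup to initiation, as stated in the abstract
WINDOW_DAYS = 183          # assumption: "6 month window" read as +/- 183 days


def impute_start(workup_date):
    """Return the imputed ART start date, or None when no workup was recorded."""
    if workup_date is None:
        return None
    return workup_date + timedelta(days=WORKUP_TO_START_DAYS)


def classify(workup_date, true_start):
    """Label one patient as TP, FP, FN or TN (assumed definitions, see lead-in)."""
    imputed = impute_start(workup_date)
    if imputed is None:
        return "FN" if true_start is not None else "TN"
    if true_start is not None and abs((imputed - true_start).days) <= WINDOW_DAYS:
        return "TP"
    return "FP"


def validation_metrics(records):
    """Compute SE, SP, PPV and NPV from (workup_date, true_start) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for workup_date, true_start in records:
        counts[classify(workup_date, true_start)] += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    return {
        "SE": ratio(tp, tp + fn),
        "SP": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Hypothetical toy records, not data from the study.
toy_records = [
    (date(2015, 1, 5), date(2015, 1, 21)),  # imputed 2015-01-26, 5 days off: TP
    (date(2015, 3, 1), None),               # workup but no recorded start: FP
    (None, date(2015, 6, 1)),               # start without workup: FN
    (None, None),                           # neither: TN
    (date(2014, 1, 1), date(2015, 6, 1)),   # imputed date far outside window: FP
]
toy_metrics = validation_metrics(toy_records)
```

    The toy records exist only to exercise each branch; the percentages reported in the abstract come from the two clinical cohorts and cannot be reproduced from this sketch.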

  11. A suggested approach for imputation of missing dietary data for young children in daycare

    OpenAIRE

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

  12. Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

    DEFF Research Database (Denmark)

    Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

    2014-01-01

    BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data have to be imputed. Different imputation techniques may give different results and some may lead to bias....... RESULTS: 561 participants were randomised. Compared to placebo, there was a significantly greater weight loss with topiramate in all analyses: 9.5 kg (SE 1.17) in the complete case analysis (N = 86), 6.8 kg (SE 0.66) using LOCF (N = 561), 6.4 kg (SE 0.90) using MI (N = 561) and 1.5 kg (SE 0.28) using BOCF...... (N = 561). CONCLUSIONS: The different imputation methods gave very different results. Contrary to widely stated claims, LOCF did not produce a conservative (i.e., lower) efficacy estimate compared to MI. Also, LOCF had a lower SE than MI....
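
    Two of the single-imputation rules compared in this record, last observation carried forward (LOCF) and baseline observation carried forward (BOCF), can be sketched as below. The functions and the toy weight series are hypothetical illustrations and use no trial data; `None` marks a missed visit, and the baseline (first) value is assumed to be observed.

```python
def locf(series):
    """Last observation carried forward: fill each gap with the latest observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(series):
    """Baseline observation carried forward: fill each gap with the baseline value."""
    baseline = series[0]
    return [baseline if value is None else value for value in series]


# Hypothetical participant who drops out after the third visit (weights in kg).
toy_weights = [100.0, 97.0, 95.0, None, None]

locf_filled = locf(toy_weights)  # [100.0, 97.0, 95.0, 95.0, 95.0]
bocf_filled = bocf(toy_weights)  # [100.0, 97.0, 95.0, 100.0, 100.0]

# Change from baseline at the final visit under each rule.
locf_change = locf_filled[-1] - locf_filled[0]  # -5.0
bocf_change = bocf_filled[-1] - bocf_filled[0]  # 0.0
```

    In this toy case BOCF assigns a dropout no weight change at all, while LOCF keeps the loss observed before dropout. That difference is one plausible reading of why the BOCF estimate in the abstract is much smaller than the LOCF estimate.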

  13. Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

    Directory of Open Access Journals (Sweden)

    Vinay K Gupta

    2016-06-01

    Objective: The aim of the study is to assess the trend in mean BMI z-score among private schools’ students from their anthropometric records when there were missing values in the outcome. Methodology: The anthropometric measurements of students from class 1 to 12 were taken from the records of two private schools in Delhi, India from 2005 to 2010. These records comprise unbalanced longitudinal data; that is, not all the students had measurements recorded in each year. The trend in mean BMI z-score was estimated through a growth curve model. Prior to that, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis was also performed after excluding missing values to compare the results with those obtained from analysis of multiply imputed data. Results: The mean BMI z-score among school students significantly decreased over time in imputed data (β = -0.2030, se = 0.0889, p = 0.0232) after adjusting for age, gender, class and school. Complete case analysis also shows a decrease in mean BMI z-score, though it was not statistically significant (β = -0.2861, se = 0.0987, p = 0.065). Conclusions: The estimates obtained from multiple imputation analysis were better than those of the complete case analysis in terms of lower standard errors. We showed that anthropometric measurements from school records can be used to monitor the weight status of children and adolescents, and that multiple imputation using a growth curve model can be useful when analyzing such data.

  14. Using full-cohort data in nested case-control and case-cohort studies by multiple imputation.

    Science.gov (United States)

    Keogh, Ruth H; White, Ian R

    2013-10-15

    In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies in which complete information is obtained only for sampled individuals. However, in the full cohort, there may be a large amount of information on cheaply available covariates and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well. Copyright © 2013 John Wiley & Sons, Ltd.

  15. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Science.gov (United States)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to uncover hidden knowledge in clinical research is becoming highly influential in medicine. Heart disease is a major cause of death around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results. Nevertheless, the remaining complete features can still provide information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a Decision Tree classifier. The experiment uses a Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the Decision Tree increases after the application of FCMPSO for imputation.

  16. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Science.gov (United States)

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression-on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  17. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Background: Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods: Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results: We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions: Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  18. Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Peter K Joshi

    The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.

  19. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Science.gov (United States)

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  20. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    NARCIS (Netherlands)

    Kim, Young Jin; Lee, Juyoung; Kim, Bong-Jo; Park, Taesung; Abecasis, Gonçalo; Almeida, Marcio; Altshuler, David; Asimit, Jennifer L.; Atzmon, Gil; Barber, Mathew; Barzilai, Nir; Beer, Nicola L.; Bell, Graeme I.; Below, Jennifer; Blackwell, Tom; Blangero, John; Boehnke, Michael; Bowden, Donald W.; Burtt, Noël; Chambers, John; Chen, Han; Chen, Peng; Chines, Peter S.; Choi, Sungkyoung; Churchhouse, Claire; Cingolani, Pablo; Cornes, Belinda K.; Cox, Nancy; Day-Williams, Aaron G.; Duggirala, Ravindranath; Dupuis, Josée; Dyer, Thomas; Feng, Shuang; Fernandez-Tajes, Juan; Ferreira, Teresa; Fingerlin, Tasha E.; Flannick, Jason; Florez, Jose; Fontanillas, Pierre; Frayling, Timothy M.; Fuchsberger, Christian; Gamazon, Eric R.; Gaulton, Kyle; Ghosh, Saurabh; Glaser, Benjamin; Gloyn, Anna; Grossman, Robert L.; Grundstad, Jason; Hanis, Craig; Heath, Allison; Highland, Heather; Horikoshi, Momoko; Huh, Ik-Soo; Huyghe, Jeroen R.; Ikram, Kamran; Jablonski, Kathleen A.; Jun, Goo; Kato, Norihiro; Kim, Jayoun; King, C. Ryan; Kooner, Jaspal; Kwon, Min-Seok; Im, Hae Kyung; Laakso, Markku; Lam, Kevin Koi-Yau; Lee, Jaehoon; Lee, Selyeong; Lee, Sungyoung; Lehman, Donna M.; Li, Heng; Lindgren, Cecilia M.; Liu, Xuanyao; Livne, Oren E.; Locke, Adam E.; Mahajan, Anubha; Maller, Julian B.; Manning, Alisa K.; Maxwell, Taylor J.; Mazoure, Alexander; McCarthy, Mark I.; Meigs, James B.; Min, Byungju; Mohlke, Karen L.; Morris, Andrew P.; Musani, Solomon; Nagai, Yoshihiko; Ng, Maggie C. Y.; Nicolae, Dan; Oh, Sohee; Palmer, Nicholette; Pollin, Toni I.; Prokopenko, Inga; Reich, David; Rivas, Manuel A.; Scott, Laura J.; Seielstad, Mark; Cho, Yoon Shin; Sim, Xueling; Sladek, Robert; Smith, Philip; Tachmazidou, Ioanna; Tai, E. Shyong; Teo, Yik Ying; Teslovich, Tanya M.; Torres, Jason; Trubetskoy, Vasily; Willems, Sara M.; Williams, Amy L.; Wilson, James G.; Wiltshire, Steven; Won, Sungho; Wood, Andrew R.; Xu, Wang; Yoon, Joon; Zawistowski, Matthew; Zeggini, Eleftheria; Zhang, Weihua; Zöllner, Sebastian

    2015-01-01

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation

  1. A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

    Science.gov (United States)

    De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne

    2017-07-25

    Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies. We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms, missing completely at random (MCAR) and weak and strong missing at random (MAR) scenarios, were used to impose missingness on body mass index (BMI) for age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% of missing data when assessing the association between childhood obesity and sleep problems. The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one. We recommend the use of FCS or MVNI in a similar
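
    The difference between the MCAR and MAR mechanisms used in this simulation study can be sketched as below: under MCAR every value has the same chance of being deleted, while under MAR the chance depends on a fully observed covariate. This is a generic illustration, not the study's simulation code; the logistic form, the parameter names `intercept` and `slope`, and the toy covariate are assumptions, with a larger slope standing in for a "stronger" MAR scenario.

```python
import math
import random


def impose_mcar(values, rate, rng):
    """Delete each value independently with the same probability `rate`."""
    return [None if rng.random() < rate else value for value in values]


def impose_mar(values, covariate, intercept, slope, rng):
    """Delete each value with a probability that depends on an observed covariate.

    Assumed model: P(missing) = 1 / (1 + exp(-(intercept + slope * covariate))).
    """
    result = []
    for value, x in zip(values, covariate):
        p_missing = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        result.append(None if rng.random() < p_missing else value)
    return result


def missing_fraction(values):
    """Share of entries that are missing (None)."""
    return sum(value is None for value in values) / len(values)


# Hypothetical toy setup: 5000 individuals, half with covariate -1 and half with +1.
rng = random.Random(2017)
toy_outcome = [0.0] * 5000
toy_covariate = [-1.0] * 2500 + [1.0] * 2500

mcar_data = impose_mcar(toy_outcome, 0.3, rng)
mar_data = impose_mar(toy_outcome, toy_covariate, 0.0, 2.0, rng)

# Under MAR, missingness should be rarer where the covariate is low.
mar_low = missing_fraction(mar_data[:2500])
mar_high = missing_fraction(mar_data[2500:])
```

    With these assumed parameters the expected missingness is about 12% in the low-covariate half and about 88% in the high-covariate half, whereas the MCAR version removes roughly 30% everywhere.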

  2. Reconstructing geographical parthenogenesis

    DEFF Research Database (Denmark)

    Kirchheimer, Bernhard; Wessely, Johannes; Gattringer, Andreas

    2018-01-01

    Asexual taxa often have larger ranges than their sexual progenitors, particularly in areas affected by Pleistocene glaciations. The reasons given for this 'geographical parthenogenesis' are contentious, with expansion of the ecological niche or colonisation advantages of uniparental reproduction a...... effects of niche differentiation and reproductive modes on range formation of related sexual and asexual taxa arising from their differential sensitivity to minority cytotype disadvantage....

  3. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    LENUS (Irish Health Repository)

    Hardouin, Jean-Benoit

    2011-07-14

    Background: Nowadays, more and more clinical scales consisting of responses given by patients to a set of items (Patient Reported Outcomes, PRO) are validated with models based on Item Response Theory, and more specifically with a Rasch model. In the validation sample, missing data are frequent. The aim of this paper is to compare sixteen methods for handling the missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods: A simulation study was performed, allowing several cases to be considered, notably whether the missing values are informative or not and the rate of missing data. Results: Several imputation methods produce bias on psychometric indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions: Several imputation methods should be avoided, in particular PMS imputation. From a general point of view, it is important to use an imputation method that considers both the ability of the patient (measured for example by his/her score) and the difficulty of the item (measured for example by its rate of favourable responses). Another recommendation is to always consider the addition of a random process in the imputation method, because such a process reduces the bias. Last, the analysis performed without imputation of the missing data (available case analysis) is an interesting alternative to simple imputation in this context.
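
    The contrast drawn in this record, between Personal Mean Score (PMS) imputation and a rule that also accounts for item difficulty, can be sketched as below. The second function uses the generic two-way formula (person mean + item mean - overall mean), which is swapped in here purely as an illustration and is not claimed to be one of the sixteen methods the paper compares. The function names, the clipping to the 0-1 range, and the toy responses are all hypothetical, and neither function adds the random component the authors recommend.

```python
def _mean(values):
    """Mean of the observed (non-None) values, or None if nothing was observed."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed) if observed else None


def pms_impute(data):
    """Replace each missing response by that person's mean observed response."""
    imputed = []
    for row in data:
        person_mean = _mean(row)
        imputed.append([person_mean if v is None else v for v in row])
    return imputed


def two_way_impute(data, low=0.0, high=1.0):
    """Replace each missing response by person mean + item mean - overall mean.

    Assumes every item has at least one observed response; a person with no
    observed responses is left unimputed.
    """
    overall_mean = _mean([v for row in data for v in row])
    item_means = [_mean(column) for column in zip(*data)]
    imputed = []
    for row in data:
        person_mean = _mean(row)
        new_row = []
        for j, v in enumerate(row):
            if v is None and person_mean is not None:
                estimate = person_mean + item_means[j] - overall_mean
                v = min(high, max(low, estimate))  # keep within the response range
            new_row.append(v)
        imputed.append(new_row)
    return imputed


# Hypothetical dichotomous responses (rows: patients, columns: items).
toy_responses = [
    [1, 1, None],  # high-scoring patient skips the hardest item
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
pms_value = pms_impute(toy_responses)[0][2]
two_way_value = two_way_impute(toy_responses)[0][2]
```

    In the toy data the first patient answered favourably on every observed item, so PMS imputes 1.0 on the skipped item, while the two-way estimate is lower because few patients responded favourably to that item. This is in line with the abstract's remark that mean-score imputation tends to make a scale look better than it is.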

  4. Validation of a geographic information system for the evaluation of the soil radon exhalation potential in South-Tyrol and Veneto, Italy.

    Science.gov (United States)

    Bertolo, A; Verdi, L

    2001-01-01

    The PERS (soil radon exhalation potential) project was promoted by ANPA (Italian Environmental Protection Agency) together with the Università Cattolica del Sacro Cuore of Rome: the aim was to produce a geographic information system allowing the discovery of regions with different radon exhalation potential starting from some territorial knowledge. Some environmental measurements were carried out within this project in selected areas in South-Tyrol and Veneto. The measurement of radon in springwater and groundwater as well as in soil gas plays a decisive role for the validation of the algorithm for computing the PERS. Along with technical aspects, a possible use of the PERS method by the Regional Environmental Protection Agencies and by other agencies is discussed with the scope of identifying radon prone areas, as stated in the Italian 'Decreto Legislativo' 26 May 2000, n. 241. Moreover the forecasting power of PERS regarding indoor radon concentration is analysed.

  5. Validation of a geographic information system for the evaluation of the soil radon exhalation potential in South-Tyrol and Veneto (Italy)

    Energy Technology Data Exchange (ETDEWEB)

    Bertolo, A.; Verdi, L

    2001-07-01

    The PERS (soil radon exhalation potential) project was promoted by ANPA (Italian Environmental Protection Agency) together with the Università Cattolica del Sacro Cuore of Rome: the aim was to produce a geographic information system allowing the discovery of regions with different radon exhalation potential starting from some territorial knowledge. Some environmental measurements were carried out within this project in selected areas in South-Tyrol and Veneto. The measurement of radon in springwater and groundwater as well as in soil gas plays a decisive role for the validation of the algorithm for computing the PERS. Along with technical aspects, a possible use of the PERS method by the Regional Environmental Protection Agencies and by other agencies is discussed with the scope of identifying radon prone areas, as stated in the Italian 'Decreto Legislativo' 26 May 2000, n. 241. Moreover the forecasting power of PERS regarding indoor radon concentration is analysed. (author)

  6. Evaluation of five geographic sources of loblolly pine to environmental stress such as the air-borne disease organism Cronartium quercuum f. sp. fusiforme, and testing for concentrations of specific elements in diseased tissue. Progress report

    Energy Technology Data Exchange (ETDEWEB)

    Powers, H.R. Jr.

    1982-10-01

    Data were taken at the end of the third growing season on the 1920 seedlings representing the four geographic seed sources of loblolly pine being evaluated. Information was obtained on survival, fusiform rust infection, and height. In addition to these four different seed sources, these plots also include seedlings from a rust resistant orchard of both loblolly and slash pines, and susceptible Georgia slash pines, for a total of 3360 seedlings. The results show excellent survival rates with a fusiform rust infection level too low to draw meaningful conclusions after only three years. Height was a significant variable. (PSB)

  7. Oral and cutaneous lymphomas other than mycosis fungoides and sézary syndrome in a mexican cohort: Recategorization and evaluation of international geographical disparities

    Directory of Open Access Journals (Sweden)

    Amparo Hernández-Salazar

    2017-01-01

    Background: Nonmycosis fungoides/Sézary syndrome (non-MF/SS) primary cutaneous lymphomas (PCL) are currently categorized under the 2005-World Health Organization/European Organization for Research and Treatment of Cancer (WHO-EORTC) classification for PCL. These differ in behavior from secondary cutaneous lymphomas (SCL) and from lymphomas limited to the oral cavity (primary oral lymphomas [POL]), both categorized under the 2016-WHO classification for lymphoid neoplasms. Aims: This study aims to report the first series of non-MF/SS PCL, SCL, and POL in a Mexican cohort, examine the applicability of current classification systems and compare our findings with those from foreign cohorts. Materials and Methods: Eighteen non-MF/SS PCL, four SCL, and two POL with available tissue for morphology and immunophenotypic assessment were reclassified according to the 2005-WHO/EORTC and 2016-WHO classifications. Results: Non-MF/SS PCLs were primarily of T-cell origin (61%) where CD30+ lymphoproliferative disorders predominated, followed by Epstein–Barr virus-induced lymphomas, and peripheral T-cell lymphomas, not otherwise specified. Primary cutaneous B-cell lymphomas (BCL) were primarily of follicle center cell origin followed by postgerminal lymphomas of the diffuse large BCL variety. Conclusions: Most non-MF/SS PCL, SCL, and POL can be adequately categorized according to the 2005-WHO/EORTC and 2016-WHO classification systems, even when dealing with clinically atypical cases. The relative frequencies in our cohort hold closer similarities to Asian registries than to those of Europe/USA, supporting the concept of individual and/or racial susceptibility, and the notion of geographical variances in the rate of lymphomas. In particular, such disparity may arise from viral-induced lymphomas which might show partial geographical restriction.

  8. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    Directory of Open Access Journals (Sweden)

    Andrew R Wood

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF <5%) and rare variants (<1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors where data was available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 signals and 33 signals that reached P<5×10^-8 based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P<5×10^-11 respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P<5×10^-8 in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10^-12). Our data provide important proof of

  9. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Science.gov (United States)

    2012-01-01

    Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate correlations (r=.50) with X’s and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3. PMID:23216665

  10. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    DEFF Research Database (Denmark)

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show...

  11. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    NARCIS (Netherlands)

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. 
Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. 
Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  12. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research.

    Science.gov (United States)

    Hardt, Jochen; Herke, Max; Leonhart, Rainer

    2012-12-05

    Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate correlations (r=.50) with X's and Y. The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
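
    The rule of thumb in the last sentence can be turned into a quick check. The sketch below reads "1 : 3" as requiring at least three complete cases per variable in the imputation model; that reading, the helper names and the default of 3 are assumptions made for illustration, not a definition taken from the paper.

```python
def max_variables(n_complete_cases, min_cases_per_variable=3):
    """Largest number of variables allowed under the assumed reading of the rule."""
    if n_complete_cases < 0:
        raise ValueError("n_complete_cases must be non-negative")
    return n_complete_cases // min_cases_per_variable


def satisfies_rule_of_thumb(n_variables, n_complete_cases, min_cases_per_variable=3):
    """True if there are at least `min_cases_per_variable` complete cases per variable."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_variables <= max_variables(n_complete_cases, min_cases_per_variable)


# Example: Y, X1 and X2 plus 20 auxiliary variables, with 50 complete cases.
ok = satisfies_rule_of_thumb(n_variables=3 + 20, n_complete_cases=50)
```

    Under this reading, 50 complete cases would support at most 16 variables, so the 23-variable model in the example would not pass the check.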

  13. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

    NARCIS (Netherlands)

    Huang, J.; Howie, B.; McCarthy, S.; Memari, Y.; Walter, K.; Min, J.L.; Danecek, P.; Malerba, G.; Trabetti, E.; Zheng, H.F.; Gambaro, G.; Richards, J.B.; Durbin, R.; Timpson, N.J.; Marchini, J.; Soranzo, N.; Schmidts, M.

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth

  14. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    NARCIS (Netherlands)

    Sobrino-Vegas, Paz; Pérez-Hoyos, Santiago; Geskus, Ronald; Padilla, Belén; Segura, Ferrán; Rubio, Rafael; del Romero, Jorge; Santos, Jesus; Moreno, Santiago; del Amo, Julia

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter

  15. Application of a novel hybrid method for spatiotemporal data imputation: A case study of the Minqin County groundwater level

    Science.gov (United States)

    Zhang, Zhongrong; Yang, Xuan; Li, Hao; Li, Weide; Yan, Haowen; Shi, Fei

    2017-10-01

    Techniques for data analysis have developed widely in recent years; however, missing data remain a ubiquitous problem in many scientific fields. In particular, dealing with missing spatiotemporal data presents an enormous challenge. Nonetheless, in recent years, a considerable amount of research has focused on spatiotemporal problems, making spatiotemporal missing data imputation methods increasingly indispensable. In this paper, a novel spatiotemporal hybrid method is proposed to verify and impute spatiotemporal missing values. This new method, termed SOM-FLSSVM, flexibly combines three advanced techniques: self-organizing feature map (SOM) clustering, the fruit fly optimization algorithm (FOA) and the least squares support vector machine (LSSVM). We employ a cross-validation (CV) procedure and FOA swarm intelligence optimization strategy that can search available parameters and determine the optimal imputation model. The spatiotemporal underground water data for Minqin County, China, were selected to test the reliability and imputation ability of SOM-FLSSVM. We carried out a validation experiment and compared three well-studied models with SOM-FLSSVM using different missing data ratios from 0.1 to 0.8 in the same data set. The results demonstrate that the new hybrid method performs well in terms of both robustness and accuracy for spatiotemporal missing data.
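
    The validation design described above (hide known values at several missing-data ratios, impute, then score the imputed values against the truth) can be sketched as follows. This is only an outline of the masking experiment: the synthetic series, the mean-fill placeholder imputer, the RMSE score and every function name are assumptions for illustration, and nothing here implements SOM-FLSSVM.

```python
import math
import random


def mask_values(series, ratio, rng):
    """Hide round(ratio * n) randomly chosen entries; return masked copy and hidden indices."""
    n_hidden = round(ratio * len(series))
    hidden = set(rng.sample(range(len(series)), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, sorted(hidden)


def mean_impute(masked):
    """Placeholder imputer: fill every gap with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error restricted to the entries that were hidden."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in hidden) / len(hidden))


def evaluate(series, impute, ratios, seed=0):
    """Return {ratio: RMSE} for one imputer across several missing-data ratios."""
    rng = random.Random(seed)
    scores = {}
    for ratio in ratios:
        masked, hidden = mask_values(series, ratio, rng)
        scores[ratio] = rmse_on_hidden(series, impute(masked), hidden)
    return scores


# Synthetic stand-in for a groundwater-level series (not real data).
series = [10.0 + 0.5 * t for t in range(40)]
ratios = [r / 10 for r in range(1, 9)]  # 0.1, 0.2, ..., 0.8
scores = evaluate(series, mean_impute, ratios)
```

    Any imputer with the same list-in, list-out signature could be passed to `evaluate` in place of `mean_impute` to compare methods on identical masks.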

  16. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data

    Science.gov (United States)

    CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md

    2014-01-01

    Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
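
    For readers unfamiliar with kNN imputation, one of the three methods compared above, a minimal pure-Python sketch follows. It assumes rows are genes, columns are samples, `None` marks a missing value, and similarity is the root mean squared difference over columns observed in both rows; these choices and the name `knn_impute` are illustrative assumptions and may differ from the implementation used in the study.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries from the k most similar rows that observe that column.

    Distances are computed on the original matrix, so imputed values
    never feed into later distance calculations.
    """
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no common columns, so no usable distance
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                # Unweighted average of the neighbours' values in column j.
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
filled = knn_impute(expression, k=2)
```

    In this toy matrix the two rows closest to the first row supply the values 3.0 and 5.0, so the gap is filled with their average; the distant fourth row is ignored.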

  17. Missing data in a multi-item instrument were best handled by multiple imputation at the item score level

    NARCIS (Netherlands)

    Eekhout, Iris; de Vet, Henrica C. W.; Twisk, Jos W. R.; Brand, Jaap P. L.; de Boer, Michiel R.; Heymans, Martijn W.

    Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling

  18. Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer

    NARCIS (Netherlands)

    Eulenburg, Christine; Suling, Anna; Neuser, Petra; Reuss, Alexander; Canzler, Ulrich; Fehm, Tanja; Luyten, Alexander; Hellriegel, Martin; Woelber, Linn; Mahner, Sven

    2016-01-01

    Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. The combination of multiple imputation of missing values and different propensity scoring

  19. The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

    NARCIS (Netherlands)

    Minica, C.C.; Dolan, C.V.; Willemsen, G.; Vink, J.M.; Boomsma, D.I.

    2013-01-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of

  20. Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data

    Science.gov (United States)

    Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; David E. Hall; Michael J. Falkowski

    2008-01-01

    Meaningful relationships between forest structure attributes measured in representative field plots on the ground and remotely sensed data measured comprehensively across the same forested landscape facilitate the production of maps of forest attributes such as basal area (BA) and tree density (TD). Because imputation methods can efficiently predict multiple response...

  1. Nonlinear Imputation of PaO2/FIO2 From SpO2/FIO2 Among Mechanically Ventilated Patients in the ICU: A Prospective, Observational Study.

    Science.gov (United States)

    Brown, Samuel M; Duggal, Abhijit; Hou, Peter C; Tidswell, Mark; Khan, Akram; Exline, Matthew; Park, Pauline K; Schoenfeld, David A; Liu, Ming; Grissom, Colin K; Moss, Marc; Rice, Todd W; Hough, Catherine L; Rivers, Emanuel; Thompson, B Taylor; Brower, Roy G

    2017-08-01

    In the contemporary ICU, mechanically ventilated patients may not have arterial blood gas measurements available at relevant timepoints. Severity criteria often depend on arterial blood gas results. Retrospective studies suggest that nonlinear imputation of PaO2/FIO2 from SpO2/FIO2 is accurate, but this has not been established prospectively among mechanically ventilated ICU patients. The objective was to validate the superiority of nonlinear imputation of PaO2/FIO2 among mechanically ventilated patients and understand what factors influence the accuracy of imputation. Simultaneous SpO2, oximeter characteristics, receipt of vasopressors, and skin pigmentation were recorded at the time of a clinical arterial blood gas. Acute respiratory distress syndrome criteria were recorded. For each imputation method, we calculated both imputation error and the area under the curve for patients meeting criteria for acute respiratory distress syndrome (PaO2/FIO2 ≤ 300) and moderate-severe acute respiratory distress syndrome (PaO2/FIO2 ≤ 150). Nine hospitals within the Prevention and Early Treatment of Acute Lung Injury network. We prospectively enrolled 703 mechanically ventilated patients admitted to the emergency departments or ICUs of participating study hospitals. None. We studied 1,034 arterial blood gases from 703 patients; 650 arterial blood gases were associated with SpO2 less than or equal to 96%. Nonlinear imputation had consistently lower error than other techniques. Among all patients, nonlinear had a lower error (p < 0.001) and higher (p < 0.001) area under the curve (0.87; 95% CI, 0.85-0.90) for PaO2/FIO2 less than or equal to 300 than linear/log-linear (0.80; 95% CI, 0.76-0.83) imputation. All imputation methods better identified moderate-severe acute respiratory distress syndrome (PaO2/FIO2 ≤ 150); nonlinear imputation remained superior (p < 0.001). For PaO2/FIO2 less than or equal to 150, the sensitivity and specificity for nonlinear imputation were 0

  2. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

    DEFF Research Database (Denmark)

    Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

    2009-01-01

    Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging single-nucleotide polymorphism (SNP) sets. To maximize information gleaned from...... existing genotype data, we conducted a combined analysis of five independent studies of invasive epithelial ovarian cancer. Up to 2,120 cases and 3,382 controls were genotyped in the course of two collaborations at a variety of SNPs in 11 cell cycle genes (CDKN2C, CDKN1A, CCND3, CCND1, CCND2, CDKN1B, CDK2......, and rs3212891; CDK2 rs2069391, rs2069414, and rs17528736; and CCNE1 rs3218036. These results exemplify the utility of imputation in candidate gene studies and lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls....

  3. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    Science.gov (United States)

    Artigas, María Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikäinen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bossé, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Gläser, Sven; González, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliövaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kähönen, Nina; Ingelsson, Erik; Johansson, Åsa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melén, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria; Viñuela, Ana; Völzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimäki, Terho; Vitart, Veronique; Kähönen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes and followed up top associations in 54,550 Europeans. We identify 14 novel loci (P<5 × 10^-8) in or near ENSA, RNU5F-1, KCNS3, AK097794, ASTN2, LHX3, CCDC91, TBX3, TRIP11, RIN3, TEKT5, LTBP4, MN1 and AP1S2, and two novel signals at known loci NPNT and GPR126, providing a basis for new understanding of the genetic determinants of these traits and pulmonary diseases in which they are altered. PMID:26635082

  4. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis.

    Science.gov (United States)

    Eekhout, Iris; van de Wiel, Mark A; Heymans, Martijn W

    2017-08-22

    Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. Examples are pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power. In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables. This simulation study showed that for non-significant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results. It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables.
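
    The median pooling rule described above is simple enough to sketch directly. The snippet assumes the overall significance test for the categorical covariate has already been run once per imputed dataset, giving one p-value each; the function name `median_p_rule`, the example p-values and the 0.05 threshold are illustrative assumptions.

```python
from statistics import median


def median_p_rule(p_values, alpha=0.05):
    """Pool per-imputation p-values by taking their median.

    Returns the pooled p-value and whether it falls below `alpha`.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    pooled = median(p_values)
    return pooled, pooled < alpha


# Hypothetical p-values from five imputed datasets.
pooled, significant = median_p_rule([0.03, 0.08, 0.04, 0.02, 0.06])
```

    With these five hypothetical values the pooled p-value is the middle one after sorting, so the covariate would be judged significant at the 0.05 level.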

  5. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies

    Science.gov (United States)

    2014-01-01

    Background Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. Results In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. Conclusion GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of

  6. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Background The use of missing genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We evaluated the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  7. Geographical orientation. An integral geoperspective

    Directory of Open Access Journals (Sweden)

    Cristóbal Cobo Arízaga

    2013-12-01

    This approach seeks to create a new line of discussion, to launch a proposal that is scientifically challenging to the hegemony of geographical thought and that provides new geographical rationality structures.

  8. An evaluation of a geographic information system software and its utility in promoting the use of integrated process skills in secondary students

    Science.gov (United States)

    Abbott, Thomas Diamond

    2001-07-01

    As technology continues to become an integral part of our educational system, research that clarifies how various technologies affect learning should be available to educators prior to the large scale introduction of any new technology into the classroom. This study will assess the degree to which a relatively new Geographic Information System Software (ArcView 3.1), when utilized by high school freshmen in earth science and geography courses, can be used to (a) promote and develop integrated process skills in these students, and (b) improve their awareness and appraisal of their problem solving abilities. Two research questions will be addressed in this research: (1) Will the use of a GIS to solve problems with authentic contexts enhance the learning and refinement of integrated process skills over more conventional means of classroom instruction? and (2) Will students' perceptions of competence to solve problems within authentic contexts be greater for those who learned to use and implement a GIS when compared to those who have learned by more conventional means of classroom instruction? Research Question 1 will be assessed by using the Test of Integrated Process Skills II (or TIPS II) and Research Question 2 will be addressed by using the Problem Solving Inventory (PSI). The research will last thirteen weeks. The TIPS II and the PSI will be administered after the intervention of GIS to the experimental group, at which point an Analysis of Covariance and the Mann-Whitney U-test will be utilized to measure the effects of intervention by the independent variable. Teacher/researcher journals and teacher/student questionnaires will be used to complement the statistical analysis. It is hoped that this study will help in the creation of future instructional models that enable educators to utilize modern technologies appropriately in their classrooms.

  9. Volunteered Geographic Information in Wikipedia

    Science.gov (United States)

    Hardy, Darren

    2010-01-01

    Volunteered geographic information (VGI) refers to the geographic subset of online user-generated content. Through Geobrowsers and online mapping services, which use geovisualization and Web technologies to share and produce VGI, a global digital commons of geographic information has emerged. A notable example is Wikipedia, an online collaborative…

  10. Synergetic Paradigm of Geographical Science

    Science.gov (United States)

    Gorbanyov, Vladimir A.

    2016-01-01

    It is shown that in the last decades, geography has expanded so much, that it has lost its object of study. It was not clear, what the geographical science does, and, as a consequence, households have an extremely low level of geographical cultures and geographical education. Each geography is extremely isolated, has its own object of study.…

  11. Geographical Income Polarization

    DEFF Research Database (Denmark)

    Azhar, Hussain; Jonassen, Anders Bruun

    In this paper we estimate the degree, composition and development of geographical income polarization based on data at the individual and municipal level in Denmark from 1984 to 2002. Rising income polarization is reconfirmed when applying new polarization measures, the driving force being greater...... inter-municipal income inequality. Counterfactual simulations show that rising property prices to a large part explain the rise in polarization. One side-effect of polarization is tendencies towards a parallel polarization of residence location patterns, where low skilled individuals tend to live...

  12. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

    Directory of Open Access Journals (Sweden)

    Thomas J Hoffmann

    2015-01-01

    An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4 × 10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8 × 10⁻⁴) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting

  13. The Geographical Information System

    Directory of Open Access Journals (Sweden)

    Jürgen Schweikart

    2008-12-01

    The Geographical Information System, normally called GIS, is a tool for representing spatial relationships and real processes with the help of a model. A GIS is a system of hardware, software and staff for collecting, managing, analysing and representing geospatial information. For example, we can study the evolution of an infectious disease in a certain territory, perform market analysis, or identify the best way to choose a new industrial site. In substance, it is data manipulation software that gives us both the graphic component, that is, a territorial representation of the reality that you want to represent, and the data component in the form of a database or, more commonly, spreadsheets. Geographical data are divided into spatial data and attribute data. Spatial data are recorded as points, lines and polygons (vectorial structure). Alternatively, survey systems have been designed to acquire information in elementary cells corresponding to a territorial grid (raster structure). This also includes remote sensing data.

  14. APPLICATION OF GEOGRAPHICAL INFORMATION SYSTEMS (GIS ...

    African Journals Online (AJOL)

    A study was undertaken to evaluate the present and potential application of the Geographical Information System (GIS) in Swaziland to manage land resources. This was done by interviewing key persons in the different institutions in the country and assessing the facilities (hardware, software and personnel). The results ...

  15. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern|| Una comparación de métodos de imputación de variables categóricas con patrón univariado

    Directory of Open Access Journals (Sweden)

    Torres Munguía, Juan Armando

    2014-06-01

    This paper examines the sample proportion estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at missingness rates of 5% and 15%, each for the MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were the hot-deck and polytomous regression approaches.
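
    Two of the simpler methods listed in this abstract, mode imputation and random imputation, can be illustrated as follows. This is a minimal sketch rather than the paper's implementation: missing entries are assumed to be encoded as `None`, the function names are hypothetical, and random imputation is taken here to mean drawing replacements from the observed values of the same variable.

```python
import random
from collections import Counter


def impute_mode(values):
    """Replace each missing entry with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_random(values, seed=0):
    """Replace each missing entry with a random draw from the observed values."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Hypothetical smoking-status variable with two missing answers.
answers = ["never", None, "current", "never", None, "former"]
print(impute_mode(answers))
print(impute_random(answers))
```

    Mode imputation inflates the share of the most common category, whereas random draws preserve the observed proportions on average, which matters when the target is a sample proportion.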

  16. Geographic Ontologies, Gazetteers and Multilingualism

    Directory of Open Access Journals (Sweden)

    Robert Laurini

    2015-01-01

    Different languages imply different visions of space, so that terminologies are different in geographic ontologies. In addition to their geometric shapes, geographic features have names, sometimes different in diverse languages. In addition, the role of gazetteers, as dictionaries of place names (toponyms), is to maintain relations between place names and location. The scope of geographic information retrieval is to search for geographic information not against a database, but against the whole Internet: but the Internet stores information in different languages, and it is of paramount importance not to remain stuck to a unique language. In this paper, our first step is to clarify the links between geographic objects as computer representations of geographic features, ontologies and gazetteers designed in various languages. Then, we propose some inference rules for matching not only types, but also relations in geographic ontologies with the assistance of gazetteers.

  17. GEOGRAPHIC INFORMATION SYSTEM – TOBEL

    Directory of Open Access Journals (Sweden)

    M. Koeva

    2013-05-01

    TOBEL is a Geographic Information System developed entirely by one of the leading Bulgarian geo-information companies – "Mapex" JSC. The system is based on modern information technology and is designed especially for Bulgarian authorities. GIS – TOBEL provides municipalities with extraordinary quantitative and qualitative benefits. The system offers a method of quick access, evaluation, and format conversion. It also allows producing interactive maps from different sources, leveraging database information, and automating work processes. The paper contains a description of the main functions of the system, the data used, and the whole process of development and system integration in Bulgarian municipalities. Examples of successfully working GIS systems integrated by our company are demonstrated.

  18. Evaluation of Subcutaneous Proleukin (interleukin-2) in a Randomized International Trial (ESPRIT): geographical and gender differences in the baseline characteristics of participants

    NARCIS (Netherlands)

    Pett, S. L.; Wand, H.; Law, M. G.; Arduino, R.; Lopez, J. C.; Knysz, B.; Pereira, L. C.; Pollack, S.; Reiss, P.; Tambussi, G.

    2006-01-01

    BACKGROUND: ESPRIT is a phase III, open-label, randomized, international clinical trial evaluating the effects of subcutaneous recombinant interleukin-2 (rIL-2) plus antiretroviral therapy (ART) versus ART alone on HIV-disease progression and death in HIV-1-infected individuals with CD4+ T-cells >

  19. Imputing Missing Race/Ethnicity in Pediatric Electronic Health Records: Reducing Bias with Use of U.S. Census Location and Surname Data.

    Science.gov (United States)

    Grundmeier, Robert W; Song, Lihai; Ramos, Mark J; Fiks, Alexander G; Elliott, Marc N; Fremont, Allen; Pace, Wilson; Wasserman, Richard C; Localio, Russell

    2015-08-01

    To assess the utility of imputing race/ethnicity using U.S. Census race/ethnicity, residential address, and surname information compared to standard missing data methods in a pediatric cohort. Electronic health record data from 30 pediatric practices with known race/ethnicity. In a simulation experiment, we constructed dichotomous and continuous outcomes with pre-specified associations with known race/ethnicity. Bias was introduced by nonrandomly setting race/ethnicity to missing. We compared typical methods for handling missing race/ethnicity (multiple imputation alone with clinical factors, complete case analysis, indicator variables) to multiple imputation incorporating surname and address information. Imputation using U.S. Census information reduced bias for both continuous and dichotomous outcomes. The new method reduces bias when race/ethnicity is partially, nonrandomly missing. © Health Research and Educational Trust.
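
    The abstract does not spell out how surname and address information enter the imputation model. One common way to combine two such sources is a Bayes update, sketched below purely as an illustration: the function name, the group labels and all probabilities are hypothetical, and this is not presented as the authors' model.

```python
def combine_surname_and_location(p_group_given_surname, p_location_given_group):
    """Bayes update: posterior is proportional to P(group | surname) * P(location | group)."""
    unnormalized = {
        group: p_group_given_surname[group] * p_location_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the two sources assign zero probability to every group")
    return {group: value / total for group, value in unnormalized.items()}


# Hypothetical inputs: a surname-based prior and the probability that a member
# of each group lives in the patient's census area.
surname_prior = {"group_a": 0.5, "group_b": 0.5}
location_likelihood = {"group_a": 0.3, "group_b": 0.1}
print(combine_surname_and_location(surname_prior, location_likelihood))
```

    The resulting probabilities could then serve as predictors in a multiple imputation model alongside clinical factors, which is the general setup the abstract describes.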

  20. Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer.

    Directory of Open Access Journals (Sweden)

    Christine Eulenburg

    Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. The combination of multiple imputation of missing values and different propensity scoring techniques is addressed in this work. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment-weighting (IPTW) and PS stratification are applied after multiple imputation by chained equation (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed.

  1. Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer.

    Science.gov (United States)

    Eulenburg, Christine; Suling, Anna; Neuser, Petra; Reuss, Alexander; Canzler, Ulrich; Fehm, Tanja; Luyten, Alexander; Hellriegel, Martin; Woelber, Linn; Mahner, Sven

    2016-01-01

    Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. The combination of multiple imputation of missing values and different propensity scoring techniques is addressed in this work. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment-weighting (IPTW) and PS stratification are applied after multiple imputation by chained equation (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed.
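
    As a rough illustration of the IPTW step, the sketch below computes unstabilized inverse-probability-of-treatment weights from propensity scores that are assumed to have already been estimated within one imputed dataset. The function names and the example numbers are hypothetical; the study's own analysis concerns survival outcomes and is not reproduced here.

```python
def iptw_weights(treated, propensity):
    """Unstabilized IPTW: 1/e for treated subjects, 1/(1 - e) for untreated."""
    weights = []
    for is_treated, score in zip(treated, propensity):
        if not 0.0 < score < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / score if is_treated else 1.0 / (1.0 - score))
    return weights


def weighted_mean(values, weights):
    """Weighted average, e.g. of an outcome within one treatment arm."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical treatment indicators and propensity scores for four patients.
treated = [1, 0, 1, 0]
scores = [0.5, 0.5, 0.25, 0.2]
print(iptw_weights(treated, scores))
```

    With multiple imputation, one option is to repeat this weighting in each imputed dataset and pool the resulting effect estimates afterwards.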

  2. Assessment of Consequences of Replacement of System of the Uniform Tax on Imputed Income Patent System of the Taxation

    Directory of Open Access Journals (Sweden)

    Galina A. Manokhina

    2012-11-01

    The article highlights the main questions concerning the possible consequences of replacing the currently operating system, a single tax on imputed income, with the patent system of taxation. The main advantages and drawbacks of the new taxation system are shown, including the opinion that introducing the patent taxation system as an auxiliary system is more effective than replacing one special taxation regime with another.

  3. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. A digestive allergic reaction with hypereosinophilia imputable to docetaxel in a breast cancer patient: a case report.

    Science.gov (United States)

    Hamdan, Diaddin; Leboeuf, Christophe; Pereira, Cathy; Jourdan, Nathalie; Verneuil, Laurence; Bousquet, Guilhem; Janin, Anne

    2015-12-21

    Hypereosinophilia, defined by an absolute eosinophil count of more than 1500/mm³, is rarely observed in patients treated for cancer, and rarely imputable to anti-cancer agents. Drug-induced hypereosinophilia usually appears within a few weeks of the start of treatment and resolves after discontinuation of the medication. We report here a first case of hypereosinophilia with digestive allergic reaction imputable to docetaxel in a woman treated for breast cancer. This patient, with a history of childhood atopic dermatitis and asthma, underwent surgery for breast lobular carcinoma, followed by chemotherapy including 3 cycles of the FEC100 protocol and 3 cycles of docetaxel. Ten days after the second cycle of docetaxel, she had abdominal pain with diarrhea, which increased after the third cycle of docetaxel at the same dose. The blood eosinophil count increased up to 4685/mm³ at day 92. All biological tests were normal, except elevated serum IgE. The systematic biopsies of the upper and lower digestive tract showed diffuse edema of the lamina propria, lymphocytic infiltrate and CD117-expressing cells both in the epithelium and in the lamina propria. Electron microscopy showed a large number of degranulating mast cells, while the number of tissue eosinophils was small. The blood eosinophil count decreased after day 96, three months after the last injection of docetaxel. After day 182, the hypereosinophilia and symptoms resolved. This spontaneous evolution, the history of atopic dermatitis and asthma, and the negativity of all biological tests performed led us to hypothesize a diagnosis of a systemic digestive Type 1 drug-induced hypersensitivity reaction. Using two validated pharmacovigilance scales, we found that docetaxel had the highest imputability score compared to the other drugs. Recognition of allergic reactions imputable to docetaxel is important because it requires the drug to be discontinued. In the difficult setting of anti-cancer treatment, if

  5. Using multiple imputation to efficiently correct cerebral MRI whole brain lesion and atrophy data in patients with multiple sclerosis.

    Science.gov (United States)

    Chua, Alicia S; Egorova, Svetlana; Anderson, Mark C; Polgar-Turcsanyi, Mariann; Chitnis, Tanuja; Weiner, Howard L; Guttmann, Charles R G; Bakshi, Rohit; Healy, Brian C

    2015-10-01

    Automated segmentation of brain MRI scans into tissue classes is commonly used for the assessment of multiple sclerosis (MS). However, manual correction of the resulting brain tissue label maps by an expert reader remains necessary in many cases. Since automated segmentation data awaiting manual correction are "missing", we proposed to use multiple imputation (MI) to fill in the missing manually corrected MRI data for measures of normalized whole brain volume (brain parenchymal fraction, BPF) and T2 hyperintense lesion volume (T2LV). Automated and manually corrected MRI measures from 1300 patients enrolled in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB) were identified. Simulation studies were conducted to assess the performance of MI with missing data both missing completely at random and missing at random. An imputation model including the concurrent automated data as well as clinical and demographic variables explained a high proportion of the variance in the manually corrected BPF (R² = 0.97) and T2LV (R² = 0.89), demonstrating the potential to accurately impute the missing data. Further, our results demonstrate that MI allows for the accurate estimation of group differences with little to no bias and with similar precision compared to an analysis with no missing data. We believe that our findings provide important insights for efficient correction of automated MRI measures to obviate the need to perform manual correction on all cases. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study.

    Science.gov (United States)

    Seffens, William; Evans, Chad; Taylor, Herman

    2015-01-01

    Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.

  7. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan

    2009-11-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg-Equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights to existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  8. Multiple imputation of rainfall missing data in the Iberian Mediterranean context

    Science.gov (United States)

    Miró, Juan Javier; Caselles, Vicente; Estrela, María José

    2017-11-01

    Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years, progressively more advanced than the traditional approaches to the problem. The present study validated 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., filling in at the same time the missing data of multiple incomplete series in a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (east Iberian Peninsula), which is characterized by high spatial irregularity and difficulty of rainfall estimation. A classification of precipitation according to its genetic origin was applied as pre-processing, and a quantile-mapping adjustment as a post-processing technique. The results showed in general a better performance for the non-linear and hybrid methods, highlighting that the non-linear PCA (NLPCA) method considerably outperforms the Self Organizing Maps (SOM) method among the non-linear approaches. Among the linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far from NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.

  9. Imputation-based analysis of association studies: candidate regions and quantitative traits.

    Directory of Open Access Journals (Sweden)

    Bertrand Servin

    2007-07-01

    We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

  10. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Jun-He Yang

    2017-01-01

    Full Text Available Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir’s water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir’s water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  11. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    Science.gov (United States)

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  12. The search for stable prognostic models in multiple imputed data sets

    Directory of Open Access Journals (Sweden)

    de Vet Henrica CW

    2010-09-01

    Background: In prognostic studies, model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and multiple imputation (MI). The authors examined the influence of these methods on model composition. Methods: Models were constructed using a cohort of 587 patients consulting between January 2001 and January 2003 with a shoulder problem in general practice in the Netherlands (the Dutch Shoulder Study). Outcome measures were persistent shoulder disability and persistent shoulder pain. Potential predictors included socio-demographic variables, characteristics of the pain problem, physical activity and psychosocial factors. Model composition and performance (calibration and discrimination) were assessed for models using a complete case analysis, MI, bootstrapping or both MI and bootstrapping. Results: Results showed that model composition varied between models as a result of how missing data was handled and that bootstrapping provided additional information on the stability of the selected prognostic model. Conclusion: In prognostic modeling, missing data needs to be handled by MI, and bootstrap model selection is advised in order to provide information on model stability.

  13. GEOGRAPHIC NAMES INFORMATION SYSTEM (GNIS) ...

    Science.gov (United States)

    The Geographic Names Information System (GNIS), developed by the U.S. Geological Survey in cooperation with the U.S. Board on Geographic Names (BGN), contains information about physical and cultural geographic features in the United States and associated areas, both current and historical, but not including roads and highways. The database also contains geographic names in Antarctica. The database holds the Federally recognized name of each feature and defines the location of the feature by state, county, USGS topographic map, and geographic coordinates. Other feature attributes include names or spellings other than the official name, feature designations, feature class, historical and descriptive information, and for some categories of features the geometric boundaries. The database assigns a unique feature identifier, a random number, that is a key for accessing, integrating, or reconciling GNIS data with other data sets. The GNIS is our Nation's official repository of domestic geographic feature names information.

  14. Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers.

    Science.gov (United States)

    Yu, Mandi; Reiter, Jerome Phillip; Zhu, Li; Liu, Benmei; Cronin, Kathleen A; Feuer, Eric J Rocky

    2017-07-01

    The National Cancer Institute's Surveillance, Epidemiology, and End Results Program releases research files of cancer registry data. These files include geographic information at the county level, but no finer. Access to finer geography, such as census tract identifiers, would enable richer analyses-for example, examination of health disparities across neighborhoods. To date, tract identifiers have been left off the research files because they could compromise the confidentiality of patients' identities. We present an approach to inclusion of tract identifiers based on multiply imputed, synthetic data. The idea is to build a predictive model of tract locations, given patient and tumor characteristics, and randomly simulate the tract of each patient by sampling from this model. For the predictive model, we use multivariate regression trees fitted to the latitude and longitude of the population centroid of each tract. We implement the approach in the registry data from California. The method results in synthetic data that reproduce a wide range (but not all) of analyses of census tract socioeconomic cancer disparities and have relatively low disclosure risks, which we assess by comparing individual patients' actual and synthetic tract locations. We conclude with a discussion of how synthetic data sets can be used by researchers with cancer registry data. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  15. Grapes from the geographical areas of the Black Sea: Agroclimatic growing conditions and evaluation of stable isotopes compositions in scientific study

    Directory of Open Access Journals (Sweden)

    Kolesnov Alexander

    2016-01-01

    The report considers the agroclimatic conditions in the Black Sea districts of cultivation and processing of grapes: the Black Sea Lowland, the Crimean Peninsula and the South-west coastal areas of the Greater Caucasus. The IRMS/SIRA techniques, Flash combustion (FC-IRMS/SIRA) and Isotopic equilibration (EQ-IRMS/SIRA), were first applied for the evaluation of carbon and oxygen isotope ratios in the components of grapes from the Crimean Peninsula. The 13C/12C ratios were studied by FC-IRMS/SIRA in carbohydrates and organic acids in authentic samples of 8 grape varieties from the 2015 harvest. EQ-IRMS/SIRA was applied to measure the 18O/16O ratios in intracellular water of grapes. The measured δ13CVPDB value ranges from −25.01 to −21.01‰ (for carbohydrates), and from −25.09 to −21.30‰ (for organic acids). To evaluate the extent of biological isotope fractionation, the 18O/16O ratios were measured in ground water and water of atmospheric precipitates from the Crimean Peninsula. Compared to ground (δ18OVSMOW from −10.85 to −8.14‰) and atmospheric (average δ18OVSMOW −2.85‰) waters, the intracellular water of Crimean grape varieties is found to be enriched with the 18O isotope. The δ18OVSMOW value of the grape intracellular water varies from 2.34 to 5.29‰ according to the agroclimatic conditions of the 2015 season.

  16. Poverty alleviation through geographic targeting: How much does disaggregation help?

    NARCIS (Netherlands)

    Elbers, C.T.M.; Fujii, T.; Lanjouw, P.F.; Özler, B; Yin, W

    2007-01-01

    In this paper, we employ recently completed "poverty maps" for three countries as tools for an ex ante evaluation of the distributional incidence of geographic targeting of public resources. We simulate the impact on poverty of transferring an exogenously given budget to geographically defined

  17. Application of geographic information systems (gis) for sustainable ...

    African Journals Online (AJOL)

    ... recent advances in information technology e.g. geographical information systems (GIS). This paper highlights the need for the integration of geographic information systems with processes of land evaluation, for improved quality of land decisions and sustainable land use and management. Journal of Applied Chemistry ...

  18. Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants.

    Science.gov (United States)

    Nagy, Reka; Boutin, Thibaud S; Marten, Jonathan; Huffman, Jennifer E; Kerr, Shona M; Campbell, Archie; Evenden, Louise; Gibson, Jude; Amador, Carmen; Howard, David M; Navarro, Pau; Morris, Andrew; Deary, Ian J; Hocking, Lynne J; Padmanabhan, Sandosh; Smith, Blair H; Joshi, Peter; Wilson, James F; Hastie, Nicholas D; Wright, Alan F; McIntosh, Andrew M; Porteous, David J; Haley, Chris S; Vitart, Veronique; Hayward, Caroline

    2017-03-07

    The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort due to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array. GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Research Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort. Genome-wide associations were run on 20,032 individuals using both genotyped and HRC imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near to Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08-1%). Proof of principle for use of EHRs was verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9. GS:SFHS provides genetic data on over 20,000 participants alongside a range of

  19. THEATRICALIZE GEOGRAPHIC TEACHING

    Directory of Open Access Journals (Sweden)

    Liana Macabu de Sousa Soares

    2013-06-01

    This article starts from the premise that theater, through the methodology of theater games, can contribute to a teaching of geography that is more meaningful and that pursues practice and critical reflection, with the aim of forming citizens capable of conceiving a geographical reading of reality. The proposal joins the teaching of geography to artistic expression, seeking to value diverse languages. To validate the research, theater game activities were developed with students of a Geography teaching degree and with secondary school students.

  20. Geographical Database Integrity Validation

    Science.gov (United States)

    Jacobs, Derya; Kauffman, Paul; Blackstock, Dexter

    2000-01-01

    Airport Safety Modeling Data (ASMD) was developed at the request of a 1997 White House Conference on Aviation Safety and Security. Politicians, military personnel, commercial aircraft manufacturers and the airline industry attended the conference. The objective of the conference was to study the airline industry and make recommendations to improve safety and security. One of the topics discussed at the conference was the loss of situational awareness by aircraft pilots. Loss of situational awareness occurs when a pilot loses his geographic position during flight and can result in crashes into terrain and obstacles. It was recognized at the conference that aviation safety could be improved by reducing the loss of situational awareness. The conference advised that a system be placed in the airplane cockpit that would provide pilots with a visual representation of the terrain around airports. The system would prevent airline crashes during times of inclement weather and loss of situational awareness. The system must be based on accurate data that represents terrain around airports. The Department of Defense and the National Imagery and Mapping Agency (NIMA) released ASMD to be used for the development of a visual system for aircraft pilots. ASMD was constructed from NIMA digital terrain elevation data (DTED).

  1. Is there a role for expectation maximization imputation in addressing missing data in research using WOMAC questionnaire? Comparison to the standard mean approach and a tutorial

    Directory of Open Access Journals (Sweden)

    Rutledge John

    2011-05-01

    Background: Standard mean imputation for missing values in the Western Ontario and McMaster (WOMAC) Osteoarthritis Index limits the use of collected data and may lead to bias. Probability model-based imputation methods overcome such limitations but had never before been applied to the WOMAC. In this study, we compare imputation results for the Expectation Maximization method (EM) and the mean imputation method for WOMAC in a cohort of total hip replacement patients. Methods: WOMAC data on a consecutive cohort of 2062 patients scheduled for surgery were analyzed. Rates of missing values in each of the WOMAC items from this large cohort were used to create missing patterns in the subset of patients with complete data. EM and the WOMAC's method of imputation were then applied to fill the missing values. Summary score statistics for both methods were then described through box-plots and contrasted with the complete case (CC) analysis and the true score (TS). This process was repeated using a smaller sample size of 200 randomly drawn patients with a higher missing rate (5 times the rates of missing values observed in the 2062 patients, capped at 45%). Results: Rate of missing values per item ranged from 2.9% to 14.5% and 1339 patients had complete data. Probability model-based EM imputed a score for all subjects while WOMAC's imputation method did not. Mean subscale scores were very similar for both imputation methods and were similar to the true score; however, the EM method results were more consistent with the TS after simulation. This difference became more pronounced as the number of items in a subscale increased and the sample size decreased. Conclusions: The EM method provides a better alternative to the WOMAC imputation method. The EM method is more accurate and imputes data to create a complete data set. These features are very valuable for patient-reported outcomes research in which resources are limited and the WOMAC score is used in a multivariate
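    To make the contrast concrete, here is a minimal sketch of a mean-imputation rule for one questionnaire subscale. It is an assumption-laden illustration, not the official WOMAC scoring procedure: the function name, the `max_missing` cutoff and the item values are all hypothetical. It does show why such a rule cannot score every subject, which is the limitation the abstract attributes to the standard approach.

```python
def mean_impute_subscale(items, max_missing):
    """Return the subscale sum with missing items (None) replaced by the mean of the
    answered items, or None when more than max_missing items are missing."""
    answered = [v for v in items if v is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing or not answered:
        return None  # subscale treated as invalid; no score is produced
    item_mean = sum(answered) / len(answered)
    return sum(answered) + n_missing * item_mean

# Hypothetical 5-item subscale responses (0-4 each); the cutoff of 1 is an assumption.
score_ok = mean_impute_subscale([2, None, 3, 1, 2], max_missing=1)
score_invalid = mean_impute_subscale([2, None, None, 1, 2], max_missing=1)
print(score_ok, score_invalid)
```

    A model-based method such as EM instead borrows information from the other items and subjects, so it can return a score even for the second respondent.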

  2. The use of multiple imputation method for the validation of 24-h food recalls by part-time observation of dietary intake in school.

    Science.gov (United States)

    Kupek, Emil; de Assis, Maria Alice A

    2016-09-01

    External validation of food recall over 24 h in schoolchildren is often restricted to eating events in schools and is based on direct observation as the reference method. The aim of this study was to estimate the dietary intake out of school, and consequently the bias in such research design based on only part-time validated food recall, using multiple imputation (MI) conditioned on the information on child age, sex, BMI, family income, parental education and the school attended. The previous-day, web-based questionnaire WebCAAFE, structured as six meals/snacks and thirty-two foods/beverage, was answered by a sample of 7-11-year-old Brazilian schoolchildren (n 602) from five public schools. Food/beverage intake recalled by children was compared with the records provided by trained observers during school meals. Sensitivity analysis was performed with artificial data emulating those recalled by children on WebCAAFE in order to evaluate the impact of both differential and non-differential bias. Estimated bias was within ±30 % interval for 84·4 % of the thirty-two foods/beverages evaluated in WebCAAFE, and half of the latter reached statistical significance (Ppart-time validation design of dietary intake over six daily eating events.

  3. Trade Effects of Geographical Indications

    NARCIS (Netherlands)

    Bremmers, H.J.

    2015-01-01

    This contribution addresses the effects of geographical indications on intra-EU and international trade. Trade is hampered by the exclusiveness of indications like “Protected Designations of Origin or Protected Geographical Indications”, which at the same time is necessary to protect their value. As

  4. Determinants of Dentists' Geographic Distribution.

    Science.gov (United States)

    Beazoglou, Tryfon J.; And Others

    1992-01-01

    A model for explaining the geographic distribution of dentists' practice locations is presented and applied to particular market areas in Connecticut. Results show geographic distribution is significantly related to a few key variables, including demography, disposable income, and housing prices. Implications for helping students make practice…

  5. Adaptive Cartography and Geographical Education

    Science.gov (United States)

    Konecny, Milan; Stanek, Karel

    2010-01-01

    The article focuses on adaptive cartography and its potential for geographical education. After briefly describing the wider context of adaptive cartography, it is suggested that this new cartographic approach establishes new demands and benefits for geographical education, especially in offering the possibility for broader individual…

  6. A Multiple-Imputation "Forward Bridging" Approach to Address Changes in the Classification of Asian Race/Ethnicity on the US Death Certificate.

    Science.gov (United States)

    Thompson, Caroline A; Boothroyd, Derek B; Hastings, Katherine G; Cullen, Mark R; Palaniappan, Latha P; Rehkopf, David H

    2018-02-01

    The incomparability of old and new classification systems for describing the same data can be seen as a missing-data problem, and, under certain assumptions, multiple imputation may be used to "bridge" 2 classification systems. One example of such a change is the introduction of detailed Asian-American race/ethnicity classifications on the 2003 version of the US national death certificate, which was adopted for use by 38 states between 2003 and 2011. Using county- and decedent-level data from 3 different national sources for pre- and postadoption years, we fitted within-state multiple-imputation models to impute ethnicities for decedents classified as "other Asian" during preadoption years. We present mortality rates derived using 3 different methods of calculation: 1) including all states but ignoring the gradual adoption of the new death certificate over time, 2) including only the 7 states with complete reporting of all ethnicities, and 3) including all states and applying multiple imputation. Estimates from our imputation model were consistently in the middle of the other 2 estimates, and trend results demonstrated that the year-by-year estimates of the imputation model were more similar to those of the 7-state model. This work demonstrates how multiple imputation can provide a "forward bridging" approach to make more accurate estimates over time in newly categorized populations. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Soil microbial communities: Influence of geographic location and hydrocarbon pollutants

    CSIR Research Space (South Africa)

    Maila, MP

    2006-02-01

    Full Text Available The importance and relevance of the geographical origin of the soil sample and the hydrocarbons in determining the functional or species diversity within different bacterial communities was evaluated using the community level physiological profiles...

  8. Combining Land Capability Evaluation, Geographic Information ...

    African Journals Online (AJOL)

    Where terraces are recommended, acceptance by farmers is ensured not only by developing the structures from indigenous technologies (i.e., 'Weber' and 'Kab' or 'Kirit'), but also by adopting various strategies to increase their economic advantages and profitability. (Eastern Africa Social Science Research Review: 2003 ...

  9. An imputation/copula-based stochastic individual tree growth model for mixed species Acadian forests: a case study using the Nova Scotia permanent sample plot network

    Directory of Open Access Journals (Sweden)

    John A. Kershaw Jr

    2017-09-01

    Background: A novel approach to modelling individual tree growth dynamics is proposed. The approach combines multiple imputation and copula sampling to produce a stochastic individual tree growth and yield projection system. Methods: The Nova Scotia, Canada permanent sample plot network is used as a case study to develop and test the modelling approach. Predictions from this model are compared to predictions from the Acadian variant of the Forest Vegetation Simulator, a widely used statistical individual tree growth and yield model. Results: Diameter and height growth rates were predicted with error rates consistent with those produced using statistical models. Mortality and ingrowth error rates were higher than those observed for diameter and height, but also were within the bounds produced by traditional approaches for predicting these rates. Ingrowth species composition was very poorly predicted. The model was capable of reproducing a wide range of stand dynamic trajectories and in some cases reproduced trajectories that the statistical model was incapable of reproducing. Conclusions: The model has potential to be used as a benchmarking tool for evaluating statistical and process models and may provide a mechanism to separate signal from noise and improve our ability to analyze and learn from large regional datasets that often have underlying flaws in sample design.

  10. AMMI analysis with imputed data in genotype x environment interaction experiments in cotton

    Directory of Open Access Journals (Sweden)

    Sergio Arciniegas-Alarcón

    2009-11-01

    The objective of this work was to evaluate the convenience of defining the number of multiplicative components of additive main effects and multiplicative interaction (AMMI) models in genotype x environment interaction experiments in cotton with imputed or unbalanced data. A simulation study was carried out based on a matrix of real seed-cotton productivity data obtained in genotype x environment interaction trials conducted with 15 cultivars at 27 locations in Brazil. The simulation was made with random withdrawals of 10, 20 and 30% of the data. The optimal number of multiplicative components for the AMMI model was determined using the Cornelius test and the likelihood ratio test on the matrices completed by imputation. To test the hypotheses when the analysis is made from means and the replicates are not available, a correction to the Cornelius test based on the missing observations was proposed. For data imputation, methods using robust submodels, alternating least squares and multiple imputation were considered. In the analysis of unbalanced experiments, it is recommended to choose the number of multiplicative components of the AMMI model from the observed information only, and to carry out the classical estimation of the parameters based on the matrices completed by imputation.
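    The random-withdrawal step of such a simulation can be sketched as follows. This is only an illustration under assumptions: the abstract does not say whether withdrawals were constrained (for example, to keep at least one value per row), and the matrix below holds placeholder numbers rather than yield data. The function name and the seed are hypothetical.

```python
import random

def withdraw_cells(matrix, percent, seed=0):
    """Return a copy of matrix with the given percentage of cells set to None at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    # Integer arithmetic avoids floating-point surprises when rounding the cell count.
    n_remove = len(cells) * percent // 100
    masked = [row[:] for row in matrix]
    for i, j in rng.sample(cells, n_remove):
        masked[i][j] = None
    return masked

# Placeholder 15 x 27 matrix (rows x columns); the values are not real yields.
yields = [[float(g * 27 + e) for e in range(27)] for g in range(15)]
masked_sets = {pct: withdraw_cells(yields, pct) for pct in (10, 20, 30)}
missing_counts = {pct: sum(v is None for row in m for v in row)
                  for pct, m in masked_sets.items()}
print(missing_counts)
```

    Each masked matrix would then be passed to the imputation method under study, and the completed matrix compared with the original.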

  11. High-accuracy imputation for HLA class I and II genes based on high-resolution SNP data of population-specific references.

    Science.gov (United States)

    Khor, S-S; Yang, W; Kawashima, M; Kamitsuji, S; Zheng, X; Nishida, N; Sawai, H; Toyoda, H; Miyagawa, T; Honda, M; Kamatani, N; Tokunaga, K

    2015-12-01

    Statistical imputation of classical human leukocyte antigen (HLA) alleles is becoming an indispensable tool for fine-mapping of disease association signals from case-control genome-wide association studies. However, most currently available HLA imputation tools are based on European reference populations and are not suitable for direct application to non-European populations. Among the HLA imputation tools, the HIBAG R package is a flexible HLA imputation tool that is equipped with a wide range of population-based classifiers; moreover, HIBAG R enables individual researchers to build custom classifiers. Here, two data sets, each comprising data from healthy Japanese individuals of different sample sizes, were used to build custom classifiers. HLA imputation accuracy in five HLA classes (HLA-A, HLA-B, HLA-DRB1, HLA-DQB1 and HLA-DPB1) increased from the 82.5-98.8% obtained with the original HIBAG references to 95.2-99.5% with our custom classifiers. A call threshold (CT) of 0.4 is recommended for our Japanese classifiers; in contrast, HIBAG references recommend a CT of 0.5. Finally, our classifiers could be used to identify the risk haplotypes for Japanese narcolepsy with cataplexy, HLA-DRB1*15:01 and HLA-DQB1*06:02, with 100% and 99.7% accuracy, respectively; therefore, these classifiers can be used to supplement the current lack of HLA genotyping data in widely available genome-wide association study data sets.
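    The effect of a call threshold can be shown with a small, generic sketch. It does not use the HIBAG package or its API; the function name, the allele labels and the posterior probabilities are invented for illustration. It assumes that a call is kept when its posterior probability is at least the threshold, and reports the call rate together with the accuracy among kept calls.

```python
def evaluate_calls(predictions, truth, call_threshold):
    """Return (call_rate, accuracy) for calls whose posterior probability meets the threshold."""
    called = [(allele, t) for (allele, prob), t in zip(predictions, truth)
              if prob >= call_threshold]
    call_rate = len(called) / len(predictions)
    # Accuracy is undefined when nothing passes the threshold.
    accuracy = sum(a == t for a, t in called) / len(called) if called else None
    return call_rate, accuracy

# Invented imputed alleles with posterior probabilities; not data from the study.
predictions = [("15:01", 0.95), ("04:05", 0.45), ("09:01", 0.30), ("15:01", 0.80)]
truth = ["15:01", "04:05", "08:03", "15:01"]
rate_04, acc_04 = evaluate_calls(predictions, truth, 0.4)
rate_05, acc_05 = evaluate_calls(predictions, truth, 0.5)
print(rate_04, acc_04, rate_05, acc_05)
```

    In this toy case lowering the threshold from 0.5 to 0.4 retains one more correct call without admitting the wrong one, which is the kind of trade-off a recommended CT is meant to balance.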

  12. Residential proximity to electromagnetic field sources and birth weight: Minimizing residual confounding using multiple imputation and propensity score matching.

    Science.gov (United States)

    de Vocht, Frank; Lee, Brian

    2014-08-01

    Studies have suggested that residential exposure to extremely low frequency (50 Hz) electromagnetic fields (ELF-EMF) from high voltage cables, overhead power lines, electricity substations or towers is associated with reduced birth weight and may be associated with adverse birth outcomes or even miscarriages. We previously conducted a study of 140,356 singleton live births between 2004 and 2008 in Northwest England, which suggested that close residential proximity (≤ 50 m) to ELF-EMF sources was associated with reduced average birth weight of 212 g (95%CI: -395 to -29 g) but not with statistically significant increased risks for other adverse perinatal outcomes. However, the cohort was limited by missing data for most potentially confounding variables including maternal smoking during pregnancy, which was only available for a small subgroup, and residual confounding could not be excluded. This study, using the same cohort, was conducted to minimize the effects of these problems using multiple imputation to address missing data and propensity score matching to minimize residual confounding. Missing data were imputed using multiple imputation by chained equations to generate five datasets. For each dataset 115 exposed women (residing ≤ 50 m from a residential ELF-EMF source) were propensity score matched to 1150 unexposed women. After doubly robust confounder adjustment, close proximity to a residential ELF-EMF source remained associated with a reduction in birth weight of -116 g (95% confidence interval: -224:-7 g). No effect was found for proximity ≤ 100 m compared to women living further away. These results indicate that although the effect size was about half of the effect previously reported, close maternal residential proximity to sources of ELF-EMF remained associated with suboptimal fetal growth. Copyright © 2014 Elsevier Ltd. All rights reserved.
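    The abstract does not state how the estimates from the five imputed datasets were combined. Rubin's rules are the usual choice, so the sketch below assumes them: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. All numbers are invented and are not results from the study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Invented birth-weight differences (g) and their variances from five imputed datasets.
estimates = [-110.0, -120.0, -115.0, -118.0, -117.0]
variances = [3000.0, 3100.0, 2900.0, 3050.0, 2950.0]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var)
```

    The between-imputation term is what carries the extra uncertainty caused by the missing data; analysing a single imputed dataset would omit it and understate the standard error.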

  13. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Science.gov (United States)

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de-regressed EBV was slightly smaller (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and they were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  14. Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.

    Science.gov (United States)

    Shah, Jasmit S; Rai, Shesh N; DeFilippis, Andrew P; Hill, Bradford G; Bhatnagar, Aruni; Brock, Guy N

    2017-02-20

    High throughput metabolomics makes it possible to measure the relative abundances of numerous metabolites in biological samples, which is useful to many areas of biomedical research. However, missing values (MVs) in metabolomics datasets are common and can arise due to both technical and biological reasons. Typically, such MVs are substituted by a minimum value, which may lead to different results in downstream analyses. Here we present a modified version of the K-nearest neighbor (KNN) approach which accounts for truncation at the minimum value, i.e., KNN truncation (KNN-TN). We compare imputation results based on KNN-TN with results from other KNN approaches such as KNN based on correlation (KNN-CR) and KNN based on Euclidean distance (KNN-EU). Our approach assumes that the data follow a truncated normal distribution with the truncation point at the limit of detection (LOD). The effectiveness of each approach was analyzed by the root mean square error (RMSE) measure as well as the metabolite list concordance index (MLCI) for influence on downstream statistical testing. Through extensive simulation studies and application to three real data sets, we show that KNN-TN has lower RMSE values compared to the other two KNN procedures as well as simpler imputation methods based on substituting missing values with the metabolite mean, zero values, or the LOD. MLCI values between KNN-TN and KNN-EU were roughly equivalent, and superior to the other four methods in most cases. Our findings demonstrate that KNN-TN generally has improved performance in imputing the missing values of the different datasets compared to KNN-CR and KNN-EU when missingness arises from values missing at random combined with an LOD. The results shown in this study are in the field of metabolomics, but this method could be applicable to any high-throughput technology that has missing values due to an LOD.
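
    As a rough illustration of the Euclidean-distance baseline (KNN-EU) and the RMSE criterion named in the abstract above, the sketch below fills each missing entry with the mean of the k nearest rows and scores the result against known values. It is not the authors' KNN-TN method: the truncated-normal adjustment is omitted, and the function names `knn_impute` and `rmse`, the row-wise neighbor definition, and the default `k` are assumptions made for this example.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each row with the mean of the k nearest rows.

    Distance is the root mean squared difference over the columns that
    both rows have observed (an assumption made for this sketch).
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_rows = X.shape[0]
    for i in range(n_rows):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Rank every other row by distance on the shared observed columns.
        ranked = []
        for j in range(n_rows):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if not shared.any():
                continue
            dist = float(np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2)))
            ranked.append((dist, j))
        ranked.sort()
        # For each missing column, average the k closest rows that observed it.
        for col in np.where(missing)[0]:
            donors = [X[j, col] for _, j in ranked if not np.isnan(X[j, col])][:k]
            if donors:
                filled[i, col] = np.mean(donors)
    return filled

def rmse(truth, imputed, mask):
    """Root mean square error restricted to the masked (imputed) entries."""
    truth = np.asarray(truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))
```

    In a simulation of the kind the abstract describes, known values would be masked, imputed, and then compared with `rmse(truth, imputed, mask)`; entries with no usable donor are left as NaN in this sketch.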

  15. Geographic variation in gorilla limb bones.

    Science.gov (United States)

    Jabbour, Rebecca S; Pearman, Tessa L

    2016-06-01

    Gorilla systematics has received increased attention over recent decades from primatologists, conservationists, and paleontologists. Studies of geographic variation in DNA, skulls, and teeth have led to new taxonomic proposals, such as recognition of two gorilla species, Gorilla gorilla (western gorilla) and Gorilla beringei (eastern gorilla). Postcranial differences between mountain gorillas (G. beringei beringei) and western lowland gorillas (G. g. gorilla) have a long history of study, but differences between the limb bones of the eastern and western species have not yet been examined with an emphasis on geographic variation within each species. In addition, proposals for recognition of the Cross River gorilla as Gorilla gorilla diehli and gorillas from Tshiaberimu and Kahuzi as G. b. rex-pygmaeorum have not been evaluated in the context of geographic variation in the forelimb and hindlimb skeletons. Forty-three linear measurements were collected from limb bones of 266 adult gorillas representing populations of G. b. beringei, Gorilla beringei graueri, G. g. gorilla, and G. g. diehli in order to investigate geographic diversity. Skeletal elements included the humerus, radius, third metacarpal, third proximal hand phalanx, femur, tibia, calcaneus, first metatarsal, third metatarsal, and third proximal foot phalanx. Comparisons of means and principal components analyses clearly differentiate eastern and western gorillas, indicating that eastern gorillas have absolutely and relatively smaller hands and feet, among other differences. Gorilla subspecies and populations cluster consistently by species, although G. g. diehli may be similar to the eastern gorillas in having small hands and feet. The subspecies of G. beringei are distinguished less strongly and by different variables than the two gorilla species. Populations of G. b. graueri are variable, and Kahuzi and Tshiaberimu specimens do not cluster together. Results support the possible influence of

  16. NEPR Geographic Zone Map 2015

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This geographic zone map was created by interpreting satellite and aerial imagery, seafloor topography (bathymetry model), and the new NEPR Benthic Habitat Map...

  17. Geographic Profiling: Knowledge Through Prediction

    Science.gov (United States)

    2014-06-01

    acts. In general, geographic profiling “is based on crime pattern, routine activity, and rational choice theories from environmental criminology , a...Information Systems and Crime Analysis , ed. Fahui Wang (Hershey, PA: Idea Group, 2005), 104. 3 models to the actual outcomes and determine the...order to construct a geographic profile, the coordinates of crime scenes are entered into a software analysis program that contains an algorithm known

  18. Typicity in Potato: Characterization of Geographic Origin

    Directory of Open Access Journals (Sweden)

    Marco Manzelli

    2010-03-01

    A two-year study was carried out in three regions of Italy, and the crop performance and the chemical composition of tubers of three typical potato varieties were evaluated. Carbon and nitrogen tuber content was determined by means of an elemental analyzer and the other mineral elements by means of a spectrometer. The same determinations were performed on soil samples taken from experimental areas. The Principal Component Analysis, applied to the results of mineral element tuber analysis, permitted the classification of all potato tuber samples according to their geographic origin. Only a partial discrimination was obtained by potato variety. Some correlations between mineral content in the tubers and in the soil were also detected. Analytical and statistical methods proved to be useful in verifying the authenticity of guaranteed geographical food denominations.

  19. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    Directory of Open Access Journals (Sweden)

    Paz Sobrino-Vegas

    2012-01-01

    Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were: male sex; Sub-Saharan African, Latin-American origin compared to Spaniards; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses.

  20. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island.

    Directory of Open Access Journals (Sweden)

    Shreyas V Kumbhare

    Marine microbes play a key role in, and contribute largely to, the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, an imputed metagenomic approach was carried out to understand the functional aspects of the microbial community, especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using the culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of the continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1-40 meters), i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1-20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25-40 meters). By using the imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also helps to understand the influence of nutrient availability on bacterial diversity.

  1. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.

  2. New Geographical Regionalization of Russia

    Directory of Open Access Journals (Sweden)

    Vladimir A. Gorbanyov

    2014-01-01

    After the October Revolution there was an urgent need for a new economic zoning of Russia. Under the leadership of leading scientists, the Soviet Union was divided into economic regions. After the collapse of the USSR and the emergence of a market economy, these regions lost their meaning, and no new ones were created. There is therefore a need for a new zoning of Russia, based not on economic regions but on complex geographical regions. This is a difficult task, but because the author holds that geography should be a single discipline, the new geographical regions should reflect their historical, natural, economic, social, and cultural specifics. This approach will promote a rational geographical division of labor in the country under a market economy. The paper attempts a new geographical zoning with 10 geographic regions: Capital, Central, European North, European South, North Caucasus, Volga-Urals, Western Siberia, Southern Siberia, Northern Siberia and the Far East. For each region, area, population size and density, level of urbanization, natural, mechanical and overall population growth, GDP per capita, the structure of employment, and Human Development Index were calculated, and the corresponding analysis was made.

  3. Development of new historical global Nitrogen fertilizer map and the evaluation of their impacts on terrestrial N cycling

    Science.gov (United States)

    Nishina, K.; Ito, A.; Hayashi, S.

    2015-12-01

    The use of synthetic nitrogen fertilizer grew rapidly after the invention of the Haber-Bosch process in the early 20th century. The recent N loading derived from these sources on terrestrial ecosystems was estimated to be 2 times higher than biogenic N fixation in terrestrial ecosystems (Gruber et al., 2009). However, there are still large uncertainties in cumulative N impacts on terrestrial ecosystems at global scale. In this study, to assess historical N impacts at global scale, we made a new global N fertilizer input map, which was a spatially and temporally explicit map (1960-2010) and considered the fractions of NH4+ and NO3- in the N fertilizer inputs. With the developed N fertilizer map, we evaluated historical changes in N2O cycling due to land-use changes and N deposition using the ecosystem model 'VISIT'. Prior to the downscaling processes for the global N fertilizer map, we applied statistical data imputation to FAOSTAT data because many data are missing, especially for developing countries. For the data imputation, we used the multiple imputation method proposed by Honaker & King (2010). The statistics of various types of synthetic fertilizer consumption are available in FAOSTAT, which can be sorted by the content of NH4+ and NO3-, respectively. To downscale the country-by-country N fertilizer consumption data to the 0.5° x 0.5° grid-based map, we used the historical land-use map in Earthstat (Ramankutty et al., 1999). Before assigning N fertilizer to each grid cell, we weighted the double-cropping regions so that they received more N fertilizer input. Using M3-Crops Data (Monfreda et al., 2008), we picked the dominant crop species in each grid cell. After that, we used the Crop Calendar in the SAGE dataset (Sacks et al., 2010) and determined the schedule of N fertilizer input in each grid cell using the dominant crop's calendar. Base fertilizer was set to be 7 days before transplanting and second fertilizer to be 30 days after base fertilizer application
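
    The downscaling step described above, from a national total to grid cells with extra weight for double-cropping regions, can be sketched as a proportional allocation. This is only an illustration under stated assumptions: the function name `downscale_fertilizer`, the use of cropland area as the base weight, and the `double_weight` value of 1.5 are hypothetical and are not taken from the abstract.

```python
import numpy as np

def downscale_fertilizer(national_total, cropland_area, double_cropped,
                         double_weight=1.5):
    """Split a national N fertilizer total across grid cells.

    Each cell's share is proportional to its cropland area, multiplied by
    `double_weight` where the cell is double-cropped. The weight value is
    a placeholder assumption, not a figure from the study.
    """
    area = np.asarray(cropland_area, dtype=float)
    is_double = np.asarray(double_cropped, dtype=bool)
    weights = area * np.where(is_double, double_weight, 1.0)
    total_weight = weights.sum()
    if total_weight <= 0:
        raise ValueError("no cropland available to receive fertilizer")
    # The shares sum to the national total by construction.
    return national_total * weights / total_weight
```

    Because the shares are normalized by the total weight, the gridded values add up to the country-level figure, which is the property a downscaling of this kind needs to preserve.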

  4. Ontology-driven geographic information systems

    Science.gov (United States)

    Fonseca, Frederico Torres

    Information integration is the combination of different types of information in a framework so that it can be queried, retrieved, and manipulated. Integration of geographic data has gained in importance because of the new possibilities arising from the interconnected world and the increasing availability of geographic information. Many times the need for information is so pressing that it does not matter if some details are lost, as long as integration is achieved. To integrate information across computerized information systems it is necessary first to have explicit formalizations of the mental concepts that people have about the real world. Furthermore, these concepts need to be grouped by communities in order to capture the basic agreements that exist within different communities. The explicit formalization of the mental models within a community is an ontology. This thesis introduces a framework for the integration of geographic information. We use ontologies as the foundation of this framework. By integrating ontologies that are linked to sources of geographic information we allow for the integration of geographic information based primarily on its meaning. Since the integration may occur across different levels, we also create the basic mechanisms for enabling integration across different levels of detail. The use of an ontology, translated into an active, information-system component, leads to Ontology-Driven Geographic Information Systems. The results of this thesis show that a model that incorporates hierarchies and roles has the potential to integrate more information than models that do not incorporate these concepts. We developed a methodology to evaluate the influence of the use of roles and of hierarchical structures for representing ontologies on the potential for information integration. The use of a hierarchical structure increases the potential for information integration. The use of roles also improves the potential for information integration

  5. Image Processing and Geographic Information

    Science.gov (United States)

    McLeod, Ronald G.; Daily, Julie; Kiss, Kenneth

    1985-12-01

    A Geographic Information System, which is a product of System Development Corporation's Image Processing System and a commercially available Data Base Management System, is described. The architecture of the system allows raster (image) data type, graphics data type, and tabular data type input and provides for the convenient analysis and display of spatial information. A variety of functions are supported through the Geographic Information System including ingestion of foreign data formats, image polygon encoding, image overlay, image tabulation, costatistical modelling of image and tabular information, and tabular to image conversion. The report generator in the DBMS is utilized to prepare quantitative tabular output extracted from spatially referenced images. An application of the Geographic Information System to a variety of data sources and types is highlighted. The application utilizes sensor image data, graphically encoded map information available from government sources, and statistical tables.

  6. A Geographical Heuristic Routing Protocol for VANETs

    Directory of Open Access Journals (Sweden)

    Luis Urquiza-Aguiar

    2016-09-01

    Vehicular ad hoc networks (VANETs) leverage the communication system of Intelligent Transportation Systems (ITS). Recently, Delay-Tolerant Network (DTN) routing protocols have increased their popularity among the research community for being used in non-safety VANET applications and services like traffic reporting. Vehicular DTN protocols use geographical and local information to make forwarding decisions. However, current proposals only consider the selection of the best candidate based on a local search. In this paper, we propose a generic Geographical Heuristic Routing (GHR) protocol that can be applied to any DTN geographical routing protocol that makes forwarding decisions hop by hop. GHR includes in its operation adaptations of simulated annealing and Tabu-search meta-heuristics, which have largely been used to improve local-search results in discrete optimization. We include a complete performance evaluation of GHR in a multi-hop VANET simulation scenario for a reporting service. Our study analyzes all of the meaningful configurations of GHR and offers a statistical analysis of our findings by means of MANOVA tests. Our results indicate that the use of a Tabu list contributes to improving the packet delivery ratio by around 5% to 10%. Moreover, if Tabu is used, then the simulated annealing routing strategy gets a better performance than the selection of the best node used with carry and forwarding (default operation).

  7. Geographic Analysis of the Radiation Oncology Workforce

    Energy Technology Data Exchange (ETDEWEB)

    Aneja, Sanjay [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Smith, Benjamin D. [University of Texas M. D. Anderson Cancer Center, Houston, TX (United States); Gross, Cary P. [Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States); Department of General Internal Medicine, Yale University School of Medicine, New Haven, CT (United States); Wilson, Lynn D. [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Haffty, Bruce G. [Cancer Institute of New Jersey, New Brunswick, NJ (United States); Roberts, Kenneth [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Yu, James B., E-mail: james.b.yu@yale.edu [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Cancer Outcomes, Policy, and Effectiveness Research Center at Yale, New Haven, CT (United States)

    2012-04-01

    Purpose: To evaluate trends in the geographic distribution of the radiation oncology (RO) workforce. Methods and Materials: We used the 1995 and 2007 versions of the Area Resource File to map the ratio of RO to the population aged 65 years or older (ROR) within different health service areas (HSA) within the United States. We used regression analysis to find associations between population variables and 2007 ROR. We calculated Gini coefficients for ROR to assess the evenness of RO distribution and compared that with primary care physicians and total physicians. Results: There was a 24% increase in the RO workforce from 1995 to 2007. The overall growth in the RO workforce was less than that of primary care or the overall physician workforce. The mean ROR among HSAs increased by more than one radiation oncologist per 100,000 people aged 65 years or older, from 5.08 per 100,000 to 6.16 per 100,000. However, there remained consistent geographic variability concerning RO distribution, specifically affecting the non-metropolitan HSAs. Regression analysis found higher ROR in HSAs that possessed higher education (p = 0.001), higher income (p < 0.001), lower unemployment rates (p < 0.001), and higher minority population (p = 0.022). Gini coefficients showed RO distribution less even than for both primary care physicians and total physicians (0.326 compared with 0.196 and 0.292, respectively). Conclusions: Despite a modest growth in the RO workforce, there exists persistent geographic maldistribution of radiation oncologists allocated along socioeconomic and racial lines. To solve problems surrounding the RO workforce, issues concerning both gross numbers and geographic distribution must be addressed.
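
    The Gini coefficient used above to compare the evenness of workforce distribution can be computed from a list of per-area ratios with the standard sorted-rank formula. The sketch below is unweighted; whether the study weighted health service areas by population is not stated in the abstract, so treating every area equally is an assumption, as is the function name `gini`.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    0 means every area has the same ratio; values approaching 1 mean the
    total is concentrated in very few areas.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n
```

    Read this way, the reported 0.326 for radiation oncologists against 0.196 for primary care physicians indicates a more uneven spread across areas for the former.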

  8. Diagnosis and outcomes of acute kidney injury using surrogate and imputation methods for missing preadmission creatinine values.

    Science.gov (United States)

    Bernier-Jean, Amélie; Beaubien-Souligny, William; Goupil, Rémi; Madore, François; Paquette, François; Troyanov, Stéphan; Bouchard, Josée

    2017-04-28

    Missing preadmission serum creatinine (SCr) values are a common obstacle to assess acute kidney injury (AKI) diagnosis and outcomes. The Kidney Disease Improving Global Outcomes (KDIGO) guidelines suggest using a SCr computed from the Modification of Diet in Renal Disease (MDRD) with an estimated glomerular filtration rate of 75 ml/min/1.73 m². We aimed to identify the best surrogate method for baseline SCr to assess AKI diagnosis and outcomes. We compared the use of 1) first SCr at hospital admission, 2) minimal SCr over 2 weeks after intensive care unit admission, 3) MDRD computed SCr and 4) Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) computed SCr to assess AKI diagnosis and outcomes. We then performed multilinear regression models to predict preadmission SCr and imputation strategies to assess AKI diagnosis. Our one-year retrospective cohort study included 1001 critically ill adults; 498 of them had preadmission SCr values. In these patients, AKI incidence was 25.1% using preadmission SCr. First SCr had the best agreement for AKI diagnosis (22.5%; kappa = 0.90) and staging (kappa = 0.81). MDRD, CKD-EPI and minimal SCr overestimated AKI diagnosis (26.7%, 27.1% and 43.2%; kappa = 0.86, 0.86 and 0.60, respectively). However, MDRD and CKD-EPI computed SCr had a better sensitivity than first SCr for AKI (93% and 94% vs. 87%). Eighty-eight percent of patients experienced renal recovery at least 3 months after hospital discharge. All methods except the first SCr significantly underestimated the percentage of renal recovery. In a multivariate model, age, male gender, hypertension, heart failure, undergoing surgery and log first SCr best predicted preadmission SCr (adjusted R² = 0.56). Imputation methods with first SCr increased AKI incidence to 23.9% (kappa = 0.92) but not with MDRD computed SCr (26.7%; kappa = 0.89). In our cohort, first SCr performed better for AKI diagnosis and staging, as well as for renal recovery after
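
    The "MDRD computed SCr" surrogate mentioned above comes from inverting the four-variable MDRD equation at an assumed eGFR of 75 ml/min/1.73 m². The sketch below shows that inversion; it uses the 175 coefficient of the IDMS-traceable form of the equation, which is an assumption here because the abstract does not say which form the study used (the original form uses 186). The function names are hypothetical, and the output is in mg/dL.

```python
def mdrd_egfr(scr_mg_dl, age_years, female, black, coefficient=175.0):
    """Four-variable MDRD eGFR in ml/min/1.73 m^2."""
    egfr = coefficient * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def mdrd_baseline_scr(age_years, female, black, egfr=75.0, coefficient=175.0):
    """Serum creatinine (mg/dL) implied by an assumed eGFR.

    Solves the MDRD equation for SCr; the default eGFR of 75 follows the
    value quoted in the abstract.
    """
    factor = coefficient * age_years ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    # eGFR = factor * SCr^-1.154, so SCr = (eGFR / factor)^(-1 / 1.154).
    return (egfr / factor) ** (-1.0 / 1.154)
```

    Feeding the back-calculated value into `mdrd_egfr` with the same age, sex and race inputs returns the assumed eGFR, which is a quick consistency check on the inversion.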

  9. Addressing geographic disparities in liver transplantation through redistricting.

    Science.gov (United States)

    Gentry, S E; Massie, A B; Cheek, S W; Lentine, K L; Chow, E H; Wickliffe, C E; Dzebashvili, N; Salvalaggio, P R; Schnitzler, M A; Axelrod, D A; Segev, D L

    2013-08-01

    Severe geographic disparities exist in liver transplantation; for patients with comparable disease severity, 90-day transplant rates range from 18% to 86% and death rates range from 14% to 82% across donation service areas (DSAs). Broader sharing has been proposed to resolve geographic inequity; however, we hypothesized that the efficacy of broader sharing depends on the geographic partitions used. To determine the potential impact of redistricting on geographic disparity in disease severity at transplantation, we combined existing DSAs into novel regions using mathematical redistricting optimization. Optimized maps and current maps were evaluated using the Liver Simulated Allocation Model. Primary analysis was based on 6700 deceased donors, 28 063 liver transplant candidates, and 242 727 Model of End-Stage Liver Disease (MELD) changes in 2010. Fully regional sharing within the current regional map would paradoxically worsen geographic disparity (variance in MELD at transplantation increases from 11.2 to 13.5, p = 0.021), although it would decrease waitlist deaths (from 1368 to 1329, p = 0.002). In contrast, regional sharing within an optimized map would significantly reduce geographic disparity (to 7.0, p = 0.002) while achieving a larger decrease in waitlist deaths (to 1307, p = 0.002). Redistricting optimization, but not broader sharing alone, would reduce geographic disparity in allocation of livers for transplant across the United States. © Copyright 2013 The American Society of Transplantation and the American Society of Transplant Surgeons.

  10. Geographical Information in Virtual Environments

    NARCIS (Netherlands)

    Loo, J. van; Lawick van Pabst, J. van

    1998-01-01

    We studied the combination of a Geographical Information System (GIS) and a Virtual Environment (VE). The goal was to establish a bi-directional link between a GIS and a virtual environment. The first step was to combine three types of data to build the 3D world and store it into the GIS: a digital

  11. The Andes: A Geographical Portrait

    Directory of Open Access Journals (Sweden)

    Anthony Bebbington

    2016-05-01

    Reviewed: The Andes: A Geographical Portrait. By Axel Borsdorf and Christoph Stadel. Translated by Brigitte Scott and Christoph Stadel. Cham, Switzerland: Springer International Publishing, 2015. xiv + 368 pp. US$ 139.00. Also available as an e-book. ISBN 978-3-319-03529-1.

  12. Geographic Projection of Cluster Composites

    NARCIS (Netherlands)

    Nerbonne, J.; Bosveld-de Smet, L.M.; Kleiweg, P.; Blackwell, A.; Marriott, K.; Shimojima, A.

    2004-01-01

    A composite cluster map displays a fuzzy categorisation of geographic areas. It combines information from several sources to provide a visualisation of the significance of cluster borders. The basic technique renders the chance that two neighbouring locations are members of different clusters as the

  13. Geographic Literacy through Children's Literature.

    Science.gov (United States)

    Rogers, Linda K.

    This activity-centered approach to teaching geography to elementary students is based on children's literature. Methodologies for teaching the themes of geography involving children's books and stories are given. The book is organized into five chapters: (1) "Geographic Literacy in Curriculum" gives instructions for using the book and provides…

  14. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    DEFF Research Database (Denmark)

    Van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (∼35,000 samples) with the population-specific reference panel crea...

  15. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    NARCIS (Netherlands)

    Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); Y.J. Kim (Young Jin); B.-J. Kim (Bong-Jo); J. Lee (Juyoung); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. Palmer (Nicholette); T. Park (Taesung); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)

    2015-01-01

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the

  16. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    Science.gov (United States)

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. 
Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  17. The significant surface-water connectivity of "geographically isolated wetlands"

    Science.gov (United States)

    Calhoun, Aram J.K.; Mushet, David M.; Alexander, Laurie C.; DeKeyser, Edward S.; Fowler, Laurie; Lane, Charles R.; Lang, Megan W.; Rains, Mark C.; Richter, Stephen; Walls, Susan

    2017-01-01

    We evaluated the current literature, coupled with our collective research expertise, on surface-water connectivity of wetlands considered to be “geographically isolated” (sensu Tiner Wetlands 23:494–516, 2003a) to critically assess the scientific foundation of grouping wetlands based on the singular condition of being surrounded by uplands. The most recent research on wetlands considered to be “geographically isolated” shows the difficulties in grouping an ecological resource that does not reliably indicate lack of surface water connectivity in order to meet legal, regulatory, or scientific needs. Additionally, the practice of identifying “geographically isolated wetlands” based on distance from a stream can result in gross overestimates of the number of wetlands lacking ecologically important surface-water connections. Our findings do not support use of the overly simplistic label of “geographically isolated wetlands”. Wetlands surrounded by uplands vary in function and surface-water connections based on wetland landscape setting, context, climate, and geographic region and should be evaluated as such. We found that the “geographically isolated” grouping does not reflect our understanding of the hydrologic variability of these wetlands and hence does not benefit conservation of the Nation’s diverse wetland resources. Therefore, we strongly discourage use of categorizations that provide overly simplistic views of surface-water connectivity of wetlands fully embedded in upland landscapes.

  18. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Directory of Open Access Journals (Sweden)

    Momoko Horikoshi

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  19. Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Baneshi

    2013-05-01

    Background: Policy makers need models to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets, and the presence of missing data challenges model development. Several studies have suggested that the performance of imputation methods is acceptable when the missing rate is moderate. One issue that has received less attention, and is addressed here, is the role of the pattern of missing data. Methods: We used information on 2720 prisoners. Results derived from fitting a regression model to the whole data served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at the above rates, in one variable that was significant in the gold model (age). In scenario 2, a small proportion of each independent variable was dropped. Four imputation methods, under different Event Per Variable (EPV) values, were compared in terms of selection of important variables and parameter estimation. Results: In scenario 2, bias in estimates was low and the performance of all methods for handling missing data was similar. All methods at all missing rates were able to detect the significance of age. In scenario 1, biases in estimates increased, in particular at the 50% missing rate. Here, at EPVs of 10 and 5, imputation methods failed to capture the effect of age. Conclusion: In scenario 2, all imputation methods at all missing rates were able to detect age as being significant. This was not the case in scenario 1. Our results showed that the performance of imputation methods depends on the pattern of missing data.
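
    The two missingness patterns described in this abstract can be sketched as follows. This is an illustrative reading only: the column names other than age, the synthetic values, and the choice to split the overall rate evenly across columns in the second pattern are assumptions, not details taken from the study.

```python
import random


def mask_one_variable(rows, column, rate, rng):
    """Scenario-1-style pattern: blank `column` in a fraction `rate` of rows."""
    masked = [dict(r) for r in rows]          # copy so the input stays intact
    n_missing = round(rate * len(masked))
    for i in rng.sample(range(len(masked)), n_missing):
        masked[i][column] = None
    return masked


def mask_spread(rows, columns, rate, rng):
    """Scenario-2-style pattern (assumed): spread the same number of blanked
    values evenly over all columns, so each column loses only a small share."""
    masked = [dict(r) for r in rows]
    per_column = round(rate * len(masked) / len(columns))
    for col in columns:
        for i in rng.sample(range(len(masked)), per_column):
            masked[i][col] = None
    return masked


def missing_fraction(rows, column):
    """Share of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


rng = random.Random(0)
columns = ["age", "education", "prior_arrests"]   # hypothetical predictors
rows = [{c: rng.random() for c in columns} for _ in range(1000)]

s1 = mask_one_variable(rows, "age", 0.5, rng)     # all loss concentrated in age
s2 = mask_spread(rows, columns, 0.5, rng)         # loss spread across predictors
print(missing_fraction(s1, "age"), missing_fraction(s2, "age"))
```

    Under these assumptions the first pattern removes half of the age values, while the second removes about a sixth of each predictor, which is one way to see why a single important variable is harder to recover in scenario 1.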

  20. Assesing Geographic Isolation of the Galapagos Islands

    Science.gov (United States)

    Orellana, D.; Smith, F.

    2016-06-01

    The Galapagos Archipelago is one of the most important ecological spots on the planet due to its unique biodiversity, active geology, and relatively well-preserved ecosystems. These characteristics are strongly based on the geographical isolation of the islands. On the one hand this isolation allowed the evolution processes that gave the islands their international fame and on the other hand it kept them from major human impacts that affected the vast majority of the Earth's surface. Galapagos' geographical isolation is therefore of major value, but it is rapidly diminishing due to the increase of marine and air transportation among islands and with the rest of the world. This increased accessibility implies enhanced risks for the ecological dynamics on the archipelago (e.g. increased risk of biological invasions, uncontrolled tourism growth, more water and energy consumption). Here, we introduce a general accessibility model to assess geographical isolation of the Galapagos Islands. The model aims to characterize accessibility in terms of human mobility by evaluating travel time to each point of the archipelago using all available transportation modalities. Using a multi criteria cost surface for marine and land areas, we estimated travel time for each surface unit using the fastest route and mode of transportation available while considering several friction factors such as surface type, slope, infrastructure, transfer points, legal restrictions, and physical barriers. We created maps to evaluate the isolation of different islands and places, highlighting the potential risks for several habitats and ecosystems. The model can be used for research and decision-making regarding island conservation, such as estimating spreading paths for invasive species, informing decisions on tourism management, and monitoring isolation changes of sensitive ecosystems.
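
    The idea of accumulating travel time over a cost surface can be sketched with a shortest-path search on a small grid. This is a minimal illustration, not the authors' model: the grid, the friction values, the 4-neighbour moves, and the rule that a step costs the mean friction of the two cells are all assumptions.

```python
import heapq
import math


def travel_time(friction, origin):
    """Least accumulated travel time from `origin` to every grid cell.

    friction[r][c] is the time needed to cross a cell (e.g. minutes);
    math.inf marks a physical barrier. Moving between 4-neighbours is
    assumed to cost the mean of the two cells' friction values.
    """
    n_rows, n_cols = len(friction), len(friction[0])
    best = [[math.inf] * n_cols for _ in range(n_rows)]
    best[origin[0]][origin[1]] = 0.0
    queue = [(0.0, origin)]                      # Dijkstra priority queue
    while queue:
        t, (r, c) = heapq.heappop(queue)
        if t > best[r][c]:
            continue                             # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                step = (friction[r][c] + friction[nr][nc]) / 2
                if t + step < best[nr][nc]:      # never true for barriers
                    best[nr][nc] = t + step
                    heapq.heappush(queue, (t + step, (nr, nc)))
    return best


# Hypothetical 3x3 surface: fast cells (1), one slow cell (3), two barriers.
grid = [
    [1.0, 1.0, 1.0],
    [math.inf, math.inf, 3.0],
    [1.0, 1.0, 1.0],
]
times = travel_time(grid, (0, 0))
for row in times:
    print(row)
```

    Cells that can only be reached through the slow cell end up with a high travel time, which is the kind of isolation value such a map would display; barrier cells stay at infinity.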

  1. ASSESING GEOGRAPHIC ISOLATION OF THE GALAPAGOS ISLANDS

    Directory of Open Access Journals (Sweden)

    D. Orellana

    2016-06-01

    The Galapagos Archipelago is one of the most important ecological spots on the planet due to its unique biodiversity, active geology, and relatively well-preserved ecosystems. These characteristics are strongly based on the geographical isolation of the islands. On the one hand this isolation allowed the evolution processes that gave the islands their international fame and on the other hand it kept them from major human impacts that affected the vast majority of the Earth’s surface. Galapagos’ geographical isolation is therefore of major value, but it is rapidly diminishing due to the increase of marine and air transportation among islands and with the rest of the world. This increased accessibility implies enhanced risks for the ecological dynamics on the archipelago (e.g. increased risk of biological invasions, uncontrolled tourism growth, more water and energy consumption). Here, we introduce a general accessibility model to assess geographical isolation of the Galapagos Islands. The model aims to characterize accessibility in terms of human mobility by evaluating travel time to each point of the archipelago using all available transportation modalities. Using a multi criteria cost surface for marine and land areas, we estimated travel time for each surface unit using the fastest route and mode of transportation available while considering several friction factors such as surface type, slope, infrastructure, transfer points, legal restrictions, and physical barriers. We created maps to evaluate the isolation of different islands and places, highlighting the potential risks for several habitats and ecosystems. The model can be used for research and decision-making regarding island conservation, such as estimating spreading paths for invasive species, informing decisions on tourism management, and monitoring isolation changes of sensitive ecosystems.

  2. Correlation Assessment of Climate and Geographic Distribution of Tuberculosis Using Geographical Information System (GIS).

    Science.gov (United States)

    Beiranvand, Reza; Karimi, Asrin; Delpisheh, Ali; Sayehmiri, Kourosh; Soleimani, Samira; Ghalavandi, Shahnaz

    2016-01-01

    The spread pattern of tuberculosis (TB) is influenced by geographic and social factors. The Geographic Information System (GIS) is now one of the most important epidemiological instruments for identifying high-risk population groups and geographic areas of TB. The aim of this study was to determine the correlation between climate and geographic distribution of TB in Khuzestan Province using GIS during 2005-2012. In this ecological study, all 6363 patients with a definite diagnosis of TB from 2005 until the end of September 2012 in Khuzestan Province, southern Iran, were included. Data were recorded using TB-Register software. Tuberculosis incidence based on the climate and the average annual rain was evaluated using GIS. Data were analyzed with SPSS software. Independent t-test, ANOVA, linear regression, and Pearson and Eta correlation coefficients with a significance level of less than 5% were used for the statistical analysis. The TB incidence differed across geographic conditions. The highest mean TB cumulative incidence rate was observed in extra dry areas (P= 0.017). There was a significant inverse correlation between annual rain rate and TB incidence rate (R= -0.45, P= 0.001). The lowest TB incidence rate (0-100 cases per 100,000) was in areas with average annual rain of more than 1000 mm (P= 0.003). The risk of TB has a strong relationship with climate and average annual rain, so that the risk of TB in areas with low annual rainfall and an extra dry climate is higher than in other regions. Services and special care for high-risk regions of TB are recommended.

  3. Geographic Information System Data Analysis

    Science.gov (United States)

    Billings, Chad; Casad, Christopher; Floriano, Luis G.; Hill, Tracie; Johnson, Rashida K.; Locklear, J. Mark; Penn, Stephen; Rhoulac, Tori; Shay, Adam H.; Taylor, Antone

    1995-01-01

    Data were collected in order to further NASA Langley Research Center's Geographic Information System (GIS). Information on LaRC's communication, electrical, and facility configurations was collected. Existing data were corrected through verification, resulting in more accurate databases. In addition, Global Positioning System (GPS) points were used in order to accurately impose buildings on digitized images. Overall, this project will help the Imaging and CADD Technology Team (ICTT) prove GIS to be a valuable resource for LaRC.

  4. Geographic Names Information System (GNIS) Antarctica Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  5. Geographic Names Information System (GNIS) Landform Features

    Data.gov (United States)

    Department of Homeland Security — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  6. The Geographer's Concern With Natural Hazard Studies

    Science.gov (United States)

    Oliver, J.

    1975-01-01

    Discusses the interest of geographers in the interplay of physical and human influences in fashioning geographical patterns. Methods of stimulating student interest in geography are suggested. For journal availability, see SO 505 212. (Author/DB)

  7. Collaborative Geographic Information Systems for Business Intelligence

    OpenAIRE

    Juan José Ramírez

    2009-01-01

    This paper presents a number of scenarios in which information (specifically, geographically related information) is lost because there is no method for storing or sharing it. The research aims to solve the problems in these scenarios in a general way, by means of a geographical information system that can store geographically related information and publish it, thereby avoiding loss of information and enabling geographical information sharing.

  8. Collaborative Geographic Information Systems for Business Intelligence

    Directory of Open Access Journals (Sweden)

    Juan José Ramírez

    2009-12-01

    This paper presents a number of scenarios in which information (specifically, geographically related information) is lost because there is no method for storing or sharing it. The research aims to solve the problems in these scenarios in a general way, by means of a geographical information system that can store geographically related information and publish it, thereby avoiding loss of information and enabling geographical information sharing.

  9. Human Geography and the Geographical Imagination.

    Science.gov (United States)

    Norton, William

    1989-01-01

    Discusses the development of human geography, pointing out linkages between human geography and sociology. Defines sociological imagination, summarizing the logic behind it. Provides arguments for a parallel geographical imagination, and assesses the extent to which geographers exhibit a geographical imagination. (LS)

  10. Imputing at-work productivity loss using results of a randomized controlled trial comparing tapentadol extended release and oxycodone controlled release for osteoarthritis pain.

    Science.gov (United States)

    Lerner, Debra; Chang, Hong; Rogers, William H; Benson, Carmela; Chow, Wing; Kim, Myoung S; Biondi, David

    2012-08-01

    To determine the impact of tapentadol extended release (ER) versus placebo or oxycodone controlled release (CR) on the work productivity of adults with chronic moderate to severe knee osteoarthritis pain. Using clinical trial data on pain outcomes, a validated methodology imputed treatment group differences in at-work productivity and associated differences in productivity costs (assuming a $100,000 annual salary per participant). Imputed improvements in at-work productivity were significantly greater for tapentadol ER compared with either placebo (mean, 1.96% vs 1.51%; P = 0.001) or oxycodone CR (mean, 1.96% vs 1.40%; P employees to function better at work and reduce their employers' productivity costs.

  11. Natural Scales in Geographical Patterns

    Science.gov (United States)

    Menezes, Telmo; Roth, Camille

    2017-04-01

    Human mobility is known to be distributed across several orders of magnitude of physical distances, which makes it generally difficult to endogenously find or define typical and meaningful scales. Relevant analyses, from movements to geographical partitions, seem to be relative to some ad-hoc scale, or no scale at all. Relying on geotagged data collected from photo-sharing social media, we apply community detection to movement networks constrained by increasing percentiles of the distance distribution. Using a simple parameter-free discontinuity detection algorithm, we discover clear phase transitions in the community partition space. The detection of these phases constitutes the first objective method of characterising endogenous, natural scales of human movement. Our study covers nine regions, ranging from cities to countries of various sizes and a transnational area. For all regions, the number of natural scales is remarkably low (2 or 3). Further, our results hint at scale-related behaviours rather than scale-related users. The partitions of the natural scales allow us to draw discrete multi-scale geographical boundaries, potentially capable of providing key insights in fields such as epidemiology or cultural contagion where the introduction of spatial boundaries is pivotal.

  12. Addressing Semantic Geographic Information Systems

    Directory of Open Access Journals (Sweden)

    Salvatore F. Pileggi

    2013-11-01

    The progressive consolidation of information technologies on a large scale has been facilitating and progressively increasing the production, collection, and diffusion of geographic data, as well as facilitating the integration of a large amount of external information into geographic information systems (GIS). Traditional GIS is transforming into a consolidated information infrastructure. This consolidated infrastructure is affecting more and more aspects of internet computing and services. Most popular systems (such as social networks, GPS, and decision support systems) involve complex GIS and significant amounts of information. As a web service, GIS is affected by exactly the same problems that affect the web as a whole. Therefore, next generation GIS solutions have to address further methodological and data engineering challenges in order to accommodate new applications’ extended requirements (in terms of scale, interoperability, and complexity). The conceptual and semantic modeling of GIS, as well as the integration of semantics into current GIS, provide highly expressive environments that are capable of meeting the needs and requirements of a wide range of applications.

  13. OUTDOOR EDUCATION AND GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    ANDREA GUARAN

    2016-01-01

    This paper reflects on the relationship between the values and methodological principles of Outdoor Education and the perspectives of spatial and geographical education, especially in pre-school and primary school, that is, for children between 3 and 10 years of age. Outdoor Education is an educational practice rooted in the philosophical thought of the 17th and 18th centuries, from John Locke to Jean-Jacques Rousseau, and in pedagogical thought, in particular that of Friedrich Fröbel, and it now has a fairly stable tradition in Northern European countries. In Italy, however, there are still few experiences, and they are usually temporary and experimental rather than systematic and structural. In the first part, this paper focuses on the reasons that justify particular attention to educational paths that favour outdoor activities, providing a definition of outdoor education and highlighting its values. It is also essential to understand that educational programs in open spaces, such as a forest or simply the schoolyard, offer the possibility of learning about geographical situations. The question that arises, therefore, is how best to use the stimuli that a spatial location provides for the acquisition of knowledge, skills and abilities concerning space and geography.

  14. Geographic profiling and animal foraging.

    Science.gov (United States)

    Le Comber, Steven C; Nicholls, Barry; Rossmo, D Kim; Racey, Paul A

    2006-05-21

    Geographic profiling was originally developed as a statistical tool for use in criminal cases, particularly those involving serial killers and rapists. It is designed to help police forces prioritize lists of suspects by using the location of crime scenes to identify the areas in which the criminal is most likely to live. Two important concepts are the buffer zone (criminals are less likely to commit crimes in the immediate vicinity of their home) and distance decay (criminals commit fewer crimes as the distance from their home increases). In this study, we show how the techniques of geographic profiling may be applied to animal data, using as an example foraging patterns in two sympatric colonies of pipistrelle bats, Pipistrellus pipistrellus and P. pygmaeus, in the northeast of Scotland. We show that if model variables are fitted to known roost locations, these variables may be used as numerical descriptors of foraging patterns. We go on to show that these variables can be used to differentiate patterns of foraging in these two species.
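
    The buffer zone and distance decay concepts can be combined into a single score per candidate location. The sketch below uses a simplified score in the style of Rossmo's formula with Manhattan distance; the buffer radius, the exponents, and the site coordinates are illustrative assumptions and are not the fitted values from the bat study.

```python
def rossmo_style_score(cell, sites, buffer_radius=2.0, f=1.2, g=1.2):
    """Sum, over all observed sites, a term that rises inside the buffer
    zone and decays with distance outside it. Higher scores mark cells
    more likely to contain the home (or roost) location."""
    x, y = cell
    total = 0.0
    for sx, sy in sites:
        d = abs(x - sx) + abs(y - sy)            # Manhattan distance
        if d > buffer_radius:
            total += 1.0 / d ** f                # distance decay
        else:
            # buffer zone: score falls as the cell gets closer to the site
            total += buffer_radius ** (g - f) / (2 * buffer_radius - d) ** g
    return total


# Hypothetical foraging sites on an integer grid.
sites = [(2, 3), (5, 4), (4, 7), (6, 6)]
cells = [(x, y) for x in range(10) for y in range(10)]
ranked = sorted(cells, key=lambda c: rossmo_style_score(c, sites), reverse=True)
print(ranked[:5])                                # top candidate cells
```

    For a single site the score peaks at the edge of the buffer and is lower both at the site itself and farther away, which is the qualitative shape the two concepts imply.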

  15. Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

    CSIR Research Space (South Africa)

    Duma, M

    2013-09-01

    accuracy and resilience of supervised learning methods improve significantly when applied with the imputation strategy under these assumptions. Keywords: Insurance risk classification, missing data, positive selection, supervised learning. DATASETS... environment. The experiments were conducted on synthetic and real-world datasets and the results show exceptional estimation accuracy, with error rates as low as 0.78% and 1.7% compared to other algorithms (such as SPIRIT and TinyDB). Furthermore...

  16. Genome-wide imputation study identifies novel HLA locus for pulmonary fibrosis and potential role for auto-immunity in fibrotic idiopathic interstitial pneumonia

    OpenAIRE

    Fingerlin, TE; Zhang, W; Yang, IV; Ainsworth, HC; Russell, PH; Blumhagen, RZ; Schwarz, MI; Brown, KK; Steele, MP; Loyd, JE; Cosgrove, GP; Lynch, DA; Groshong, S; Collard, HR; Wolters, PJ

    2016-01-01

    Background Fibrotic idiopathic interstitial pneumonias (fIIP) are a group of fatal lung diseases with largely unknown etiology and without definitive treatment other than lung transplant to prolong life. There is strong evidence for the importance of both rare and common genetic risk alleles in familial and sporadic disease. We have previously used genome-wide single nucleotide polymorphism data to identify 10 risk loci for fIIP. Here we extend that work to imputed genome-wide genotypes and c...

  17. Faculty Performance Evaluation: The CIPP-SAPS Model.

    Science.gov (United States)

    Mitcham, Maralynne

    1981-01-01

    The issues of faculty performance evaluation for allied health professionals are addressed. Daniel Stufflebeam's CIPP (context-input-process-product) model is introduced and its development into a CIPP-SAPS (self-administrative-peer-student) model is pursued. (Author/CT)

  18. A comparison of model-based imputation methods for handling missing predictor values in a linear regression model: A simulation study

    Science.gov (United States)

    Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah

    2017-08-01

    In regression analysis, missing covariate data is a common problem. Many researchers use ad hoc methods to overcome this problem because they are easy to implement. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts that can assist researchers in selecting appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in the covariates were simulated using a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and the model performance measures, MSE and R-squared, were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE when the percentage of missingness is less than 30%. Both methods are unable to handle levels of missingness larger than 30%.
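
    A missing at random (MAR) mechanism of the kind mentioned here can be sketched as follows: the chance that one covariate is missing depends only on another covariate that is always observed. The coefficients, the two missingness probabilities, and the two-covariate setup are assumptions chosen for illustration; the study itself used SAS and a multivariate normal design whose parameters are not given in the abstract.

```python
import random


def simulate_mar(n, rng, p_low=0.1, p_high=0.5):
    """Generate (x1, x2, y) rows where x2 is missing at random given x1."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.5 * x1 + rng.gauss(0, 1)          # correlated covariates
        y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0, 1)
        # Missingness depends only on the observed x1, never on x2 itself.
        p_missing = p_high if x1 > 0 else p_low
        if rng.random() < p_missing:
            x2 = None
        data.append((x1, x2, y))
    return data


def missing_rate(data, keep):
    """Share of missing x2 among the rows selected by `keep`."""
    subset = [row for row in data if keep(row)]
    return sum(row[1] is None for row in subset) / len(subset)


rng = random.Random(42)
data = simulate_mar(5000, rng)
rate_high = missing_rate(data, lambda row: row[0] > 0)
rate_low = missing_rate(data, lambda row: row[0] <= 0)
overall = missing_rate(data, lambda row: True)
print(rate_low, rate_high, overall)
```

    Because the missing rows are concentrated where x1 is positive, an analysis restricted to complete cases sees a shifted sample, which is the situation that model-based methods such as ML and MI are meant to handle.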

  19. Multiple imputation was a valid approach to estimate absolute risk from a prediction model based on case-cohort data.

    Science.gov (United States)

    Mühlenbruch, Kristin; Kuxhaus, Olga; di Giuseppe, Romina; Boeing, Heiner; Weikert, Cornelia; Schulze, Matthias B

    2017-04-01

    To compare weighting methods for Cox regression and multiple imputation (MI) in a case-cohort study in the context of risk prediction modeling. Based on the European Prospective Investigation into Cancer and Nutrition Potsdam study, we estimated risk scores to predict incident type-2 diabetes using full cohort data and case-cohort data assuming missing information on waist circumference outside the case-cohort (∼90%). Varying weighting approaches and MI were compared with regard to the calculation of relative risks, absolute risks, and predictive abilities including C-index, the net reclassification improvement, and calibration. The full cohort comprised 21,845 participants, and the case-cohort comprised 2,703 participants. Relative risks were similar across all methods and compatible with full cohort estimates. Absolute risk estimates showed stronger disagreement mainly for Prentice and Self & Prentice weighting. Barlow and Langholz & Jiao weighting methods and MI were in good agreement with full cohort analysis. Predictive abilities were closest to full cohort estimates for MI or for Barlow and Langholz & Jiao weighting. MI seems to be a valid method for deriving or extending a risk prediction model from case-cohort data and might be superior for absolute risk calculation when compared to weighted approaches. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  20. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

    Science.gov (United States)

    Martiniano, Rui; Cassidy, Lara M; Ó'Maoldúin, Ros; McLaughlin, Russell; Silva, Nuno M; Manco, Licinio; Fidalgo, Daniel; Pereira, Tania; Coelho, Maria J; Serra, Miguel; Burger, Joachim; Parreira, Rui; Moran, Elena; Valera, Antonio C; Porfirio, Eduardo; Boaventura, Rui; Silva, Ana M; Bradley, Daniel G

    2017-07-01

    We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

  1. Geographical Gradients in Argentinean Terrestrial Mammal Species Richness and Their Environmental Correlates

    Science.gov (United States)

    Márquez, Ana L.; Real, Raimundo; Kin, Marta S.; Guerrero, José Carlos; Galván, Betina; Barbosa, A. Márcia; Olivero, Jesús; Palomo, L. Javier; Vargas, J. Mario; Justo, Enrique

    2012-01-01

    We analysed the main geographical trends of terrestrial mammal species richness (SR) in Argentina, assessing how broad-scale environmental variation (defined by climatic and topographic variables) and the spatial form of the country (defined by spatial filters based on spatial eigenvector mapping (SEVM)) influence the kinds and the numbers of mammal species along these geographical trends. We also evaluated if there are pure geographical trends not accounted for by the environmental or spatial factors. The environmental variables and spatial filters that simultaneously correlated with the geographical variables and SR were considered potential causes of the geographic trends. We performed partial correlations between SR and the geographical variables, maintaining the selected explanatory variables statistically constant, to determine if SR was fully explained by them or if a significant residual geographic pattern remained. All groups and subgroups presented a latitudinal gradient not attributable to the spatial form of the country. Most of these trends were not explained by climate. We used a variation partitioning procedure to quantify the pure geographic trend (PGT) that remained unaccounted for. The PGT was larger for latitudinal than for longitudinal gradients. This suggests that historical or purely geographical causes may also be relevant drivers of these geographical gradients in mammal diversity. PMID:23028254
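The partial-correlation step described in this abstract can be illustrated with a minimal sketch. It assumes a single control variable and invented toy numbers (the study itself held several environmental and spatial variables constant); the function names `pearson` and `partial_corr` are illustrative and do not come from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy data (hypothetical): both variables depend on climate, not on each other.
climate = [1, 2, 3, 4]
latitude = [2, 1, 2, 5]
richness = [12, 9, 16, 13]

raw = pearson(latitude, richness)                    # about 0.33
residual = partial_corr(latitude, richness, climate)  # about 0
```

In this toy case the raw latitude-richness correlation vanishes once climate is held constant; a residual value far from zero would correspond to what the abstract calls a pure geographic trend.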

  2. Geographical Gradients in Argentinean Terrestrial Mammal Species Richness and Their Environmental Correlates

    Directory of Open Access Journals (Sweden)

    Ana L. Márquez

    2012-01-01

    We analysed the main geographical trends of terrestrial mammal species richness (SR) in Argentina, assessing how broad-scale environmental variation (defined by climatic and topographic variables) and the spatial form of the country (defined by spatial filters based on spatial eigenvector mapping (SEVM)) influence the kinds and the numbers of mammal species along these geographical trends. We also evaluated if there are pure geographical trends not accounted for by the environmental or spatial factors. The environmental variables and spatial filters that simultaneously correlated with the geographical variables and SR were considered potential causes of the geographic trends. We performed partial correlations between SR and the geographical variables, maintaining the selected explanatory variables statistically constant, to determine if SR was fully explained by them or if a significant residual geographic pattern remained. All groups and subgroups presented a latitudinal gradient not attributable to the spatial form of the country. Most of these trends were not explained by climate. We used a variation partitioning procedure to quantify the pure geographic trend (PGT) that remained unaccounted for. The PGT was larger for latitudinal than for longitudinal gradients. This suggests that historical or purely geographical causes may also be relevant drivers of these geographical gradients in mammal diversity.

  3. Geographical gradients in Argentinean terrestrial mammal species richness and their environmental correlates.

    Science.gov (United States)

    Márquez, Ana L; Real, Raimundo; Kin, Marta S; Guerrero, José Carlos; Galván, Betina; Barbosa, A Márcia; Olivero, Jesús; Palomo, L Javier; Vargas, J Mario; Justo, Enrique

    2012-01-01

    We analysed the main geographical trends of terrestrial mammal species richness (SR) in Argentina, assessing how broad-scale environmental variation (defined by climatic and topographic variables) and the spatial form of the country (defined by spatial filters based on spatial eigenvector mapping (SEVM)) influence the kinds and the numbers of mammal species along these geographical trends. We also evaluated if there are pure geographical trends not accounted for by the environmental or spatial factors. The environmental variables and spatial filters that simultaneously correlated with the geographical variables and SR were considered potential causes of the geographic trends. We performed partial correlations between SR and the geographical variables, maintaining the selected explanatory variables statistically constant, to determine if SR was fully explained by them or if a significant residual geographic pattern remained. All groups and subgroups presented a latitudinal gradient not attributable to the spatial form of the country. Most of these trends were not explained by climate. We used a variation partitioning procedure to quantify the pure geographic trend (PGT) that remained unaccounted for. The PGT was larger for latitudinal than for longitudinal gradients. This suggests that historical or purely geographical causes may also be relevant drivers of these geographical gradients in mammal diversity.

  4. Stennis Space Center Environmental Geographic Information System

    Science.gov (United States)

    Lovely, Janette; Cohan, Tyrus

    2000-01-01

    As NASA's lead center for rocket propulsion testing, the John C. Stennis Space Center (SSC) monitors and assesses the off-site impacts of such testing through its Environmental Office (SSC-EO) using acoustical models and ancillary data. The SSC-EO has developed a geographical database, called the SSC Environmental Geographic Information System (SSC-EGIS), that covers an eight-county area bordering the NASA facility. Through the SSC-EGIS, the Environmental Office inventories, assesses, and manages the nearly 139,000 acres that comprise Stennis Space Center and its surrounding acoustical buffer zone. The SSC-EGIS contains in-house data as well as a wide range of data obtained from outside sources, including private agencies and local, county, state, and U.S. government agencies. The database comprises cadastral/geodetic, hydrology, infrastructure, geo-political, physical geography, and socio-economic vector and raster layers. The imagery contained in the database is varied, including low-resolution imagery, such as Landsat TM and SPOT; high-resolution imagery, such as IKONOS and AVIRIS; and aerial photographs. The SSC-EGIS has been an integral part of several major projects and the model upon which similar EGISs will be developed for other NASA facilities. The Corps of Engineers utilized the SSC-EGIS in a plan to establish wetland mitigation sites within the SSC buffer zone. Mississippi State University employed the SSC-EGIS in a preliminary study to evaluate public access points within the buffer zone. The SSC-EO has also expressly used the SSC-EGIS to assess noise pollution modeling, land management/wetland mitigation assessment, environmental hazards mapping, and protected areas mapping for archaeological sites and for threatened and endangered species habitats. The SSC-EO has several active and planned projects that will also make use of the SSC-EGIS during this and the coming fiscal year.

  5. Geographical relationships and CEO compensation contracts

    OpenAIRE

    Junli Yu; Wei Xu; Ping Zhang

    2017-01-01

    In this paper, we empirically analyze the effects that the geographical relationships between chairman and CEO have on the latter’s compensation contracts, based on samples of listed A-share private firms from 2005 to 2014. We find that geographical relationships are related to lower pay–performance sensitivity, and that the correlation mainly exists in poor performance periods, suggesting that geographical relationships weaken the effectiveness of compensation contracts. We also find that ge...

  6. Socioeconomic and geographic inequalities in adolescent smoking: a multilevel cross-sectional study of 15 year olds in Scotland.

    Science.gov (United States)

    Levin, K A; Dundas, R; Miller, M; McCartney, G

    2014-04-01

    The objective of the study was to present socioeconomic and geographic inequalities in adolescent smoking in Scotland. The international literature suggests there is no obvious pattern in the geography of adolescent smoking, with rural areas having a higher prevalence than urban areas in some countries, and a lower prevalence in others. These differences are most likely due to substantive differences in rurality between countries in terms of their social, built and cultural geography. Previous studies in the UK have shown an association between lower socioeconomic status and smoking. The Scottish Health Behaviour in School-aged Children study surveyed 15 year olds in schools across Scotland between March and June of 2010. We ran multilevel logistic regressions using Markov chain Monte Carlo method and adjusting for age, school type, family affluence, area level deprivation and rurality. We imputed missing rurality and deprivation data using multivariate imputation by chained equations, and re-analysed the data (N = 3577), comparing findings. Among boys, smoking was associated only with area-level deprivation. This relationship appeared to have a quadratic S-shape, with those living in the second most deprived quintile having highest odds of smoking. Among girls, however, odds of smoking increased with deprivation at individual and area-level, with an approximate dose-response relationship for both. Odds of smoking were higher for girls living in remote and rural parts of Scotland than for those living in urban areas. Schools in rural areas were no more or less homogenous than schools in urban areas in terms of smoking prevalence. We discuss possible social and cultural explanations for the high prevalence of boys' and girls' smoking in low SES neighbourhoods and of girls' smoking in rural areas. We consider possible differences in the impact of recent tobacco policy changes, primary socialization, access and availability, retail outlet density and the home

  7. Geographical relationships and CEO compensation contracts

    Directory of Open Access Journals (Sweden)

    Junli Yu

    2017-06-01

    In this paper, we empirically analyze the effects that the geographical relationships between chairman and CEO have on the latter’s compensation contracts, based on samples of listed A-share private firms from 2005 to 2014. We find that geographical relationships are related to lower pay–performance sensitivity, and that the correlation mainly exists in poor performance periods, suggesting that geographical relationships weaken the effectiveness of compensation contracts. We also find that geographical relationships can be substituted by external formal institutions.

  8. CONTEMPORARY TRENDS IN GEOGRAPHICAL EDUCATION

    Directory of Open Access Journals (Sweden)

    M. Wasileva

    2017-01-01

    Geography includes rich, diverse and comprehensive themes that give us an understanding of our changing environment and interconnected world. It includes the study of the physical environment and resources; cultures, economies and societies; people and places; and global development and civic participation. As a subject, geography is particularly valuable because it provides information for exploring contemporary issues from a different perspective. This geographical information affects us all at work and in our daily lives and helps us make informed decisions that shape our future. All these facts have led to a wide discussion of many topical issues in contemporary geography didactics. The subjects of research are the new geography and economics curriculum as well as the construction of a modern learning process. The paper briefly presents some of the current trends and key issues of geodidactics. As central notions we consider and analyze the training/educational goals, the geography curriculum, the target groups and environment of geography training, training methods, and the information sources used in geography education. We hold that all of the above is reflected in the planning, analysis and assessment of education and thus in its quality and effectiveness.

  9. Automatic extraction of geographic context from textual data

    Directory of Open Access Journals (Sweden)

    Jurijs Nikolajevs

    2014-08-01

    The amount of information on the internet grows exponentially. It is not enough anymore just to have general access to this huge amount of data; instead, it is becoming a necessity to be able to use different kinds of automatic filters to retrieve just the information you actually want. One solution for information filtering and retrieval is context analysis, in which one of the contexts of interest is the geographic context. This paper studies the problem and methodology of geoparsing: recognition of geographic names in unstructured textual content for the aim of extracting geographic context. A prototype implementation of a geoparsing system, capable of automatically analyzing unstructured text, recognizing geographic information and marking geographic names, is developed. Empirical evaluation of the system using articles from real-world news showed that the average quality of its geographic name recognition varies around 75-100%. Possible applications of the developed prototype include automated grouping of any texts by their geographic contexts (e.g., in news portals) and location-based search. Preliminary results of empirical evaluation showed that the average rate of its geographic name recognition varies around 75-100%. DOI: http://dx.doi.org/10.15181/csat.v2i1.13
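As an illustration of what geoparsing involves, the following is a minimal sketch of a gazetteer lookup. It is not the prototype described in the abstract, whose internals are not given here; the gazetteer contents, the function name `find_place_names` and the longest-match-first strategy are all assumptions made for the example.

```python
import re

# Hypothetical toy gazetteer; a real system would load a large place-name list.
GAZETTEER = {"riga", "latvia", "new york"}

def find_place_names(text, gazetteer=GAZETTEER, max_words=3):
    """Return (name, start, end) for gazetteer entries found in text.

    Word sequences of up to max_words tokens are tried longest first,
    so "New York" is matched as one name rather than skipped.
    """
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(t[0] for t in tokens[i:i + n]).lower()
            if candidate in gazetteer:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((text[start:end], start, end))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return matches

found = find_place_names("Flights from New York to Riga resumed.")
names = [name for name, _, _ in found]
```

A plain lookup like this cannot disambiguate names that are also ordinary words or personal names, which is one reason recognition quality in real systems varies from text to text.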

  10. Geographical Information System Based Evaluation of Benthic MACRO Fauna in Thondi Coastal Environment, South East Coast of India Rethna Priya. E, Anbuchezhian. R and Ravichandran. S. Centre of Advanced Study in Marine Biology, Annamalai University, Portonovo, India

    Science.gov (United States)

    Priya, R.; Ramasamy, A.; Ravichandran, R.; Anbuchezhian. E.

    2013-05-01

    Seasonal and frequency differences in the macro fauna have been related to variation in the morpho dynamics and the population dynamics of dominant species. The aim of this article is to describe the seasonal and spatial variation of the macro fauna at 12 different sampling stations with distinct environmental conditions in the Thondi coastal area. The samples were collected monthly from September 2010 to September 2011. Macro benthic invertebrates are numerically important components of coastal ecosystems and represent indicators of fishery potential, intertidal ecology and environmental degradation. Sampling stations were fixed by GPS. In total, 54 species were recorded, of which 24 were gastropods, 15 bivalves, 5 amphipods, 6 decapods and 4 echinoderms. In the present study, the abundance of benthic fauna depended greatly on the physical and chemical properties of the substratum. The diversity, seasonal variations, dominance and influence of ecological parameters were studied geographically using GIS software for a period of one year, from September 2010 to September 2011. The macro fauna at all sites showed a striking seasonal variation in density and diversity. Keywords: Macro fauna, GIS software, environmental degradation, morpho dynamics.

  11. Isolation of Microsporum gypseum in soil samples from different geographical regions of Brazil, evaluation of the extracellular proteolytic enzymes activities (keratinase and elastase) and molecular sequencing of selected strains

    Directory of Open Access Journals (Sweden)

    Mauro Cintra Giudice

    2012-09-01

    A survey of Microsporum gypseum was conducted in soil samples from different geographical regions of Brazil. The dermatophyte was isolated from soil samples by the hair baiting technique, and the species were identified by morphological studies. We analyzed 692 soil samples and the recovery rate was 19.2%. The activities of keratinase and elastase were measured quantitatively in 138 samples. The ITS region of rDNA was sequenced in representative samples. M. gypseum isolates showed significant quantitative differences in the expression of both keratinase and elastase, but no significant correlation was observed between these enzymes. The sequencing of the representative samples revealed the presence of two teleomorphic species of M. gypseum (Arthroderma gypseum and A. incurvatum). The enzymatic activities may play an important role in the pathogenicity and a probable adaptation of this fungus to animal parasitism. Combining phenotypic and molecular analysis to identify Microsporum and its teleomorphic states will provide a useful and reliable identification system.

  12. Legal sources of trademarks and geographical indications

    Directory of Open Access Journals (Sweden)

    Boloş, M.D.

    2011-01-01

    Trademarks and geographical indications are a highly internationally debated topic, mainly because of their economic value. This is why new international laws are created in order to keep up with the advances in economics, communications and the internet. The article aims to study the international sources of law in the field of trademarks and geographical indications, in order to underline the applicable legislation.

  13. 7 CFR 3565.213 - Geographic distribution.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false Geographic distribution. 3565.213 Section 3565.213 Agriculture Regulations of the Department of Agriculture (Continued) RURAL HOUSING SERVICE, DEPARTMENT OF AGRICULTURE GUARANTEED RURAL RENTAL HOUSING PROGRAM Loan Requirements § 3565.213 Geographic distribution. The...

  14. 34 CFR 642.33 - Geographic distribution.

    Science.gov (United States)

    2010-07-01

    ... 34 Education 3 2010-07-01 2010-07-01 false Geographic distribution. 642.33 Section 642.33 Education Regulations of the Offices of the Department of Education (Continued) OFFICE OF POSTSECONDARY... Grant? § 642.33 Geographic distribution. The Secretary, to the greatest extent possible, awards grants...

  15. Geographical Literacy and the Role of GIS.

    Science.gov (United States)

    West, Bryan A.

    1999-01-01

    Demonstrates how Geographical Information Systems (GIS) can help develop student skills that enhance learning. Describes the application of GIS within secondary geography education, providing an example of its use at the Windaroo Valley State High School (Australia). Discusses GIS and geographic literacy. (CMK)

  16. Ontology-Based Geographic Data Set Integration

    NARCIS (Netherlands)

    Uitermark, Henricus Theodorus Johannes Antonius; Uitermark, H.T.J.A.

    2001-01-01

    Geographic data set integration is particularly important for update propagation, i.e. the reuse of updates from one data set in another data set. In this thesis geographic data set integration (also known as map integration) between two topographic data sets, GBKN and TOP10vector, is described.

  17. Beyond boundaries : Geographical aspects of urban health

    NARCIS (Netherlands)

    Veldhuizen, E.M.

    2017-01-01

    In this thesis we look at urban health from a geographical perspective. It focuses specifically on environmental influences on health in Amsterdam. Spatial variations in health exist at multiple geographical scales. Differences in health between locations can be the result of individual

  18. Conceptual Model of Dynamic Geographic Environment

    Directory of Open Access Journals (Sweden)

    Martínez-Rosales Miguel Alejandro

    2014-04-01

    In geographic environments, there are many different types of geographic entities such as automobiles, trees, persons, buildings, storms, hurricanes, etc. These entities can be classified into two groups: geographic objects and geographic phenomena. By its nature, a geographic environment is dynamic; thus, its static modeling is not sufficient. Considering the dynamics of the geographic environment, a new type of geographic entity called an event is introduced. The primary target is a modeling of the geographic environment as an event sequence, because in this case the semantic relations are much richer than in the case of static modeling. In this work, the conceptualization of this model is proposed. It is based on the idea of processing each entity separately instead of processing the environment as a whole. After that, the so-called history of each entity and its spatial relations to other entities are defined to describe the whole environment. The main goal is to model, at a conceptual level, systems that make use of spatial and temporal information, so that the model can later serve as the semantic engine for such systems.

  19. Socioeconomic Development Inequalities among Geographic Units ...

    African Journals Online (AJOL)

    Socio-economic development inequality among geographic units is a phenomenon common in both the developed and developing countries. Regional inequality may result in dissension among geographic units of the same state due to the imbalance in socio-economic development. This study examines the inequality ...

  20. The evolution of cooperation on geographical networks

    Science.gov (United States)

    Li, Yixiao; Wang, Yi; Sheng, Jichuan

    2017-11-01

    We study the evolutionary public goods game on geographical networks, i.e., complex networks which are located on a geographical plane. The geographical feature takes effect in two ways: on the one hand, the geographically induced network structure influences the overall evolutionary dynamics; on the other hand, the geographical length of an edge influences the cost when the two players at its ends interact. For the latter effect, we design a new cost function for cooperators, which simply assumes that the longer the distance between two players, the higher the cost the cooperator(s) among them have to pay. In this study, network substrates are generated by a previous spatial network model with a cost-benefit parameter controlling the network topology. Our simulations show that the greatest promotion of cooperation is achieved in the intermediate regime of the parameter, in which empirical estimates of various railway networks fall. Further, we investigate how the distribution of edges' geographical costs influences the evolutionary dynamics and consider three patterns of the distribution: an approximately equal distribution, a diverse distribution, and a polarized distribution. For normal geographical networks which are generated using intermediate values of the cost-benefit parameter, a diverse distribution hinders the evolution of cooperation, whereas a polarized distribution lowers the threshold value of the amplification factor for cooperation in the public goods game. These results are helpful for understanding the evolution of cooperation on real-world geographical networks.
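The idea of a distance-dependent cooperation cost can be sketched as follows. The abstract does not state the functional form, so the linear cost `1 + alpha * distance`, the parameter `alpha`, and the single-group payoff rule below are assumptions chosen only for illustration; they are not the authors' model.

```python
from math import hypot

def edge_cost(p, q, alpha=0.5):
    """Assumed cost of cooperating across an edge: grows linearly with length."""
    return 1.0 + alpha * hypot(p[0] - q[0], p[1] - q[1])

def group_payoffs(positions, strategies, focal, members, r, alpha=0.5):
    """Payoffs for one public goods group centred on the focal player.

    Each cooperator (strategy 1) adds one unit to the pot; the pot is
    multiplied by the amplification factor r and shared equally. A
    cooperator pays a cost that depends on its distance to the focal player.
    """
    n_coop = sum(strategies[m] for m in members)
    share = r * n_coop / len(members)
    payoffs = {}
    for m in members:
        cost = edge_cost(positions[m], positions[focal], alpha) if strategies[m] else 0.0
        payoffs[m] = share - cost
    return payoffs

# Hypothetical three-player group: players 0 and 1 cooperate, player 2 defects.
positions = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (0.0, 1.0)}
strategies = {0: 1, 1: 1, 2: 0}
payoffs = group_payoffs(positions, strategies, focal=0, members=[0, 1, 2], r=3.0)
```

Here the distant cooperator (player 1, five units from the focal player) ends up worse off than the nearby cooperator, which is the qualitative effect the abstract describes.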

  1. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

    Directory of Open Access Journals (Sweden)

    Busch Michael P

    2007-12-01

    Background: Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates. Methods: We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use. Results: Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before. Conclusion: In cross-sectional studies of fibrosis progression where date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.

  2. A systematic review of instrumental variable analyses using geographic region as an instrument.

    Science.gov (United States)

    Vertosick, Emily A; Assel, Melissa; Vickers, Andrew J

    2017-10-13

    Instrumental variables analysis is a methodology to mitigate the effects of measured and unmeasured confounding in observational studies of treatment effects. Geographic area is increasingly used as an instrument. We conducted a literature review to determine the properties of geographic area in studies of cancer treatments. We identified cancer studies performed in the United States which incorporated instrumental variable analysis with area-wide treatment rate within a geographic region as the instrument. We assessed the degree of treatment variability between geographic regions, assessed balance of measured confounders afforded by geographic area and compared the results of instrumental variable analysis to those of multivariable methods. Geographic region as an instrument was relatively common, with 22 eligible studies identified, many of which were published in high-impact journals. Treatment rates did not vary greatly by geographic region. Covariates were not balanced by the instrument in the majority of studies. Eight out of eleven studies found statistically significant effects of treatment on multivariable analysis but not for instrumental variables, with the central estimates of the instrumental variables analysis generally being closer to the null. We recommend caution and an investigation of IV assumptions when considering the use of geographic region as an instrument in observational studies of cancer treatments. The value of geographic region as an instrument should be critically evaluated in other areas of medicine. Copyright © 2017 Elsevier Ltd. All rights reserved.
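For readers unfamiliar with the technique under review, the following minimal sketch shows the simplest instrumental-variable estimator (the ratio of covariances, equivalent to two-stage least squares with one instrument). The data are invented so that an unmeasured confounder biases the ordinary regression slope while the instrument recovers the true effect of 2; the variable names are hypothetical and nothing here is taken from the reviewed studies.

```python
def covariance(a, b):
    """Unnormalised sample covariance (sum of centred cross-products)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

def iv_estimate(instrument, treatment, outcome):
    """Wald / single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return covariance(instrument, outcome) / covariance(instrument, treatment)

def ols_slope(treatment, outcome):
    """Ordinary least-squares slope of outcome on treatment."""
    return covariance(treatment, outcome) / covariance(treatment, treatment)

# Toy data (hypothetical): the confounder affects both treatment and outcome
# but is unrelated to the regional treatment rate used as the instrument.
regional_rate = [10, 30, 50, 70]
confounder = [1, -1, -1, 1]
treatment = [z + u for z, u in zip(regional_rate, confounder)]
outcome = [2 * x + 3 * u for x, u in zip(treatment, confounder)]

iv = iv_estimate(regional_rate, treatment, outcome)  # recovers the true effect, 2
ols = ols_slope(treatment, outcome)                  # biased upward by the confounder
```

The estimator is only valid when the instrument is unrelated to the confounders and affects the outcome solely through treatment, which are the assumptions the review recommends checking.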

  3. A genome-wide linkage and association analysis of imputed insertions and deletions with cardiometabolic phenotypes in Mexican Americans: The Insulin Resistance Atherosclerosis Family Study.

    Science.gov (United States)

    Gao, Chuan; Hsu, Fang-Chi; Dimitrov, Latchezar M; Okut, Hayrettin; Chen, Yii-Der I; Taylor, Kent D; Rotter, Jerome I; Langefeld, Carl D; Bowden, Donald W; Palmer, Nicholette D

    2017-05-01

    Insertions and deletions (INDELs) represent a significant fraction of interindividual variation in the human genome yet their contribution to phenotypes is poorly understood. To confirm the quality of imputed INDELs and investigate their roles in mediating cardiometabolic phenotypes, genome-wide association and linkage analyses were performed for 15 phenotypes with 1,273,952 imputed INDELs in 1,024 Mexican-origin Americans. Imputation quality was validated using whole exome sequencing with an average kappa of 0.93 in common INDELs (minor allele frequencies [MAFs] ≥ 5%). Association analysis revealed one genome-wide significant association signal for the cholesteryl ester transfer protein gene (CETP) with high-density lipoprotein levels (rs36229491, P = 3.06 × 10^-12); linkage analysis identified two peaks with logarithm of the odds (LOD) > 5 (rs60560566, LOD = 5.36 with insulin sensitivity (SI) and rs5825825, LOD = 5.11 with adiponectin levels). Suggestive overlapping signals between linkage and association were observed: rs59849892 in the WSC domain containing 2 gene (WSCD2) was associated and nominally linked with SI (P = 1.17 × 10^-7, LOD = 1.99). This gene has been implicated in glucose metabolism in human islet cell expression studies. In addition, rs201606363 was linked and nominally associated with low-density lipoprotein (P = 4.73 × 10^-4, LOD = 3.67), apolipoprotein B (P = 1.39 × 10^-3, LOD = 4.64), and total cholesterol (P = 1.35 × 10^-2, LOD = 3.80) levels. rs201606363 is an intronic variant of the UBE2F-SCLY (where UBE2F is ubiquitin-conjugating enzyme E2F and SCLY is selenocysteine lyase) fusion gene that may regulate cholesterol through selenium metabolism. In conclusion, these results confirm the feasibility of imputing INDELs from array-based single nucleotide polymorphism (SNP) genotypes. Analysis of these variants using association and linkage replicated previously identified SNP signals and identified multiple novel INDEL signals. These

  4. Geographic integration of hepatitis C virus: A global threat

    Science.gov (United States)

    Daw, Mohamed A; El-Bouzedi, Abdallah A; Ahmed, Mohamed O; Dau, Aghnyia A; Agnan, Mohamed M; Drah, Aisha M

    2016-01-01

    AIM To assess hepatitis C virus (HCV) geographic integration, evaluate the spatial and temporal evolution of HCV worldwide and propose how to diminish its burden. METHODS A literature search of published articles was performed using PubMed, MEDLINE and other related databases up to December 2015. A critical data assessment and analysis regarding the epidemiological integration of HCV was carried out using the meta-analysis method. RESULTS The data indicated that HCV has been integrated immensely over time and through various geographical regions worldwide. The history of HCV goes back to 1535 but between 1935 and 1965 it exhibited a rapid, exponential spread. This integration is clearly seen in the geo-epidemiology and phylogeography of HCV. HCV integration can be mirrored either as intra-continental or trans-continental. Migration, drug trafficking and HCV co-infection, together with other potential risk factors, have acted as a vehicle for this integration. Evidence shows that the geographic integration of HCV has been important in the global and regional distribution of HCV. CONCLUSION HCV geographic integration is clearly evident and this should be reflected in the prevention and treatment of this ongoing pandemic. PMID:27878104

  5. Fissured and geographic tongue in Williams-Beuren syndrome

    Directory of Open Access Journals (Sweden)

    Neeta Sharma

    2014-01-01

    Williams-Beuren Syndrome (WBS) is a rare, most often sporadic, genetic disease caused by a chromosomal microdeletion at locus 7q11.23 involving 28 genes. It is characterized by congenital heart defects, neonatal hypercalcemia, skeletal and renal abnormalities, cognitive disorder, social personality disorder, and dysmorphic facies. A number of clinical findings have been reported, but none of the studies evaluated this syndrome with regard to the oral cavity. We here report a fissured and geographic tongue in association with WBS.

  6. Historical Biogeography Using Species Geographical Ranges.

    Science.gov (United States)

    Quintero, Ignacio; Keil, Petr; Jetz, Walter; Crawford, Forrest W

    2015-11-01

    Spatial variation in biodiversity is the result of complex interactions between evolutionary history and ecological factors. Methods in historical biogeography combine phylogenetic information with current species locations to infer the evolutionary history of a clade through space and time. A major limitation of most methods for historical biogeographic inference is the requirement of single locations for terminal lineages, reducing contemporary species geographical ranges to a point in two-dimensional space. In reality, geographic ranges usually show complex geographic patterns, irregular shapes, or discontinuities. In this article, we describe a method for phylogeographic analysis using polygonal species geographic ranges of arbitrary complexity. By integrating the geographic diversification process across species ranges, we provide a method to infer the geographic location of ancestors in a Bayesian framework. By modeling migration conditioned on a phylogenetic tree, this approach permits reconstructing the geographic location of ancestors through time. We apply this new method to the diversification of two neotropical bird genera, Trumpeters (Psophia) and Cinclodes ovenbirds. We demonstrate the usefulness of our method (called rase) in phylogeographic reconstruction of species ancestral locations and contrast our results with previous methods that compel researchers to reduce the distribution of species to one point in space. We discuss model extensions to enable a more general, spatially explicit framework for historical biogeographic analysis. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. Data Matching Imputation System

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The DMIS dataset is a flat file record of the matching of several data set collections. Primarily it consists of VTRs, dealer records, Observer data in conjunction...

  8. Genetic similarity of mango accessions of different geographic origins evaluated with AFLP markers

    Directory of Open Access Journals (Sweden)

    Carlos Antonio Fernandes Santos

    2008-09-01

    The genetic relationships among 105 mango accessions of different geographic origins from the Embrapa germplasm collection were estimated with AFLP markers in order to guide breeding and genetic resource management of this species for the Brazilian Semi-Arid region. Two accessions of two other species of the genus Mangifera were also included as an outgroup. DNA was extracted by the CTAB method, AFLP reactions were carried out with EcoRI/MseI primers, and the polymorphic bands were analysed to build a phenogram based on the Jaccard similarity coefficient. A total of 157 polymorphic and 54 monomorphic AFLP bands were obtained from 13 primer combinations. The cophenetic value of the phenogram was estimated at 0.81. Five groups were observed: (1) cultivars such as Amrapali and Malika, Embrapa-Cpac hybrids and some American varieties; (2) a group formed predominantly by American cultivars, with some South African and Brazilian hybrids; (3) a large group formed by Brazilian cultivars, with some Australian, Indian and American cultivars; (4) a group formed by some varieties of the Espada and Rosa types and accessions of different origins; and (5) a group formed by M. foetida and M. similis. The accessions Carabao and Manilla showed the highest similarity, 97%. The accessions studied showed similarity above 51%, indicating the high genetic variability of the mango germplasm collection studied.
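    The Jaccard similarity coefficient named in this abstract can be illustrated with a minimal sketch. The band profiles and accession labels below are hypothetical and are not data from the study; the sketch only assumes that each accession is scored as 1 (band present) or 0 (band absent) over the same set of AFLP bands.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary band profiles (1 = band present)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of bands")
    # Bands present in both profiles.
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # Bands present in at least one profile (joint absences are ignored).
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


# Hypothetical presence/absence scores for three accessions over eight bands.
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 1, 1, 0],
    "acc_B": [1, 1, 0, 1, 0, 1, 0, 0],
    "acc_C": [0, 1, 1, 0, 1, 0, 1, 1],
}

names = sorted(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(jaccard(profiles[first], profiles[second]), 3))
```

    A pairwise matrix of such values is the usual input to a clustering step that produces a phenogram; the clustering method itself is not specified in the abstract and is not sketched here.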

  9. Geographic information system planning and monitoring best ...

    African Journals Online (AJOL)

    emeje

    integrated planning approaches that result in environmental conservation. Geographic Information systems (GIS) provide the ... Regional Office for West and Central Africa. (UNOCHA/ROWCA). The major cities of West Africa lie .... democratic change, social learning, community building, and environmental enlightenment.

  10. GNIS: Geographic Names Information Systems - All features

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) actively seeks data from and partnerships with Government agencies at all levels and other interested organizations....

  11. Recommendation Of A National Geographic Framework

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — The purpose of this report is to provide guidance relative to building capacity and geographically focusing efforts to implement landscape scale conservation...

  12. Medicare Geographic Variation - Public Use File

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Geographic Variation Public Use File provides the ability to view demographic, utilization and quality indicators at the state level (including...

  13. Identifying Geographic Clusters: A Network Analytic Approach

    CERN Document Server

    Catini, Roberto; Penner, Orion; Riccaboni, Massimo

    2015-01-01

    In recent years there has been a growing interest in the role of networks and clusters in the global economy. Despite being a popular research topic in economics, sociology and urban studies, geographical clustering of human activity has often been studied by means of predetermined geographical units such as administrative divisions and metropolitan areas. This approach is intrinsically time invariant and it does not allow one to differentiate between different activities. Our goal in this paper is to present a new methodology for identifying clusters that can be applied to different empirical settings. We use a graph approach based on k-shell decomposition to analyze world biomedical research clusters based on PubMed scientific publications. We identify research institutions and locate their activities in geographical clusters. Leading areas of scientific production and their top performing research institutions are consistently identified at different geographic scales.
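    The k-shell decomposition mentioned in this abstract can be sketched in a few lines. This is a generic illustration of the technique, not the authors' pipeline: the toy graph, the node names, and the idea that nodes stand for locations linked by some relation are all assumptions made for the example. The adjacency dictionary is assumed to be symmetric (undirected graph).

```python
def k_shell(adjacency):
    """Return {node: shell index} by repeatedly peeling low-degree nodes.

    adjacency: dict mapping each node to the set of its neighbours
    (assumed symmetric, i.e. an undirected graph without self-loops).
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest degree still present.
        k = max(k, min(degree[n] for n in remaining))
        queue = [n for n in remaining if degree[n] <= k]
        while queue:
            node = queue.pop()
            if node not in remaining:
                continue  # already peeled via another path
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers its neighbours' degrees, which may
            # pull them into the same shell.
            for nbr in adjacency[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        queue.append(nbr)
    return shell


# Hypothetical toy graph: a triangle (a, b, c) with a two-node tail (d, e).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(k_shell(graph))
```

    In this toy graph the tail nodes fall into the outer shell and the triangle forms the innermost shell, which is the sense in which the decomposition separates a dense core from its periphery.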

  14. Geographic Learning Objects in Smart Cities Context

    Directory of Open Access Journals (Sweden)

    Vincenzo Del Fatto

    2013-08-01

    Nowadays, many cities around the world are trying to find smarter ways to manage challenges such as ensuring livable conditions in a context of rapid urban population growth. These cities are often referred to as Smart Cities. In recent years, researchers from many disciplines have contributed to the definition and implementation of Smart Cities. In this paper we investigate how topics from two particular fields, Geographic Information and E-learning Systems, can be combined in order to contribute to the Smart Cities cause. In particular, we introduce the Geographic Learning Object and discuss how such Geographic Learning Objects can be used in a Geographic Information System in order to provide information and learning content to the citizens of a Smart City.

  15. Geographic Variation in Medicare Spending Dashboard

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Geographic Variation Dashboards present Medicare fee-for-service per-capita spending at the state and county level in an interactive format. We calculated the...

  16. Training for Internationalization through Domestic Geographical Dispersion

    DEFF Research Database (Denmark)

    Santangelo, Grazia D.; Stucchi, Tamara

    organizational learning, we seek to solve this puzzle in relation to the internationalization of Indian BGs. In particular, we argue that in heterogeneous domestic emerging markets a BG's geographical dispersion across sub-national states provides training for internationalization. To internationalize successfully, BGs need to develop the capability of managing geographically dispersed units in institutionally heterogeneous contexts. Domestic geographical dispersion would indeed help the BG deal with different regulations, customers and infrastructures. However, there is less scope for such training as BGs become more internationally experienced, and the benefits of domestic geographical dispersion are limited by the degree of urbanization of sub-national states. We test our argument on a sample of 693 Indian BGs over the period 2001-2010....

  17. Imputation of a true endpoint from a surrogate: application to a cluster randomized controlled trial with partial information on the true endpoint

    Directory of Open Access Journals (Sweden)

    Duffy Stephen W

    2003-09-01

    Background: The Anglia Menorrhagia Education Study (AMES) is a randomized controlled trial testing the effectiveness of an education package applied to general practices. Binary data are available from two sources: general practitioner reported referrals to hospital, and referrals to hospital determined by independent audit of the general practices. The former may be regarded as a surrogate for the latter, which is regarded as the true endpoint. Data on the true endpoint are available for only a subset of the practices, but there are surrogate data for almost all of the audited practices and for most of the remaining practices. Methods: The aim of this paper was to estimate the treatment effect using data from every practice in the study. Where the true endpoint was not available, it was estimated by three approaches: a regression method, multiple imputation and a full likelihood model. Results: Including the surrogate data in the analysis yielded an estimate of the treatment effect which was more precise than an estimate gained from using the true endpoint data alone. Conclusions: The full likelihood method provides a new imputation tool at the disposal of trials with surrogate data.

  18. Imputing Variants in HLA-DR Beta Genes Reveals That HLA-DRB1 Is Solely Associated with Rheumatoid Arthritis and Systemic Lupus Erythematosus.

    Science.gov (United States)

    Kim, Kwangwoo; Bang, So-Young; Yoo, Dae Hyun; Cho, Soo-Kyung; Choi, Chan-Bum; Sung, Yoon-Kyoung; Kim, Tae-Hwan; Jun, Jae-Bum; Kang, Young Mo; Suh, Chang-Hee; Shim, Seung-Cheol; Lee, Shin-Seok; Lee, Jisoo; Chung, Won Tae; Kim, Seong-Kyu; Choe, Jung-Yoon; Nath, Swapan K; Lee, Hye-Soon; Bae, Sang-Cheol

    2016-01-01

    The genetic association of HLA-DRB1 with rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) is well documented, but association with other HLA-DR beta genes (HLA-DRB3, HLA-DRB4 and HLA-DRB5) has not been thoroughly studied, despite their similar functions and chromosomal positions. We examined variants in all functional HLA-DR beta genes in RA and SLE patients and controls, down to the amino-acid level, to better understand disease association with the HLA-DR locus. To this end, we improved an existing HLA reference panel to impute variants in all protein-coding HLA-DR beta genes. Using the reference panel, HLA variants were inferred from high-density SNP data of 9,271 RA-control subjects and 5,342 SLE-control subjects. Disease association tests were performed by logistic regression and log-likelihood ratio tests. After imputation using the newly constructed HLA reference panel and statistical analysis, we observed that HLA-DRB1 variants better accounted for the association between MHC and susceptibility to RA and SLE than did the other three HLA-DRB variants. Moreover, there were no secondary effects in HLA-DRB3, HLA-DRB4, or HLA-DRB5 in RA or SLE. Of all the HLA-DR beta chain paralogs, those encoded by HLA-DRB1 solely or dominantly influence susceptibility to RA and SLE.

  19. When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts

    Directory of Open Access Journals (Sweden)

    Janus Christian Jakobsen

    2017-12-01

    Background: Missing data may seriously compromise inferences from randomised clinical trials, especially if missing data are not handled appropriately. The potential bias due to missing data depends on the mechanism causing the data to be missing, and the analytical methods applied to amend the missingness. Therefore, the analysis of trial data with missing values requires careful planning and attention. Methods: The authors had several meetings and discussions considering optimal ways of handling missing data to minimise the bias potential. We also searched PubMed (key words: missing data; randomi*; statistical analysis) and reference lists of known studies for papers (theoretical papers; empirical studies; simulation studies; etc.) on how to deal with missing data when analysing randomised clinical trials. Results: Handling missing data is an important, yet difficult and complex task when analysing results of randomised clinical trials. We consider how to optimise the handling of missing data during the planning stage of a randomised clinical trial and recommend analytical approaches which may prevent bias caused by unavoidable missing data. We consider the strengths and limitations of using best-worst and worst-best sensitivity analyses, multiple imputation, and full information maximum likelihood. We also present practical flowcharts on how to deal with missing data and an overview of the steps that always need to be considered during the analysis stage of a trial. Conclusions: We present a practical guide and flowcharts describing when and how multiple imputation should be used to handle missing data in randomised clinical trials.

  20. Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

    DEFF Research Database (Denmark)

    Pryce, J E; Johnston, J; Hayes, B J

    2014-01-01

    ...... detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from Australasia (Australia and New Zealand), and 3 from North America (Canada and the United States). Heifers from the Australian and New Zealand research herds were already genotyped at high density (approximately 700,000 SNP). The remaining genotypes were imputed from around 50,000 SNP to 700,000 using 2......

  1. Geographic Data Bases Supporting Scene Generation

    Science.gov (United States)

    Lukes, George E.

    1980-12-01

    Recent activity in synthetic reference scene generation from geographic data bases has led to new and expanding production responsibilities for the mapping community. It has also spawned a new and growing population of geographic data base users. Optimum utilization of this data requires an understanding of the natural and cultural patterns represented as well as knowledge of the conventions and specifications which guide data base preparation. Prudence dictates effective mechanisms for data base inspection by the user. Appropriate implementation of data display procedures can provide this capability while also supporting routine analysis of data base content. This paper first illustrates a set of convenient mechanisms for the display of the elevation and planimetric components of geographic data files. Then, a new USAETL program in Computer-Assisted Photo Interpretation Research (CAPIR) is introduced. The CAPIR program will explore issues of direct data entry to create geographic data bases from stereo aerial photography. CAPIR also provides a technique for displaying geographic data base contents in corresponding three-dimensional photo models. This capability, termed superposition, will affect the critical tasks of data validation, revision and intensification which are essential for effective management of geographic files.

  2. Assessing the geographic dichotomy hypothesis with cacti in South America.

    Science.gov (United States)

    Arzabe, A A; Aguirre, L F; Baldelomar, M P; Molina-Montenegro, M A

    2017-11-20

    The Cactaceae is one of the most conspicuous and ecologically important plant families in the world. Its species may have specialist or generalist pollination systems that show geographic patterns, which are synthesised in the Geographic Dichotomy Hypothesis. Here, we assess this hypothesis in five countries in both tropical and extratropical regions, evaluating the pollinator visitation rate and pollinator identity and abundance. We calculate the Shannon diversity index (H') and evenness (J) and evaluate differences between latitude parameters with a Student t-test. Overall, we found more specialised pollination systems in all tropical sites; the richness, diversity and evenness of pollinators were reduced in comparison to extratropical regions, where the pollination system was generalised. Our results support the geographic dichotomy hypothesis in the cacti of South America, suggesting that environmental factors underlying the latitudinal patterns can help to explain differences in the pollination syndrome between tropical and extratropical regions. © 2017 German Society for Plant Sciences and The Royal Botanical Society of the Netherlands.
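    The two indices named in this abstract can be sketched as follows, under stated assumptions: H' is taken as the Shannon index with natural logarithms, H' = -sum(p_i ln p_i), and J is taken to be Pielou's evenness, J = H' / ln(S), where S is the number of pollinator taxa observed. The abstract does not say which logarithm base or evenness formula was used, and the visit counts below are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one observation is required")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); defined here as 0.0 when fewer than two taxa occur."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0
    return shannon_diversity(counts) / math.log(richness)


# Hypothetical visit counts per pollinator taxon at two sites.
specialised_site = [95, 3, 2]          # one dominant pollinator
generalised_site = [30, 25, 25, 20]    # visits spread across taxa

for label, counts in [("specialised", specialised_site),
                      ("generalised", generalised_site)]:
    print(label, round(shannon_diversity(counts), 3), round(pielou_evenness(counts), 3))
```

    A site dominated by one pollinator yields low H' and J, while a site with visits spread across taxa yields values closer to ln(S) and 1, which is the contrast the hypothesis predicts between specialised and generalised systems.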

  3. A Map-Based Service Supporting Different Types of Geographic Knowledge for the Public.

    Science.gov (United States)

    Zhou, Mengjie; Wang, Rui; Tian, Jing; Ye, Ning; Mai, Shumin

    2016-01-01

    The internet enables the rapid and easy creation, storage, and transfer of knowledge; however, services that transfer geographic knowledge and facilitate the public understanding of geographic knowledge are still underdeveloped to date. Existing online maps (or atlases) can support limited types of geographic knowledge. In this study, we propose a framework for map-based services to represent and transfer different types of geographic knowledge to the public. A map-based service provides tools to ensure the effective transfer of geographic knowledge. We discuss the types of geographic knowledge that should be represented and transferred to the public, and we propose guidelines and a method to represent various types of knowledge through a map-based service. To facilitate the effective transfer of geographic knowledge, tools such as auxiliary background knowledge and auxiliary map-reading tools are provided through interactions with maps. An experiment conducted to illustrate our idea and to evaluate the usefulness of the map-based service is described; the results demonstrate that the map-based service is useful for transferring different types of geographic knowledge.

  4. Application of Spatial Data Modeling Systems, Geographical Information Systems (GIS), and Transportation Routing Optimization Methods for Evaluating Integrated Deployment of Interim Spent Fuel Storage Installations and Advanced Nuclear Plants

    Energy Technology Data Exchange (ETDEWEB)

    Mays, Gary T [ORNL; Belles, Randy [ORNL; Cetiner, Sacit M [ORNL; Howard, Rob L [ORNL; Liu, Cheng [ORNL; Mueller, Don [ORNL; Omitaomu, Olufemi A [ORNL; Peterson, Steven K [ORNL; Scaglione, John M [ORNL

    2012-06-01

    The objective of this siting study work is to support DOE in evaluating integrated advanced nuclear plant and ISFSI deployment options in the future. This study looks at several nuclear power plant growth scenarios that consider the locations of existing and planned commercial nuclear power plants integrated with the establishment of consolidated interim spent fuel storage installations (ISFSIs). This research project is aimed at providing methodologies, information, and insights that inform the process for determining and optimizing candidate areas for new advanced nuclear power generation plants and consolidated ISFSIs to meet projected US electric power demands for the future.

  5. Geographic Names Information System (GNIS) for Louisiana, Geographic NAD83, USGS (2007) [GNIS_LA_USGS_2007]

    Data.gov (United States)

    Louisiana Geographic Information Center — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  6. An intelligent method for geographic Web search

    Science.gov (United States)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the information available electronically on the World-Wide Web is growing explosively, finding relevant information is also becoming more difficult for search engine users. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach focuses both on the way geographic information is extracted from the web and, as far as we can tell, on the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests, using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.

  7. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium

    Science.gov (United States)

    Lu, Yingchang; Justice, Anne E.; Mudgal, Poorva; Liu, Ching-Ti; Young, Kristin; Feitosa, Mary F.; Rand, Kristin; Dimitrov, Latchezar; Duan, Qing; Guo, Xiuqing; Lange, Leslie A.; Nalls, Michael A.; Okut, Hayrettin; Tayo, Bamidele O.; Vedantam, Sailaja; Bradfield, Jonathan P.; Chen, Guanjie; Chesi, Alessandra; Irvin, Marguerite R.; Padhukasahasram, Badri; Zheng, Wei; Allison, Matthew A.; Ambrosone, Christine B.; Bandera, Elisa V.; Berndt, Sonja I.; Blot, William J.; Bottinger, Erwin P.; Carpten, John; Chanock, Stephen J.; Chen, Yii-Der Ida; Conti, David V.; Cooper, Richard S.; Fornage, Myriam; Freedman, Barry I.; Garcia, Melissa; Goodman, Phyllis J.; Hsu, Yu-Han H.; Hu, Jennifer; Huff, Chad D.; Ingles, Sue A.; John, Esther M.; Kittles, Rick; Klein, Eric; Li, Jin; McKnight, Barbara; Nayak, Uma; Nemesure, Barbara; Olshan, Andrew; Salako, Babatunde; Sanderson, Maureen; Shao, Yaming; Siscovick, David S.; Stanford, Janet L.; Strom, Sara S.; Witte, John S.; Yao, Jie; Zhu, Xiaofeng; Ziegler, Regina G.; Zonderman, Alan B.; Ambs, Stefan; Cushman, Mary; Faul, Jessica D.; Hakonarson, Hakon; Levin, Albert M.; Nathanson, Katherine L.; Weir, David R.; Zhi, Degui; Arnett, Donna K.; Kardia, Sharon L. R.; Oloapde, Olufunmilayo I.; Rao, D. C.; Williams, L. Keoki; Becker, Diane M.; Borecki, Ingrid B.; Evans, Michele K.; Harris, Tamara B.; Hirschhorn, Joel N.; Psaty, Bruce M.; Wilson, James G.; Bowden, Donald W.; Cupples, L. Adrienne; Haiman, Christopher A.; Loos, Ruth J. F.; North, Kari E.

    2017-01-01

    Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P < 5×10−8: seven for BMI, and one for WHRadjBMI in African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (<5%). In the trans-ethnic fine mapping of 47 BMI loci and 27 WHRadjBMI loci that were locus-wide significant (P < 0.05 adjusted for effective number of variants per locus) from the African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. As compared to our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement in

  8. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium.

    Science.gov (United States)

    Ng, Maggie C Y; Graff, Mariaelisa; Lu, Yingchang; Justice, Anne E; Mudgal, Poorva; Liu, Ching-Ti; Young, Kristin; Yanek, Lisa R; Feitosa, Mary F; Wojczynski, Mary K; Rand, Kristin; Brody, Jennifer A; Cade, Brian E; Dimitrov, Latchezar; Duan, Qing; Guo, Xiuqing; Lange, Leslie A; Nalls, Michael A; Okut, Hayrettin; Tajuddin, Salman M; Tayo, Bamidele O; Vedantam, Sailaja; Bradfield, Jonathan P; Chen, Guanjie; Chen, Wei-Min; Chesi, Alessandra; Irvin, Marguerite R; Padhukasahasram, Badri; Smith, Jennifer A; Zheng, Wei; Allison, Matthew A; Ambrosone, Christine B; Bandera, Elisa V; Bartz, Traci M; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Bottinger, Erwin P; Carpten, John; Chanock, Stephen J; Chen, Yii-Der Ida; Conti, David V; Cooper, Richard S; Fornage, Myriam; Freedman, Barry I; Garcia, Melissa; Goodman, Phyllis J; Hsu, Yu-Han H; Hu, Jennifer; Huff, Chad D; Ingles, Sue A; John, Esther M; Kittles, Rick; Klein, Eric; Li, Jin; McKnight, Barbara; Nayak, Uma; Nemesure, Barbara; Ogunniyi, Adesola; Olshan, Andrew; Press, Michael F; Rohde, Rebecca; Rybicki, Benjamin A; Salako, Babatunde; Sanderson, Maureen; Shao, Yaming; Siscovick, David S; Stanford, Janet L; Stevens, Victoria L; Stram, Alex; Strom, Sara S; Vaidya, Dhananjay; Witte, John S; Yao, Jie; Zhu, Xiaofeng; Ziegler, Regina G; Zonderman, Alan B; Adeyemo, Adebowale; Ambs, Stefan; Cushman, Mary; Faul, Jessica D; Hakonarson, Hakon; Levin, Albert M; Nathanson, Katherine L; Ware, Erin B; Weir, David R; Zhao, Wei; Zhi, Degui; Arnett, Donna K; Grant, Struan F A; Kardia, Sharon L R; Oloapde, Olufunmilayo I; Rao, D C; Rotimi, Charles N; Sale, Michele M; Williams, L Keoki; Zemel, Babette S; Becker, Diane M; Borecki, Ingrid B; Evans, Michele K; Harris, Tamara B; Hirschhorn, Joel N; Li, Yun; Patel, Sanjay R; Psaty, Bruce M; Rotter, Jerome I; Wilson, James G; Bowden, Donald W; Cupples, L Adrienne; Haiman, Christopher A; Loos, Ruth J F; North, Kari E

    2017-04-01

    Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P < 5×10−8: seven for BMI, and one for WHRadjBMI in African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (<5%). In the trans-ethnic fine mapping of 47 BMI loci and 27 WHRadjBMI loci that were locus-wide significant (P < 0.05 adjusted for effective number of variants per locus) from the African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. As compared to our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement in identifying GWAS loci including low frequency variants. Trans-ethnic meta-analyses further improved fine mapping of putative causal variants in loci shared between the African and European ancestry populations.

  9. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium.

    Directory of Open Access Journals (Sweden)

    Maggie C Y Ng

    2017-04-01

    Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P < 5×10−8: seven for BMI, and one for WHRadjBMI in African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (<5%). In the trans-ethnic fine mapping of 47 BMI loci and 27 WHRadjBMI loci that were locus-wide significant (P < 0.05 adjusted for effective number of variants per locus) from the African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. As compared to our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement

  10. Louisiana State Soil Geographic, General Soil Map, Geographic NAD83, NWRC (1998) [statsgo_soils_NWRC_1998]

    Data.gov (United States)

    Louisiana Geographic Information Center — This data set contains vector line map information. The vector data contain selected base categories of geographic features, and characteristics of these features,...

  11. Representations built from a true geographic database

    DEFF Research Database (Denmark)

    Bodum, Lars

    2005-01-01

    The development of a system for geovisualisation under the Centre for 3D GeoInformation at Aalborg University, Denmark, has exposed the need for a rethinking of the representation of virtual environments. Now that almost everything is possible (due to technological advances in computer graphics) within the creation of Virtual Environments, what will be the next challenge within Urban simulation and modelling to overcome? It will certainly not be to create the models as real as possible or refine details in the texturing. The challenge will be to do a proper object-orientation and thereby secure a representation based on geographic and geospatial principles. The system GRIFINOR, developed at 3DGI, Aalborg University, DK, is capable of creating this object-orientation and furthermore does this on top of a true Geographic database. A true Geographic database can be characterized as a database that can cover......

  12. Thematic cartography as a geographical application

    Directory of Open Access Journals (Sweden)

    Drago Perko

    2002-12-01

    A thematic map may be a geographical application (tool) in itself or the basis for some other geographical work. The development of Slovene thematic cartography accelerated considerably following the independence of the country in 1991. From the viewpoint of content and technology, its greatest achievements are the Geographical Atlas of Slovenia and the National Atlas of Slovenia, which are outstanding achievements at the international level and of great significance for the promotion of Slovenia and Slovene geography and cartography. However, this rapid development has been accompanied by numerous problems, for example, the ignoring of various Slovene and international conventions for the preparation of maps, including United Nations resolutions, Slovene and international (SIST ISO) standards, and copyright laws.

  13. Pre-Service Geography Teachers' Confidence in Geographical Subject Matter Knowledge and Teaching Geographical Skills

    Science.gov (United States)

    Harte, Wendy; Reitano, Paul

    2015-01-01

    This research tracked the confidence of 16 undergraduate and postgraduate pre-service geography teachers as they completed a single semester, senior phase geography curriculum course. The study focused specifically on the pre-service teachers' confidence in geographical subject matter knowledge and their confidence in teaching geographical skills.…

  14. Road pricing : a transport geographical perspective. Geographical accessibility and short and long-term behavioural effects

    NARCIS (Netherlands)

    Tillema, T.

    2007-01-01

    The introduction of a road pricing measure leads to changes in the transport costs on (certain) roads in a network at a certain time, possibly influencing the geographical accessibility of (groups of) people or firms at certain locations. Geographical accessibility indicators or measures give the

  15. Geographical classification of Chilean wines by an electronic nose

    Directory of Open Access Journals (Sweden)

    Nicolás H Beltrán

    2009-08-01

    Nicolás H Beltrán, Manuel A Duarte-Mermoud, Ricardo E Muñoz. Department of Electrical Engineering, University of Chile, Santiago, Chile. Abstract: This paper discusses the classification of Chilean wines by geographical origin based only on aroma information. The varieties of Cabernet Sauvignon, Merlot, and Carménère analyzed here are produced in four different valleys in the central part of Chile (Colchagua, Maipo, Maule, and Rapel). Aroma information was obtained with a zNose™ (fast gas chromatograph) and the data were analyzed by applying a wavelet transform for feature extraction followed by an analysis with support vector machines for classification. Two evaluations of the classification technique were performed: the average percentage of correct classification on the validation set, obtained by means of cross-validation, was compared against the percentage of correct classification obtained on the test set. The technique achieved classification rates over 94% in both cases. The geographical origin of a Chilean wine can be resolved rapidly with fast gas chromatography and data processing. Keywords: geographical origin, origin denomination, wine classification, pattern recognition, support vector machines, wavelet analysis, feature extraction
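
    The feature-extraction and cross-validation steps named in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not say which wavelet family or how many folds were used, so the Haar wavelet, the function names and the fold scheme below are assumptions, and the support vector machine classifier itself is left out.

```python
import math

def haar_dwt(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail) coefficient lists. The Haar wavelet is
    an assumption made for this sketch; the abstract does not name one.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_features(signal, levels):
    """Repeat the transform `levels` times and keep the coarse
    approximation coefficients as a compact feature vector."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_dwt(approx)
    return approx

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Hypothetical 8-point aroma profile, not data from the study
profile = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 4.0, 4.0]
features = haar_features(profile, levels=2)
splits = list(k_fold_indices(6, 3))
```

    Each feature vector would then be passed to a classifier trained on the `train` indices and scored on the `test` indices of every fold; averaging those scores gives the cross-validated classification rate.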

  16. África en el National Geographic

    OpenAIRE

    Cartagena, Andrés

    2011-01-01

    For many years, National Geographic Magazine served as the only medium through which people read about Africa. In my work I propose that the image National Geographic Magazine has created of the African continent through its articles and photographs is an exotic one that exalts imperialist values and the benefits of colonialism for Africa. It is an image in which the continent is represented as a place where Americans can go to ...

  17. Geographical knowledge in patients with Alzheimer's disease.

    Science.gov (United States)

    Beatty, W W; Bernstein, N

    1989-01-01

    Geographical knowledge, a measure of remote memory for visuospatial information, was studied in mildly and moderately demented patients who met NINCDS-Alzheimer's Disease and Related Disorders Association criteria for Alzheimer's disease. Alzheimer's disease patients were moderately impaired on a test that emphasizes locating gross features of US geography and profoundly impaired in locating cities on a map of the region of the United States in which they resided. The possibility that performance on tests of geographical knowledge can be used to predict impending difficulties of demented patients in wayfinding is discussed.

  18. The Rebirth of the Theory of Imputation in the Science of Criminal Law: to an Overcoming Stage or an Involution to Pre-Scientific Conceptions?

    Directory of Open Access Journals (Sweden)

    Nicolás Santiago Cordini

    2015-06-01

    The Science of Criminal Law is going through a moment that can be characterized as a “crisis”. In response, theories that define themselves as “theories of imputation” have proliferated; these abandon, in whole or in part, the hitherto dominant theory of crime. The aim of this article is to analyze three theories grouped under the concept of imputation and to determine to what extent they preserve or depart from the categories proposed by the theory of crime. We then establish to what extent these theories constitute an advance for the Science of Criminal Law or, on the contrary, are manifestations of a retreat to a pre-scientific stage.

  19. Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records.

    Science.gov (United States)

    Ryan, Ronan; Vernon, Sally; Lawrence, Gill; Wilson, Sue

    2012-01-23

    Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and for epidemiological studies. In common with other important demographic and clinical data it is often incompletely recorded. This paper presents a method for imputing missing data on the ethnicity of cancer patients, developed for a regional cancer registry in the UK. Routine records from cancer screening services, name recognition software (Nam Pehchan and Onomap), 2001 national Census data, and multiple imputation were used to predict the ethnicity of the 23% of cases that were still missing following linkage with self-reported ethnicity from inpatient hospital records. The name recognition software packages were good predictors of ethnicity for South Asian cancer cases when compared with data on ethnicity derived from hospital inpatient records, especially when combined (sensitivity 90.5%; specificity 99.9%; PPV 93.3%). Onomap was a poor predictor of ethnicity for other minority ethnic groups (sensitivity 4.4% for Black cases and 0.0% for Chinese/Other ethnic groups). Area-based data derived from the national Census were also a poor predictor of non-White ethnicity (sensitivity: South Asian 7.4%; Black 2.3%; Chinese/Other 0.0%; Mixed 0.0%). Currently, neither method for assigning individuals to an ethnic group (name recognition and ethnic distribution of area of residence) performs well across all ethnic groups. We recommend further development of name recognition applications and the identification of additional methods for predicting ethnicity to improve their precision and accuracy for comparisons of health outcomes. However, real improvements can only come from better recording of ethnicity by health services.
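
    The sensitivity, specificity and PPV figures quoted in this abstract follow the standard confusion-matrix definitions, which can be sketched as below. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and positive predictive value (PPV)
    from confusion-matrix counts.

    tp: cases predicted to be in the group that truly are
    fp: cases predicted to be in the group that are not
    tn: cases predicted to be outside the group that truly are outside
    fn: cases predicted to be outside the group that are in it
    Raises ZeroDivisionError if a denominator is zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true members detected
        "specificity": tn / (tn + fp),  # share of non-members excluded
        "ppv": tp / (tp + fp),          # share of positive calls that are right
    }

# Hypothetical counts, not taken from the cancer registry study
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

    A predictor can have very high specificity and still be of little use if its sensitivity is close to zero, which is the pattern the abstract reports for the area-based Census data.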

  20. Causality of the relationship between geographic distribution and species abundance

    DEFF Research Database (Denmark)

    Borregaard, Michael Krabbe; Rahbek, Carsten

    2010-01-01

    The positive relationship between a species' geographic distribution and its abundance is one of ecology's most well-documented patterns, yet the causes behind this relationship remain unclear. Although many hypotheses have been proposed to account for distribution-abundance relationships none have...... differences in terminology and ecological point of view. Realizing and accounting for these differences facilitates integration, so that the relative contributions of each mechanism may be evaluated. Here, we review all the mechanisms that have been proposed to account for distribution-abundance relationships...

  1. Subjective Poverty in Mexico: the Role of Income Evaluation Norms

    Directory of Open Access Journals (Sweden)

    Mariano Rojas

    2008-07-01

    This investigation studies the relationship between poverty concepts based on presumption and imputation of well-being and a poverty concept based on a person's own evaluation of his/her condition (subjective poverty). It is shown that there are important dissonances in the classification of people as poor or non-poor between the imputation/presumption concepts and the subjective poverty concept. Dissonances are explained on the basis of multiple discrepancy theory. It is shown that a person's evaluation of his/her life condition depends on his/her historical and social situation, as well as on the existence of important intra-household scale economies. Empirical work is based on a survey applied to 1,540 persons in five states of central and south Mexico.

  2. Groundwater quality mapping using geographic information system ...

    African Journals Online (AJOL)

    Spatial variations in ground water quality in the corporation area of Gulbarga City located in the northern part of Karnataka State, India, have been studied using geographic information system (GIS) technique. GIS, a tool which is used for storing, analyzing and displaying spatial data is also used for investigating ground ...

  3. Geographical information modelling for land resource survey

    NARCIS (Netherlands)

    Bruin, de S.

    2000-01-01

    The increasing popularity of geographical information systems (GIS) has at least three major implications for land resources survey. Firstly, GIS allows alternative and richer representation of spatial phenomena than is possible with the traditional paper map. Secondly, digital technology

  4. Geometric algorithms for delineating geographic regions

    NARCIS (Netherlands)

    Reinbacher, I.

    2006-01-01

    Every one of us is used to geographical regions like the south of Utrecht, the Dutch Randstad, or the mountainous areas of Austria. Some of these regions have crisp, fixed boundaries like Utrecht or Austria. Others, like the Dutch Randstad and the Austrian mountains, have no such boundaries and are

  5. House Prices, Geographical Mobility, and Unemployment

    DEFF Research Database (Denmark)

    Ingholt, Marcus Mølbak

    2017-01-01

    Geographical mobility correlates positively with house prices and negatively with unemployment over the U.S. business cycle. I present a DSGE model in which declining house prices and tight credit conditions impede the mobility of indebted workers. This reduces the workers’ cross-area competition...

  6. Geographic Variation in Condom Availability and Accessibility.

    Science.gov (United States)

    Shacham, Enbal; J Nelson, Erik; Schulte, Lauren; Bloomfield, Mark; Murphy, Ryan

    2016-12-01

    Identifying predictors that contribute to geographic disparities in sexually transmitted infections (STIs) is necessary in order to reduce disparities. This study assesses the spatial relationship between condom availability and accessibility in order to better identify determinants of geographic disparities in STIs. We conducted a telephone-based audit among potential condom-selling establishments. Descriptive analyses were conducted to detect differences in condom-selling characteristics by stores and by store type. Geocoding, mapping, and spatial analysis were conducted to measure the availability of condoms. A total of 850 potential condom-selling establishments participated in the condom availability and accessibility audit in St. Louis city; 29% sold condoms. There were several significant geographic clusters of stores identified across the study area. The first consisted of fewer convenience stores and gas stations that sold condoms in the northern section of the city, whereas condoms were less likely to be sold in non-convenience store settings in the southwestern and central parts of the city. Additionally, locations that distributed free condoms clustered significantly in the city center. However, there was a dearth of businesses that were neither convenience stores nor gas stations in the northern region of the city, which also had the highest concentration of condoms sold. This initial study was conducted to provide evidence that condom availability and accessibility differ by geographic region, and likely are a determinant of social norms surrounding condom use and ultimately impact STI rates.

  7. Geographical distances and support from family members

    NARCIS (Netherlands)

    Mulder, C.H.; van der Meer, M.J.

    2009-01-01

    We address two questions: to what extent does geographical distance to parents, siblings and children living outside the household influence receiving support from them? And to what extent does the availability of other network members living closer play a part in receiving support? We use the

  8. Genetic variation and geographical differentiation revealed using ...

    Indian Academy of Sciences (India)

    [Zhang L., Lu S., Sun D., and Peng J. 2015 Genetic variation and geographical differentiation revealed using ISSR markers in tung tree, Vernicia fordii. J. Genet. 94, e5–e9. Online only: http://www.ias.ac.in/jgenet/OnlineResources/94/e5.pdf]. Introduction. Tung tree, Vernicia fordii is an oil-bearing woody plant species of ...

  9. Using Educational Tourism in Geographical Education

    Science.gov (United States)

    Prakapiene, Dalia; Olberkyte, Loreta

    2013-01-01

    The article analyses and defines the concept of educational tourism, presents the structure of the concept and looks into the opportunities for using educational tourism in geographical education. In order to reveal such opportunities a study was carried out in the Lithuanian national and regional parks using the qualitative method of content…

  10. GEOGRAPHERS AND ECOSYSTEMS: A POINT OF VIEW

    African Journals Online (AJOL)

    GEOGRAPHERS AND ECOSYSTEMS: A POINT OF VIEW. Frances Gamble. A short note pertaining to the new ecosystems section of the South African Standard 10 Core Syllabus in Geography. The ideas were presented at a workshop for teachers held in the Southern Transvaal region by the South African Geographical.

  11. GENETIC DIVERSITY AND ECO-GEOGRAPHICAL DISTRIBUTION ...

    African Journals Online (AJOL)

    ACSS

    Classic F-statistics revealed the highest intra-specific polymorphism recorded for E. africana (32.45%), followed by E. coracana (16.83%); implying .... Genetic diversity and eco-geographical distribution of Eleusine species. 47 regions of Ethiopia as a ... were collected from the central highlands, west, northwest, northern and ...

  12. Geographical localisation of the geomagnetic secular variation

    DEFF Research Database (Denmark)

    Aubert, Julien; Finlay, Chris; Olsen, Nils

    2013-01-01

    Directly observed changes in Earth’s magnetic field occur most prominently at low latitudes beneath the Atlantic hemisphere, while the Pacific is comparatively quiet. This striking hemispheric asymmetry in geomagnetic secular variation is a consequence of the geographical localisation of intense...

  13. Decision Support System integrated with Geographic Information ...

    Indian Academy of Sciences (India)

    Journal of Earth System Science; Volume 124; Issue 1. Decision Support System integrated with Geographic Information System to target restoration actions in watersheds of arid environment: A case study of Hathmati watershed, Sabarkantha district, Gujarat. Dhruvesh P Patel Prashant K Srivastava ...

  14. Europeans among themselves: Geographical and linguistic stereotypes

    NARCIS (Netherlands)

    Mamadouh, V.D.; Dąbrowska, A.; Pisarek, W.; Stickel, G.

    2017-01-01

    Stereotypes can be studied from the perspective of political geography and critical geopolitics as part of geographical imaginations, in other words those geopolitical representations that help us make sense of the world around us. They necessarily frame our perception of ongoing events, and inform

  15. Representing Historical Knowledge in Geographic Information Systems

    Science.gov (United States)

    Grossner, Karl Eric

    2010-01-01

    A growing number of historical scholars in social science and humanities fields are using geographic information systems (GIS) to help investigate spatial questions and map their findings. The nature of historical data and historiographic practices present several challenges in using GIS that have been addressed only partially to date. For…

  16. Geographic characterisation of African provenances of Faidherbia ...

    African Journals Online (AJOL)

    The objectives of the study were to: (i) determine the phylogenetic relationship among the 16 provenances in order to establish the species centre of origin, and (ii) determine the extent of genetic diversity in F. albida using PCR markers. ITS data did not produce any consistent regional or geographic pattern. RAPD data ...

  17. Geographical Information Systems Approach To Surveillance Of ...

    African Journals Online (AJOL)

    Geographical Information Systems Approach To Surveillance Of Buruli Ulcer And Access To Health Care In Ga District Of Ghana. ... Journal of Applied Science and Technology ... Accessibility to health care facilities determined by using linear data indicated locational efficiency of existing health care facilities. In order to ...

  18. The Geographic Extent of Global Supply Chains

    DEFF Research Database (Denmark)

    Machikita, Tomohiro; Ueki, Yasushi

    2012-01-01

    We study the extent to which inter-firm relationships are locally concentrated and what determines firm differences in geographic proximity to domestic or foreign suppliers and customers. From micro-data on self-reported customer and supplier data of firms in Indonesia, the Philippines, Thailand, ...

  19. Geography and Geographical Information Science: Interdisciplinary Integrators

    Science.gov (United States)

    Ellul, Claire

    2015-01-01

    To understand how Geography and Geographical Information Science (GIS) can contribute to Interdisciplinary Research (IDR), it is relevant to articulate the differences between the different types of such research. "Multidisciplinary" researchers work in a "parallel play" mode, completing work in their disciplinary work streams…

  20. Formal Ontologies and Uncertainty. In Geographical Knowledge

    Directory of Open Access Journals (Sweden)

    Matteo Caglioni

    2014-05-01

    Formal ontologies have proved to be a very useful tool to manage interoperability among data, systems and knowledge. In this paper we will show how formal ontologies can evolve from a crisp, deterministic framework (ontologies of hard knowledge) to new probabilistic, fuzzy or possibilistic frameworks (ontologies of soft knowledge). This can considerably enlarge the application potential of formal ontologies in geographic analysis and planning, where soft knowledge is intrinsically linked to the complexity of the phenomena under study. The paper briefly presents these new uncertainty-based formal ontologies. It then highlights how ontologies are formal tools to define both concepts and relations among concepts. An example from the domain of urban geography finally shows how the cause-to-effect relation between household preferences and urban sprawl can be encoded within a crisp, a probabilistic and a possibilistic ontology, respectively. The ontology formalism will also determine the kind of reasoning that can be developed from available knowledge. Uncertain ontologies can be seen as the preliminary phase of more complex uncertainty-based models. The advantages of moving to uncertainty-based models are evident: whether it is in the analysis of geographic space or in decision support for planning, reasoning on geographic space is almost always reasoning with uncertain knowledge of geographic phenomena.

  1. A Comparison of Geographical Information Science Competency ...

    African Journals Online (AJOL)

    The authors conclude that a new competency set based on the findings of the research is needed to best serve the GISc industry and academia. Recommendations for further research are made. Keywords: Curriculum design, data acquisition, geographical information science (GISc), knowledge and skills requirements, ...

  2. Trends and Prospects of GIS in Geographical Education

    OpenAIRE

    Sakaue, Hiroaki

    2013-01-01

    Developing geographical skills in relation to information education is one of the important themes in the new Course of Study. GIS has two definitions: one is Geographical Information System, which means the computer skills needed to handle information; the other is Geographical Information Science, which means a method or way of thinking for managing and analyzing geographical information. GIS in education likewise comprises "teaching with GIS" and "teaching about GIS". GIS in geographical education plays the important role o...

  3. Road pricing : a transport geographical perspective. Geographical accessibility and short and long-term behavioural effects

    OpenAIRE

    Tillema, T

    2007-01-01

    The introduction of a road pricing measure leads to changes in the transport costs on (certain) roads in a network at a certain time, possibly influencing the geographical accessibility of (groups of) people or firms at certain locations. Geographical accessibility indicators or measures give the opportunity to gain a quick and interpretable insight into the (accessibility) effects as a result of changes in the land-use or transport system (e.g. caused by certain policy interventions). These ...

  4. Geographical and temporal changes of anthropometric traits in historical Yemen.

    Science.gov (United States)

    Danubio, Maria Enrica; Milia, Nicola; Coppa, Alfredo; Rufo, Fabrizio; Sanna, Emanuele

    2016-02-01

    This study investigates secular changes of anthropometric variables among four geographic groups in historical Yemen, to evaluate possible regional differences in the evolution of living standards. Nineteen somatic and cephalic measures collected by Coon in 1939, and 8 anthropometric indices in 1244 Yemenite adult males were analyzed. The individuals were divided into 10-year age groups. Within-group variations were tested by One-way ANCOVA (age as covariate). ANCOVA (controlling for age), and Forward stepwise discriminant analysis were used to evaluate and represent regional differences. ANCOVA and discriminant analysis confirmed and enhanced previous findings. At the time, the Yemenite population presented high intergroup heterogeneity. The highest mean values of height at all ages were found in the "mountain" region, which is characterized by very fertile soils and where, nowadays, most of the cereals and pulses are grown and where most livestock is raised. Within-group variations were limited and generally inconsistent in all geographic regions and concern vertical dimensions, but mean values of height never differed. The prolonged internal isolation of these groups resulted in significant regional morphometric differentiation. The main evidence comes from height which suggests that socioeconomic factors have played a role. Nevertheless, the possible better living conditions experienced by the "mountain" group, with the highest mean values of stature in all periods, did not allow the secular trend to take place in that region, too. Copyright © 2015. Published by Elsevier GmbH.

  5. Investigating urban geochemistry using Geographical Information Systems.

    Science.gov (United States)

    Thums, C; Farago, M

    2001-01-01

    A Geographical Information System (GIS) is an interactive digital extension of the two-dimensional paper map. Customised maps are created by the selection and aggregation of data from independent sources to assist studies in urban geochemistry. The metropolitan area of Wolverhampton, in the West Midlands, UK, is used to illustrate the types of output that can be generated. These include geographic and geological features, geochemical data and land use. Multi-layered maps can be used to investigate spatial relationships, for example, between elevated concentrations of metals in soils and industrial land use. Such maps can also be used to assist the assessment of potential exposure of groundwater, ecosystems and humans using maps incorporating guideline values for metals in soils.

  6. Geographical origin: meaningful reference or marketing tool?

    DEFF Research Database (Denmark)

    Hedegaard, Liselotte

    2015-01-01

    constitutes a meaningful reference to a link between food and place that represents expectations of taste and quality. In Denmark, this link is not attributed similar meaning and, hence, the difference between meaningful references and images formed through the language of marketing is less discernible...... collected in France where geographical origin is perceived as indicator of quality. A possible explanation resides in the double standards rendered possible by the European labels as they refer to provenance as well as geographical origin. Provenance means to issue from a place in the sense that the place of production can be positioned on a map, but there is no precise standard for the rootedness of the product in this place. Origin means to be from a place in the sense that natural factors, cultural practices and historical links contribute to the understanding of the link between food and place...

  7. Geographical conceptualization of quality of life

    Directory of Open Access Journals (Sweden)

    Murgaš František

    2016-12-01

    The conceptualization of quality of life in terms of geography is based on two assumptions. The first assumption is that the quality of life consists of two dimensions: subjective and objective. The subjective is known as ‘well-being’, while the objective is the proposed term ‘quality of place’. The second assumption is based on the recognition that quality of life is always a spatial dimension. The concept of quality of life is closely linked with the concept of a good life; geographers enriched this concept by using the term ‘good place’ as a place in which the conditions are created for a good life. The quality of life for individuals in terms of a good place overlaps with the quality of life in society, namely the societal quality of life. The geographical conceptualisation of quality of life is applied to settlements within the city of Liberec.

  8. A Systems Perspective on Volunteered Geographic Information

    Directory of Open Access Journals (Sweden)

    Victoria Fast

    2014-12-01

    Volunteered geographic information (VGI) is geographic information collected by way of crowdsourcing. However, the distinction between VGI as an information product and the processes that create VGI is blurred. Clearly, the environment that influences the creation of VGI is different than the information product itself, yet most literature treats them as one and the same. Thus, this research is motivated by the need to formalize and standardize the systems that support the creation of VGI. To this end, we propose a conceptual framework for VGI systems, the main components of which—project, participants, and technical infrastructure—form an environment conducive to the creation of VGI. Drawing on examples from OpenStreetMap, Ushahidi, and RinkWatch, we illustrate the pragmatic relevance of these components. Applying a system perspective to VGI allows us to better understand the components and functionality needed to effectively create VGI.

  9. SOLID WASTE: PRESENCE AND THREAT IN GEOGRAPHICAL SPACE

    Directory of Open Access Journals (Sweden)

    Clesley Maria Tavares do Nascimento

    2017-12-01

    This article deals with the trajectory of solid waste in different historical periods, treating it as a constitutive element of geographical space. The decision to approach the theme from a timeline perspective rests on the conviction that the categories of space and time are inseparable and that both matter for understanding a geographical phenomenon. Methodologically, the research is documentary, involving a literature review and consultation of secondary sources such as books, academic journals, dissertations and theses on the subject. The results presented and discussed in this paper indicate that the production of waste accompanies historical time, reflects the societies and techniques that generated it, and is a permanent part of the dialectical process of spatial formation.

  10. PEDIATRIC FITNESS: SECULAR TRENDS AND GEOGRAPHIC VARIABILITY

    Directory of Open Access Journals (Sweden)

    Grant R. Tomkinson

    2007-06-01

    DESCRIPTION: This book describes and discusses children's physical capacity in terms of aerobic and anaerobic power generation according to secular trends and geographic variability. PURPOSE: To discuss the controversial issue of whether present-day children and adolescents are fitter than their counterparts of the past and whether they are fitter if they live in the more prosperous countries. AUDIENCE: Pediatricians, medical practitioners, physical educators, exercise and/or sport scientists, exercise physiologists, personal trainers and graduate students in relevant fields will find this book helpful when dealing with contemporary trends and geographic variability in pediatric fitness. FEATURES: The volume starts with the editors examining the general picture of children's fitness. The authors of the individual chapters discuss the data gathered since the late 1950s on secular trends and geographic variability in the aerobic and anaerobic fitness performance of children and adolescents from 23 countries in Africa, Asia, Australasia, Europe, the Middle East and North America. Some chapters propose that there is evidence of a worldwide decline in pediatric aerobic performance in recent decades, relative stability in anaerobic performance, and that the best performing children come from northern and central Europe. The final chapters consider possible causes, including whether the declines in aerobic performance are distributional or widespread, and whether increases in obesity alone can explain the decline in aerobic performance. ASSESSMENT: The editors have assembled a volume of Medicine and Sports Science that is essential reading for all who are interested in understanding and improving the fitness of children. Readers will find useful information in this book on secular trends and geographic variability in pediatric fitness. I believe the book will serve as a first

  11. U-PLANT GEOGRAPHIC ZONE CLEANUP PROTOTYPE

    Energy Technology Data Exchange (ETDEWEB)

    ROMINE, L.D.

    2006-02-01

    The U Plant geographic zone (UPZ) occupies 0.83 square kilometers on the Hanford Site Central Plateau (200 Area). It encompasses the U Plant canyon (221-U Facility), ancillary facilities that supported the canyon, soil waste sites, and underground pipelines. The UPZ cleanup initiative coordinates the cleanup of the major facilities, ancillary facilities, waste sites, and contaminated pipelines (collectively identified as "cleanup items") within the geographic zone. The UPZ was selected as a geographic cleanup zone prototype for resolving regulatory, technical, and stakeholder issues and demonstrating cleanup methods for several reasons: most of the area is inactive, sufficient characterization information is available to support decisions, cleanup of the high-risk waste sites will help protect the groundwater, and the zone contains a representative cross-section of the types of cleanup actions that will be required in other geographic zones. The UPZ cleanup demonstrates the first of 22 integrated zone cleanup actions on the Hanford Site Central Plateau to address threats to groundwater, the environment, and human health. The UPZ contains more than 100 individual cleanup items. Cleanup actions in the zone will be undertaken using multiple regulatory processes and decision documents. Cleanup actions will include building demolition, waste site and pipeline excavation, and the construction of multiple, large engineered barriers. In some cases, different cleanup actions may be taken at item locations that are immediately adjacent to each other. The cleanup planning and field activities for each cleanup item must be undertaken in a coordinated and cohesive manner to ensure effective execution of the UPZ cleanup initiative. The UPZ zone cleanup implementation plan (ZCIP) was developed to address the need for a fundamental integration tool for UPZ cleanup. As UPZ cleanup planning and implementation moves forward, the ZCIP is intended to be a living

  12. Using Geographic Information Systems in science classrooms

    OpenAIRE

    Whitaker,Diane

    2011-01-01

    This qualitative study examines GIS use in two North Carolina classrooms and illustrates several GIS lessons that span the gamut of worksheet type lessons to independent student research. Using Geographic Information Systems, GIS, in the science classroom has a variety of benefits which the associated literature describes. The teachers in this study report that GIS is a technology that a wide range of students enjoy using. Visual learners find GIS a way to establish and communicate relationsh...

  13. Advanced Data Structure and Geographic Information Systems

    Science.gov (United States)

    Peuquet, D. (Principal Investigator)

    1984-01-01

    The current state of the art in specified areas of Geographic Information Systems (GIS) technology is examined. Study of the question of very large, efficient, heterogeneous spatial databases is required in order to explore the potential application of remotely sensed data for studying the long-term habitability of the Earth. Research includes a review of spatial data structures and storage, development of operations required by GIS, and preparation of a testbed system to compare the Vaster data structure with NASA's Topological Raster Structure.

  14. Geographical assemblages of European raptors and owls

    Science.gov (United States)

    López-López, Pascual; Benavent-Corai, José; García-Ripollés, Clara

    2008-09-01

    In this work we look for geographical structure patterns in European raptors (Order: Falconiformes) and owls (Order: Strigiformes). For this purpose we have conducted our research using freely available tools such as statistical software and databases. To perform the study, presence-absence data for the European raptor and owl species (Class Aves) were downloaded from the BirdLife International website. Using the freely available "pvclust" R-package, we applied the Jaccard similarity index and cluster analysis in order to delineate biogeographical relationships for European countries. According to the similarity clustering, we found that Europe is structured into two main geographical assemblages. The longest branch separated two main groups: one containing Iceland, Greenland and the countries of central, northern and northwestern Europe, and the other group including the countries of eastern, southern and southwestern Europe. Both groups are divided into two main subgroups. According to our results, the European raptors and owls could be considered structured into four meta-communities well delimited by suture zones defined by Remington (1968) [Remington, C.L., 1968. Suture-zones of hybrid interaction between recently joined biotas. Evol. Biol. 2, 321-428]. Climatic oscillations during the Quaternary Ice Ages could explain at least in part the modern geographical distribution of the group.
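
    The Jaccard-plus-clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' pvclust workflow: the country labels and species sets are hypothetical, plain average linkage stands in for the bootstrap-supported clustering that pvclust performs, and the function names are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two species sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(presence, k):
    """Merge the two closest clusters (mean Jaccard distance) until k remain."""
    clusters = [[name] for name in sorted(presence)]

    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(1.0 - jaccard(presence[x], presence[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Hypothetical presence data: region -> set of species recorded there.
presence = {
    "North1": {"sp_a", "sp_b", "sp_c"},
    "North2": {"sp_a", "sp_b"},
    "South1": {"sp_d", "sp_e", "sp_f"},
    "South2": {"sp_d", "sp_e"},
}
groups = average_linkage(presence, k=2)
print(groups)
```

    Regions that share most of their species end up in the same group; with real presence-absence tables the choice of linkage and the number of groups would need to be justified separately.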

  15. GEOGRAPHICAL EDUCATION MEDIATIZATION AND MEDIASECURITY ISSUES

    Directory of Open Access Journals (Sweden)

    M. R. Arpentieva

    2017-01-01

    The article addresses the interplay of legal and moral aspects of media technology development in the context of geographical education. It summarizes a theoretical analysis of mediatization in geographic education, the legal and moral aspects of the resulting violations, and ways to prevent and correct them in technology-mediated educational interaction between teacher and student. Geographical education in the modern world is closely tied to the use of media technologies. In other types of education the role of media technologies in improving quality is less obvious; in teaching and learning geography it is very clear. The problems associated with its mediatization are therefore important, and solving them is particularly pressing. These issues are linked primarily to the ongoing social, economic, political, and ideological crises in many communities and countries. Many of these crises are mirrored in the sphere of high technologies, including media technologies. The article provides guidance on correcting violations at the individual and social levels.

  16. Geographically isolated wetlands: Rethinking a misnomer

    Science.gov (United States)

    Mushet, David M.; Calhoun, Aram J.K.; Alexander, Laurie C.; Cohen, Matthew J.; DeKeyser, Edward S.; Fowler, Laurie G.; Lane, Charles R.; Lang, Megan W.; Rains, Mark C.; Walls, Susan

    2015-01-01

    We explore the category “geographically isolated wetlands” (GIWs; i.e., wetlands completely surrounded by uplands at the local scale) as used in the wetland sciences. As currently used, the GIW category (1) hampers scientific efforts by obscuring important hydrological and ecological differences among multiple wetland functional types, (2) aggregates wetlands in a manner not reflective of regulatory and management information needs, (3) implies wetlands so described are in some way “isolated,” an often incorrect implication, (4) is inconsistent with more broadly used and accepted concepts of “geographic isolation,” and (5) has injected unnecessary confusion into scientific investigations and discussions. Instead, we suggest other wetland classification systems offer more informative alternatives. For example, hydrogeomorphic (HGM) classes based on well-established scientific definitions account for wetland functional diversity thereby facilitating explorations into questions of connectivity without an a priori designation of “isolation.” Additionally, an HGM-type approach could be used in combination with terms reflective of current regulatory or policymaking needs. For those rare cases in which the condition of being surrounded by uplands is the relevant distinguishing characteristic, use of terminology that does not unnecessarily imply isolation (e.g., “upland embedded wetlands”) would help alleviate much confusion caused by the “geographically isolated wetlands” misnomer.

  17. Genomic evaluations with many more genotypes

    Directory of Open Access Journals (Sweden)

    Wiggans George R

    2011-03-01

    Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly. Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared. Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50
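
    To make the "keep every tenth marker, then score the imputed calls" idea concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the genotype vector is made up, and the fill-in rule (assign genotype 0 everywhere) is a deliberately naive placeholder, not the population and pedigree haplotyping used in the study.

```python
def thin_markers(genotypes, step=10):
    """Keep every `step`-th marker and mark all others as missing (None)."""
    return [g if i % step == 0 else None for i, g in enumerate(genotypes)]


def call_accuracy(truth, imputed, thinned):
    """Share of masked positions where the imputed call equals the true call."""
    masked = [i for i, g in enumerate(thinned) if g is None]
    return sum(truth[i] == imputed[i] for i in masked) / len(masked)


# Hypothetical high-density genotypes coded 0/1/2 (copies of one allele).
truth = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
low_density = thin_markers(truth, step=10)

# Naive placeholder imputation: fill every missing call with genotype 0.
imputed = [g if g is not None else 0 for g in low_density]

kept = sum(g is not None for g in low_density)
accuracy = call_accuracy(truth, imputed, low_density)
print(kept, round(accuracy, 3))
```

    Any real imputation method can be swapped in for the placeholder; the masking and scoring functions stay the same.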

  18. Geographic network diversity: How does it affect exploratory innovation?

    NARCIS (Netherlands)

    Bahlmann, M.D.

    2014-01-01

    This study examines the underexplored effect of the geographic configuration of entrepreneurs' networks on their ventures' levels of exploratory innovation. As entrepreneurs are found to engage in both proximate and distant knowledge ties, this paper's main predictor involves the geographic

  19. GNIS: Geographic Names Information Systems - All features (2013)

    Data.gov (United States)

    Earth Data Analysis Center, University of New Mexico — The Geographic Names Information System (GNIS) is the Federal standard for geographic nomenclature. The U.S. Geological Survey developed the GNIS for the U.S. Board...

  20. Racial and geographic variation in coronary heart disease mortality trends

    Directory of Open Access Journals (Sweden)

    Gillum Richard F

    2012-06-01

    Background: Data on the magnitude of, and geographic and racial variation in, trends in coronary heart disease (CHD) mortality within the US require updating for health services and health disparities research. Therefore the aim of this study is to present data on these trends through 2007. Methods: Data for CHD were analyzed using the US mortality files for 1999–2007 obtained from the US Centers for Disease Control and Prevention. Age-adjusted annual death rates were computed for non-Hispanic African Americans (AA) and European Americans (EA) aged 35–84 years. The direct method was used to standardize rates by age, using the 2000 US standard population. Joinpoint regression models were used to evaluate trends, expressed as annual percent change (APC). Results: For both AA men and women, CHD mortality is higher than for EA men and women, respectively. Between 1999 and 2007 the rate declined both in AA and in EA of both sexes in every geographic division; however, relative declines varied. For example, among men, relative average annual declines ranged from 3.2% to 4.7% in AA and from 4.4% to 5.5% in EA among geographic divisions. In women, rates declined more in later years of the decade and in women over 54 years. In 2007, the age-adjusted death rate per 100,000 for CHD ranged from 93 in EA women in New England to 345 in AA men in the East North Central division. In EA, areas near the Ohio and lower Mississippi Rivers had above average rates. Disparities in trends by urbanization level were also found. For AA in the East North Central division, the APC was similar in large central metro (−4.2), large fringe metro (−4.3), medium metro (−4.4), and small metro (−3.9) urbanization strata. APC was somewhat higher in the micropolitan/non-metro (−5.3), and especially the non-core/non-metro (−6.5). For EA in the East South Central division, the APC was higher in large central metro (−5.3), large fringe metro (−4.3) and medium metro
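
    The two quantities named in the methods, a directly age-standardized rate and an annual percent change, can be sketched in Python as follows. All counts and weights are hypothetical (they are not the 2000 US standard population or CDC data), and the APC is computed for a single log-linear segment only; locating joinpoints between segments is outside this sketch.

```python
import math


def direct_age_adjusted_rate(deaths, population, standard, per=100_000):
    """Weight each age-specific rate by that age group's share of the standard population."""
    total_std = sum(standard.values())
    rate = 0.0
    for age_group, std_count in standard.items():
        age_specific = deaths[age_group] / population[age_group]
        rate += age_specific * (std_count / total_std)
    return rate * per


def annual_percent_change(years, rates):
    """APC = 100 * (exp(b) - 1), with b the least-squares slope of ln(rate) on year."""
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / len(years)
    mean_y = sum(logs) / len(logs)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return 100.0 * (math.exp(sxy / sxx) - 1.0)


# Hypothetical inputs for two age groups.
deaths = {"35-54": 50, "55-84": 400}
population = {"35-54": 100_000, "55-84": 100_000}
standard = {"35-54": 600, "55-84": 400}  # illustrative weights only
adjusted = direct_age_adjusted_rate(deaths, population, standard)

# Hypothetical series that falls by exactly 4% each year.
years = list(range(1999, 2008))
rates = [300.0 * 0.96 ** (y - 1999) for y in years]
apc = annual_percent_change(years, rates)
print(round(adjusted, 1), round(apc, 2))
```

    A negative APC corresponds to a declining rate, which matches the sign convention of the values reported in the abstract.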

  1. Human-geographical concept of the regional geodemographic system

    Directory of Open Access Journals (Sweden)

    Kateryna Sehida

    2017-10-01

    A synergetic analysis of geodemographic research indicates that its problems can be solved using modern management technologies. According to the theory of sotsioaktogenez, this requires defining and accurately formulating the purpose of the future phase transition; constructing a consistent system of goals that takes into account own and provided resources; creating an executive system that is effective in terms of optimal use of the available methods (technologies) and means of activity; and monitoring and analyzing the result obtained. Analysis of the results of social management requires a quantitative description and a comparison of the actual result with its expected model (the goal). The proposed concept of the regional geodemographic system, based on dissipative structures (individuals, groups of people, society), is aimed at the development and functioning of the studied system, in which implementation of management decisions plays a special role. The article presents the generalized structure of the concept and sets out its purpose, object, and subject area. It defines the social and spatial localization of the research, in particular within regional, region and local communities. The geodemographic process is identified as a composite human-geographical process; as sotsioaktogenez (with stages of motivation, system of goals, executive system, and result, from the standpoint of society and the family); as self-development and self-organization (with internal and external factors, supporting and evolutionary resources, and mechanisms); and as a process (information exchange, external and internal adaptation). Methodological approaches (geographical, systems, synergetic, informational, historical) and research techniques (analysis of system indices, simulation of the development path, component analysis, and evaluative and prognostic simulation) are described. Technological procedures

  2. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems (GIS). The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre" which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. GIS can be broken down into two main categories, urban GIS and natural resource GIS. Further still, natural resource GIS can be broken down into six sub-categories: agriculture, forestry, wildlife, catchment management, archaeology, and geology/mining. Agriculture GIS has several applications, such as agricultural capability analysis, land conservation, market analysis, or whole-farm planning. Forestry GIS can be used for timber assessment and management, harvest scheduling and planning, environmental impact assessment, and pest management. GIS when used in wildlife applications enables the user to assess and manage habitats, identify and track endangered and rare species, and monitor impact assessment.

  3. Epidemiology of hip fracture: Worldwide geographic variation

    Directory of Open Access Journals (Sweden)

    Dinesh K Dhanwal

    2011-01-01

    Osteoporosis is a major health problem, especially in elderly populations, and is associated with fragility fractures at the hip, spine, and wrist. Hip fracture contributes to both morbidity and mortality in the elderly. The demographics of world populations are set to change, with more elderly living in developing countries, and it has been estimated that by 2050 half of hip fractures will occur in Asia. This review, conducted using the PubMed database, describes the incidence of hip fracture in different regions of the world and discusses the possible causes of this wide geographic variation. The analysis of data from different studies shows a wide geographic variation across the world, with higher hip fracture incidence reported from industrialized countries than from developing countries. The highest hip fracture rates are seen in Northern Europe and the US and the lowest in Latin America and Africa. Asian countries such as Kuwait, Iran, China, and Hong Kong show intermediate hip fracture rates. There is also a north-south gradient seen in European studies, and more fractures are seen in the north of the US than in the south. The factors responsible for this variation are population demographics (with more elderly living in countries with higher incidence rates) and the influence of ethnicity, latitude, and environmental factors. The understanding of this changing geographic variation will help policy makers to develop strategies to reduce the burden of hip fractures in developing countries such as India, which will face the brunt of this problem over the coming decades.

  4. Geographic Information Processings for Astronomical Site Survey

    Science.gov (United States)

    Wu, N.; Liu, Y.; Zhao, M. Y.

    2015-01-01

    Geographic information is of great importance for the site survey of ground-based telescopes. In particular, effective use of the geographic information system (GIS) has become one of the most significant methods for remote analysis in modern site surveys. An astronomical site survey should consider the following geographical conditions: a large relative fall, convenient traffic conditions, and remoteness from populated areas. Taking into account the convenience of constructing and maintaining the observatories as well as the living conditions of the scientists-in-residence, the optimum candidate locations should lie at an altitude between 3000 m and 5000 m and within a one-hour drive of villages or towns. In this paper, as an example, we take the region of the Great Baicao mountain ridge in Dayao county, Yunnan province, to study the role of the GIS in the site survey task. The results indicate that the GIS can provide accurate and intuitive data for understanding the three-dimensional landforms, rivers, roads, villages, and the distribution of electric power, as well as for forecasting population and city development trends in the surrounding area. According to the GIS-based analysis, we find that the top of the Great Baicao mountain ridge is flat and dry. Few inhabitants live nearby, while the traffic conditions are convenient. Moreover, it is a natural conservation area protected by the local government, and no industry with pollution sources exists in this region. Its top is 1500 m higher than the nearby village 10 km away, and 1800 m higher than the town center 50 km away. The Great Baicao mountain ridge is definitely an isolated peak in the area of the Yi nationality of Yunnan. Therefore, GIS data analysis is very useful for the remote investigation stage of a site survey, and the GIS is an indispensable source for modern astronomical site surveys.
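
    The two numeric screening conditions quoted above (altitude between 3000 m and 5000 m, and at most a one-hour drive from a village or town) can be expressed as a simple filter. The Python sketch below is only an illustration: the class, the function, and all candidate values are hypothetical, and a real survey would derive altitude and drive time from GIS layers rather than hand-entered numbers.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    name: str
    altitude_m: float
    drive_minutes_to_town: float


def meets_criteria(site, min_alt=3000.0, max_alt=5000.0, max_drive=60.0):
    """True if the site lies in the altitude band and within the drive-time limit."""
    in_band = min_alt <= site.altitude_m <= max_alt
    return in_band and site.drive_minutes_to_town <= max_drive


# Hypothetical candidates, not data from the survey.
candidates = [
    CandidateSite("ridge_top", 3600.0, 45.0),
    CandidateSite("valley_floor", 2100.0, 10.0),
    CandidateSite("remote_summit", 4800.0, 150.0),
]
shortlist = [s.name for s in candidates if meets_criteria(s)]
print(shortlist)
```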

  5. Geolinde - a geographical online learning platform

    Science.gov (United States)

    Steinmüller, Max

    2017-04-01

    Starting about ten years ago during a classroom project on Africa, two colleagues and I began developing an educational platform with geographic content: www.geolinde.musin.de The basic concept was to collect and present a wide range of free educational materials that could be used by teachers, students and anyone else interested in geography. Soon we found out that producing units for our students also means working on age-appropriate texts on each topic. We matched our learning units to the curriculum for the Bavarian 'Gymnasium' and are still working on the improvement of each single unit, especially on the basis of suggestions by our students and our teaching experience. The main advantage in teaching with units from geolinde is that the students work at their own speed, repeat topics, use the glossary or have a look at the skill pages. Everyone uses the wide range of materials in his own way to achieve the curricular goals. Many topics contain short online tests, so that the students can check their basic understanding. The teacher is freed to give helpful advice, discuss special questions and monitor the learning progress. After a certain time a question and answer session follows and puts the focus on major curricular goals. So far www.geolinde.musin.de consists of several blended learning units: Africa, Europe, Climate, Climate Change, Plate tectonics,… It also contains thematic pages on many geographical skills, a glossary of more than one thousand geographic terms and, last but not least, a collection of approximately 23,000 photos of places of interest all around the world. All the many thousand web pages can be used freely (CC-BY-SA 4.0). The only limitation is that www.geolinde.musin.de is available in German only.

  6. Geographical variation in the heterogeneity of mutualistic networks.

    Science.gov (United States)

    Sakai, Shoko; Metelmann, Soeren; Toquenaga, Yukihiko; Telschow, Arndt

    2016-06-01

    Plant-animal mutualistic networks are characterized by highly heterogeneous degree distributions. The majority of species interact with few partner species, while a small number are highly connected to form network hubs that are proposed to play an important role in community stability. It has not been investigated, however, if or how the degree distributions vary among types of mutualisms or communities, or between plants and animals in the same network. Here, we evaluate the degree distributions of pollination and seed-dispersal networks, which are two major types of mutualistic networks that have often been discussed in parallel, using an index based on Pielou's evenness. Among 56 pollination networks we found strong negative correlation of the heterogeneity between plants and animals, and geographical shifts of network hubs from plants in temperate regions to animals in the tropics. For 28 seed-dispersal networks, by contrast, the correlation was positive, and there is no comparable geographical pattern. These results may be explained by evolution towards specialization in the presence of context-dependent costs that occur if plants share the animal species as interaction partner. How the identity of network hubs affects the stability and resilience of the community is an important question for future studies.
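
    As a rough illustration of measuring how uneven a degree distribution is, the Python sketch below computes plain Pielou's evenness, J = H / ln(S), over the degrees of one side of a network. The abstract says only that the authors' index is "based on" Pielou's evenness, so treating degree shares as the proportions is an assumption here, and the degree lists are hypothetical.

```python
import math


def pielou_evenness(degrees):
    """Pielou's J = H / ln(S), with H the Shannon entropy of the degree shares."""
    positive = [d for d in degrees if d > 0]
    if len(positive) < 2:
        raise ValueError("evenness needs at least two nodes with nonzero degree")
    total = sum(positive)
    shares = [d / total for d in positive]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(shares))


# Hypothetical degree lists for four species each.
even_side = pielou_evenness([3, 3, 3, 3])   # every species has the same degree
hub_side = pielou_evenness([9, 1, 1, 1])    # one hub holds most of the links
print(round(even_side, 3), round(hub_side, 3))
```

    Values near 1 indicate a homogeneous degree distribution; lower values indicate that a few hub species hold most of the links.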

  7. Geographical variation in the heterogeneity of mutualistic networks

    Science.gov (United States)

    Sakai, Shoko; Metelmann, Soeren; Toquenaga, Yukihiko; Telschow, Arndt

    2016-01-01

    Plant–animal mutualistic networks are characterized by highly heterogeneous degree distributions. The majority of species interact with few partner species, while a small number are highly connected to form network hubs that are proposed to play an important role in community stability. It has not been investigated, however, if or how the degree distributions vary among types of mutualisms or communities, or between plants and animals in the same network. Here, we evaluate the degree distributions of pollination and seed-dispersal networks, which are two major types of mutualistic networks that have often been discussed in parallel, using an index based on Pielou's evenness. Among 56 pollination networks we found strong negative correlation of the heterogeneity between plants and animals, and geographical shifts of network hubs from plants in temperate regions to animals in the tropics. For 28 seed-dispersal networks, by contrast, the correlation was positive, and there is no comparable geographical pattern. These results may be explained by evolution towards specialization in the presence of context-dependent costs that occur if plants share the animal species as interaction partner. How the identity of network hubs affects the stability and resilience of the community is an important question for future studies. PMID:27429761

  8. µ-shapes: Delineating urban neighborhoods using volunteered geographic information

    Directory of Open Access Journals (Sweden)

    Matt Aadland

    2016-06-01

    Urban neighborhoods are a unique form of geography in that their boundaries rely on a social definition rather than a well-defined physical or administrative boundary. Currently, geographic gazetteers capture little more than the centroid of a neighborhood, limiting potential applications of the data. In this paper, we present µ-shapes, an algorithm that employs fuzzy-set theory to model neighborhood boundaries suitable for populating gazetteers using volunteered geographic information (VGI). The algorithm is evaluated using a reference dataset and VGI from the Map Kibera Project. A confusion matrix comparison between the reference dataset and µ-shape's output demonstrated high sensitivity and accuracy. Analysis of variance indicated that the algorithm was able to distinguish between boundary and interior blocks. This suggests that, given the existing state of GIS technology, the µ-shapes algorithm can enable neighborhood-related queries that incorporate spatial uncertainty, e.g., find all restaurants within the core of a neighborhood.
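
    The confusion-matrix evaluation mentioned above can be sketched in a few lines of Python. This is not the µ-shapes algorithm: the membership degrees, the 0.5 cut-off used to turn fuzzy membership into an in/out label, and the reference labels are all hypothetical and serve only to show how sensitivity and accuracy are derived from the four cell counts.

```python
def confusion_counts(reference, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    tn = sum((not r) and (not p) for r, p in zip(reference, predicted))
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Share of true neighborhood blocks that were recovered."""
    return tp / (tp + fn)


def accuracy(tp, fp, fn, tn):
    """Share of all blocks labeled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical fuzzy membership degree per city block, and reference labels.
membership = [0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.7, 0.0]
reference = [True, True, True, False, False, False, True, False]
predicted = [m >= 0.5 for m in membership]  # assumed cut-off, for illustration

tp, fp, fn, tn = confusion_counts(reference, predicted)
sens = sensitivity(tp, fn)
acc = accuracy(tp, fp, fn, tn)
print(tp, fp, fn, tn, sens, acc)
```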

  9. Geographic Accessibility to Higher Education on the Island of Ireland

    Science.gov (United States)

    Walsh, Sharon; Flannery, Darragh; Cullinan, John

    2015-01-01

    This paper presents, for the first time, comprehensive measures of geographic accessibility to higher education both within and between the Republic of Ireland and Northern Ireland. Using geographic information system techniques, we find high levels of geographic accessibility to higher education in both jurisdictions. However, when we…

  10. Surveying and Mapping Geographical Information from the Perspective of Geography

    Directory of Open Access Journals (Sweden)

    LÜ Guonian

    2017-10-01

    This paper briefly reviews the development of geographic information content since the emergence of geographic information systems. It points out that the current definition of geographic information remains an extension of the basic "spatial + attributes" mapping framework, which is increasingly difficult to adapt to the analysis and application of spatio-temporal big data. From the perspective of the subject and content of geographic research, it systematically summarizes the content and scope of the "geographic information" that geography needs. It puts forward a six-element expression model of geographic information comprising spatial location, semantic description, attribute characteristics, geometric form, evolution process, and object relationships. Guided by the laws of geography, and aiming at integrated expression, systematic analysis, and efficient management of the spatial distribution, temporal patterns, evolution processes, and interaction mechanisms of geographical phenomena, it designs a unified GIS data model expressed by the six basic elements, a new GIS data structure driven by geographical rules and interactions, and key technologies for organizing and storing unstructured spatio-temporal data. It provides a theoretical basis and technical support for the shift from surveying-and-mapping geographic information to scientific geographic information, and it can help improve the ability of GIS to organize, manage, analyze, and express geographical laws such as geographical patterns, evolution processes, and interactions between elements.
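
    One way to picture the six-element expression model is as a record with one field per element. The Python dataclass below is a hypothetical sketch: the field types, the class name, and the example values are assumptions made for illustration and are not the data model designed in the paper.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Tuple


@dataclass
class GeoObject:
    """One geographic object described by the six elements named in the abstract."""
    spatial_location: Tuple[float, float]            # assumed (longitude, latitude)
    semantic_description: str                        # what kind of thing it is
    attribute_characteristics: Dict[str, float] = field(default_factory=dict)
    geometric_form: str = "point"                    # e.g. point, line, polygon
    evolution_process: List[Tuple[str, str]] = field(default_factory=list)
    object_relationships: Dict[str, str] = field(default_factory=dict)


# Hypothetical example object.
lake = GeoObject(
    spatial_location=(102.7, 24.8),
    semantic_description="lake",
    attribute_characteristics={"area_km2": 300.0},
    geometric_form="polygon",
)
lake.evolution_process.append(("2017", "surface area surveyed"))
lake.object_relationships["drains_to"] = "river_01"

element_names = [f.name for f in fields(GeoObject)]
print(element_names)
```

    Keeping the evolution process and the object relationships as first-class fields, rather than as ordinary attributes, is what separates this layout from a plain "spatial + attributes" record.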

  11. Dynamic management of geographic data in a virtual environment

    NARCIS (Netherlands)

    Jense, G.J.; Donkers, K.

    1996-01-01

    In order to achieve true 3D user interaction with geographic information, an interface between a virtual environment system and a geographic information system has been designed and implemented. This VE/GIS interface is based on a loose coupling of the underlying geographic database and the virtual

  12. Geographical Inquiry in Australian Schools: A Retrospective Analysis

    Science.gov (United States)

    Kidman, Gillian

    2012-01-01

    This paper explores the occurrence of geographical inquiry in the Australian curriculum since Geography became a high school subject in 1911. In this historical overview, I reflect upon my own experiences of undertaking geographical inquiry during the 1970s and 1980s. Primary school geographical inquiry experiences can be virtually non-existent…

  13. Cartography and Geographic Information Science in Current Contents

    Directory of Open Access Journals (Sweden)

    Nedjeljko Frančula

    2009-12-01

    The Cartography and Geographic Information Science (CaGIS) journal was published as The American Cartographer from 1974 to 1989, after that as Cartography and Geographic Information Systems, and since then has been published under its current name. It is published by the Cartography and Geographic Information Society, a member of the American Congress on Surveying and Mapping.

  14. Harvesting geographic features from heterogeneous raster maps

    Science.gov (United States)

    Chiang, Yao-Yi

    2010-11-01

    Raster maps offer a great deal of geospatial information and are easily accessible compared to other geospatial data. However, harvesting geographic features locked in heterogeneous raster maps to obtain the geospatial information is challenging. This is because of the varying image quality of raster maps (e.g., scanned maps with poor image quality and computer-generated maps with good image quality), the overlapping geographic features in maps, and the typical lack of metadata (e.g., map geocoordinates, map source, and original vector data). Previous work on map processing is typically limited to a specific type of map and often relies on intensive manual work. In contrast, this thesis investigates a general approach that does not rely on any prior knowledge and requires minimal user effort to process heterogeneous raster maps. This approach includes automatic and supervised techniques to process raster maps for separating individual layers of geographic features from the maps and recognizing geographic features in the separated layers (i.e., detecting road intersections, generating and vectorizing road geometry, and recognizing text labels). The automatic technique eliminates user intervention by exploiting common map properties of how road lines and text labels are drawn in raster maps. For example, the road lines are elongated linear objects and the characters are small connected-objects. The supervised technique utilizes labels of road and text areas to handle complex raster maps, or maps with poor image quality, and can process a variety of raster maps with minimal user input. The results show that the general approach can handle raster maps with varying map complexity, color usage, and image quality. By matching extracted road intersections to another geospatial dataset, we can identify the geocoordinates of a raster map and further align the raster map, separated feature layers from the map, and recognized features from the layers with the geospatial

  15. Virtual Globe Games for Geographic Learning

    Directory of Open Access Journals (Sweden)

    Ola Ahlqvist

    2010-02-01

    Virtual, online maps and globes allow for volunteered geographic information to capitalize on users as sensors and generate unprecedented access to information resources and services. These new "Web 2.0" applications will probably dominate development and use of virtual globes and maps in the near future. We present an experimental platform that integrates an existing virtual globe interface with added functionality: an interactive layer on top of the existing map that supports real-time creation and manipulation of spatial interaction objects. These objects, together with the existing information delivered through the virtual globe, form a game board that can be used for educational purposes.

  16. Tanzanian food origins and protected geographical indications

    DEFF Research Database (Denmark)

    John, Innocensia Festo; Egelyng, Henrik; Lokina, Azack

    2016-01-01

    of food origin products in Tanzania that have potential for GI certification. The hypothesis was that there are origin products in Tanzania whose unique characteristics are linked to the area of production. Geographical indications can be useful policy instruments contributing to food security...... area, product quality perceived by the consumer in terms of taste, flavour, texture, aroma, appearance (colour, size) and perceptions of links between geography related factors (soil, land weather characteristics) and product qualities. A qualitative case study analysis was done for each of the (five...

  17. Practical Statistics for Geographers and Earth Scientists

    CERN Document Server

    Walford, Nigel

    2011-01-01

    Practical Statistics for Geographers and Earth Scientists is a text that all students can work through, regardless of their geography or earth science stream degree pathway and their existing mathematical knowledge. The text demystifies the mathematical component of statistics and presents these techniques in an easy-to-understand fashion. Case studies that illustrate the workings of each technique through photographs and diagrams will help students visualize some of the processes involved. Also covered in the book is a clear explanation of how statistical software packages work.

  18. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  19. A mixed approach and a distribution-free multiple imputation technique for the estimation of a multivariate probit model with missing values.

    Science.gov (United States)

    Spiess, M; Keller, F

    1999-05-01

    In the present paper a mixed generalized estimating/pseudo-score equations (GEPSE) approach together with a distribution-free multiple imputation technique is proposed for the estimation of regression and correlation structure parameters of multivariate probit models with missing values for an ordered categorical time-invariant variable. Furthermore, a generalization of the squared trace correlation (RT2) for multivariate probit models, denoted by pseudo-RT2, is proposed. A simulation study was conducted, simulating a probit model with an equicorrelation structure in the errors of an underlying regression model and using two different missing mechanisms. For a low 'true' correlation the differences between the GEPSE, a generalized estimating equations (GEE) and a maximum likelihood (ML) estimator were negligible. For a high 'true' correlation the GEPSE estimator turned out to be more efficient than the GEE estimator and very efficient relative to the ML estimator. Furthermore, the pseudo-RT2 was close to the RT2 of the underlying linear model. The mixed approach is illustrated using a psychiatric data set of depressive in-patients. The results of this analysis suggest that the depression score at discharge from a psychiatric hospital and the occurrence of stressful life events seem to increase the probability of having an episode of major depression within a one-year interval after discharge. Furthermore, the correlation structure points to short-time effects on having or not having a depressive episode, not accounted for in the systematic part of the regression model.

  20. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java, using a Gaussian kernel as the weighting function. GWR uses the diagonal matrix obtained by evaluating the Gaussian kernel function as the weights in the regression model. The kernel weights handle spatial effects in the data, so that a separate model can be obtained for each location. The purpose of this paper is to model poverty percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city of the province. Based on the research, we obtained a geographically weighted regression model with a Gaussian kernel weighting function for the poverty percentage data in Central Java province. We found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. The coefficient of determination R2 is 68.64%. The districts fall into two categories, which are influenced by different significant factors.
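
    The local fitting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single predictor, the common Gaussian weight form exp(-0.5 (d/h)^2), Euclidean distances, and a fixed bandwidth h. The function names, the bandwidth value, and the toy data are all hypothetical.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian kernel weight exp(-0.5 * (d / h)^2); this form is an assumption."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_fit_at(target, coords, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x, centred on `target`."""
    # One kernel weight per observation: the diagonal of the GWR weight matrix.
    w = [gaussian_kernel(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def gwr(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location."""
    return [gwr_fit_at(c, coords, x, y, bandwidth) for c in coords]


if __name__ == "__main__":
    # Made-up toy data: the slope of y on x drifts from west to east.
    coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
    x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0]
    y = [1.0 + (1.0 + 0.5 * c[0]) * xi for c, xi in zip(coords, x)]
    for c, (b0, b1) in zip(coords, gwr(coords, x, y, bandwidth=1.5)):
        print(c, round(b0, 3), round(b1, 3))
```

    Because every location gets its own coefficients, the fitted slopes can differ between regions, which is what allows different sets of factors to matter in different districts. A smaller bandwidth makes the fits more local; a very large one approaches ordinary least squares.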

  1. Geographic Gossip: Efficient Averaging for Sensor Networks

    Science.gov (United States)

    Dimakis, Alexandros D. G.; Sarwate, Anand D.; Wainwright, Martin J.

    Gossip algorithms for distributed computation are attractive due to their simplicity, distributed nature, and robustness in noisy and uncertain environments. However, using standard gossip algorithms can lead to a significant waste in energy by repeatedly recirculating redundant information. For realistic sensor network model topologies like grids and random geometric graphs, the inefficiency of gossip schemes is related to the slow mixing times of random walks on the communication graph. We propose and analyze an alternative gossiping scheme that exploits geographic information. By utilizing geographic routing combined with a simple resampling method, we demonstrate substantial gains over previously proposed gossip protocols. For regular graphs such as the ring or grid, our algorithm improves standard gossip by factors of $n$ and $\sqrt{n}$ respectively. For the more challenging case of random geometric graphs, our algorithm computes the true average to accuracy $\epsilon$ using $O(\frac{n^{1.5}}{\sqrt{\log n}} \log \epsilon^{-1})$ radio transmissions, which yields a $\sqrt{\frac{n}{\log n}}$ factor improvement over standard gossip algorithms. We illustrate these theoretical results with experimental comparisons between our algorithm and standard methods as applied to various classes of random fields.
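
    The contrast between standard and geographic gossip can be illustrated with a small simulation. This is a rough sketch under strong simplifications, not the authors' protocol: nodes sit on a ring, standard gossip averages with an adjacent node, and geographic gossip is approximated by averaging with a node drawn uniformly at random, standing in for geographic routing plus resampling. It counts averaging rounds only, so it ignores the multi-hop transmission cost that the paper's bounds account for. All function names are hypothetical.

```python
import random


def ring_neighbor(i, n, rng):
    """Standard gossip on a ring: pick one of the two adjacent nodes."""
    return (i + rng.choice((-1, 1))) % n


def random_far_node(i, n, rng):
    """Stand-in for geographic routing: pick any other node uniformly."""
    j = rng.randrange(n - 1)
    return j if j < i else j + 1


def gossip_round(values, pick_partner, rng):
    """One pairwise exchange: both nodes replace their value with the pair mean."""
    i = rng.randrange(len(values))
    j = pick_partner(i, len(values), rng)
    mean = (values[i] + values[j]) / 2.0
    values[i] = values[j] = mean


def run_gossip(initial, pick_partner, rounds, seed=0):
    """Run a fixed number of rounds and return the final node values."""
    rng = random.Random(seed)
    values = list(initial)
    for _ in range(rounds):
        gossip_round(values, pick_partner, rng)
    return values


def spread(values):
    """Gap between the largest and smallest node value."""
    return max(values) - min(values)


if __name__ == "__main__":
    start = [float(k) for k in range(16)]
    print("ring spread:", spread(run_gossip(start, ring_neighbor, 500)))
    print("far  spread:", spread(run_gossip(start, random_far_node, 500)))
```

    Each exchange preserves the sum of the values, so every node converges to the true average. Pairing distant nodes mixes information faster per round than nearest-neighbor exchanges, which is the intuition behind the reported gains.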

  2. Acquiring geographical data with web harvesting

    Science.gov (United States)

    Dramowicz, K.

    2016-04-01

    Many websites contain very attractive and up-to-date geographical information. This information can be extracted, stored, analyzed and mapped using web harvesting techniques. Web harvesting transforms poorly organized data from websites into a more structured format, which can be stored in a database and analyzed. Almost 25% of web traffic is related to web harvesting, mostly while using search engines. This paper presents how to harvest geographic information from web documents using the free tool Beautiful Soup, one of the most commonly used Python libraries for pulling data from HTML and XML files. Processing one static HTML table is a relatively easy task. The more challenging task is to extract and save information from tables located in multiple, poorly organized websites. Legal and ethical aspects of web harvesting are discussed as well. The paper presents two case studies. The first one shows how to extract various types of information about the Good Country Index from multiple web pages, load it into one attribute table and map the results. The second case study shows how script tools and GIS can be used to extract information about Nova Scotia wines from one hundred and thirty-six websites. In a little more than three minutes a database containing one hundred and six liquor stores selling these wines is created. Then the availability and spatial distribution of various types of wines (by grape types, by wineries, and by liquor stores) are mapped and analyzed.
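
    The "one static HTML table" case mentioned in this abstract might look like the sketch below. To stay dependency-free it uses Python's standard-library `html.parser` rather than Beautiful Soup, so it is a stand-in for the paper's tool, not a reproduction of its scripts. The class name, function name, and sample table (including the store names and coordinates) are made up for illustration.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def harvest_table(html):
    """Turn an HTML table with a header row into a list of dict records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; the stores and coordinates are illustrative only.
SAMPLE_HTML = """
<table>
  <tr><th>Store</th><th>Latitude</th><th>Longitude</th></tr>
  <tr><td>Store A</td><td>44.65</td><td>-63.57</td></tr>
  <tr><td>Store B</td><td>45.08</td><td>-64.50</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in harvest_table(SAMPLE_HTML):
        print(record)
```

    The resulting records have a fixed set of keys, so they can be loaded into an attribute table and joined to point geometry for mapping. Harvesting many poorly organized sites would additionally need per-site handling of missing cells and inconsistent headers.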

  3. A Method for Analyzing Volunteered Geographic Information ...

    Science.gov (United States)

    Volunteered geographic information (VGI) can be used to identify public valuation of ecosystem services in a defined geographic area using photos as a representation of lived experiences. This method can help researchers better survey and report on the values and preferences of stakeholders involved in rehabilitation and revitalization projects. Current research utilizes VGI in the form of geotagged social media photos from three platforms: Flickr, Instagram, and Panoramio. Social media photos have been obtained for the neighborhoods next to the St. Louis River in Duluth, Minnesota, and are being analyzed along several dimensions. These dimensions include the spatial distribution of each platform, the characteristics of the physical environment portrayed in the photos, and finally, the ecosystem service depicted. In this poster, we focus on the photos from the Irving and Fairmount neighborhoods of Duluth, MN to demonstrate the method at the neighborhood scale. This study demonstrates a method for translating the values expressed in social media photos into ecosystem services and spatially-explicit data to be used in multiple settings, including the City of Duluth’s Comprehensive Planning and community revitalization efforts, habitat restoration in a Great Lakes Area of Concern, and the USEPA’s Office of Research and Development. This poster will demonstrate a method for translating values expressed in social media photos into ecosystem services and spatially

  4. Geographic wormhole detection in wireless sensor networks.

    Science.gov (United States)

    Sookhak, Mehdi; Akhundzada, Adnan; Sookhak, Alireza; Eslaminejad, Mohammadreza; Gani, Abdullah; Khurram Khan, Muhammad; Li, Xiong; Wang, Xiaomin

    2015-01-01

    Wireless sensor networks (WSNs) are ubiquitous and pervasive, and therefore highly susceptible to a number of security attacks. The Denial of Service (DoS) attack is considered the most dominant and a major threat to WSNs. Moreover, the wormhole attack represents one of the potential forms of the DoS attack. Crafting a wormhole attack is comparatively simple, though its detection is nontrivial. Moreover, the extant wormhole defense methods need both specialized hardware and strong assumptions to defend against static and dynamic wormhole attacks. This paper introduces a novel scheme to detect wormhole attacks in a geographic routing protocol (DWGRP). The main contribution of this paper is to detect malicious nodes and select the best and most reliable neighbors based on a pairwise key pre-distribution technique and the beacon packet. Moreover, this novel technique is not subject to any specific assumption, requirement, or specialized hardware, such as a precise synchronized clock. The proposed detection method is validated by comparisons with several related techniques in the literature, such as Received Signal Strength (RSS), Authentication of Nodes Scheme (ANS), Wormhole Detection uses Hound Packet (WHOP), and Wormhole Detection with Neighborhood Information (WDI) using the NS-2 simulator. The analysis of the simulations shows promising results with a low False Detection Rate (FDR) in the geographic routing protocols.

  5. Geographic wormhole detection in wireless sensor networks.

    Directory of Open Access Journals (Sweden)

    Mehdi Sookhak

    Wireless sensor networks (WSNs) are ubiquitous and pervasive, and therefore highly susceptible to a number of security attacks. The Denial of Service (DoS) attack is considered the most dominant and a major threat to WSNs. Moreover, the wormhole attack represents one of the potential forms of the DoS attack. Crafting a wormhole attack is comparatively simple, though its detection is nontrivial. Moreover, the extant wormhole defense methods need both specialized hardware and strong assumptions to defend against static and dynamic wormhole attacks. This paper introduces a novel scheme to detect wormhole attacks in a geographic routing protocol (DWGRP). The main contribution of this paper is to detect malicious nodes and select the best and most reliable neighbors based on a pairwise key pre-distribution technique and the beacon packet. Moreover, this novel technique is not subject to any specific assumption, requirement, or specialized hardware, such as a precise synchronized clock. The proposed detection method is validated by comparisons with several related techniques in the literature, such as Received Signal Strength (RSS), Authentication of Nodes Scheme (ANS), Wormhole Detection uses Hound Packet (WHOP), and Wormhole Detection with Neighborhood Information (WDI) using the NS-2 simulator. The analysis of the simulations shows promising results with a low False Detection Rate (FDR) in the geographic routing protocols.

  6. Protection of Geographical Indication Intellectual Property of Tea in Zhejiang Province

    OpenAIRE

    Sun, Zhiguo; Xiong, Wanzhen; Wang, Shuting; Zhong, Xuebin

    2013-01-01

    Tea resources in Zhejiang Province currently account for 8 national geographical indication products, 23 national geographical indication trademarks, and 7 national geographical indications of agricultural products. From the perspectives of geographical indication protection, geographical indication trademark registration, and geographical indication registration of agricultural products, we conduct an analysis of the current protection of geographical indication intellectual property of te...

  7. What Influences Geography Teachers' Usage of Geographic Information Systems? A Structural Equation Analysis

    Science.gov (United States)

    Lay, Jinn-Guey; Chi, Yu-Lin; Hsieh, Yeu-Sheng; Chen, Yu-Wen

    2013-01-01

    Understanding the usage of the geographic information system (GIS) among geography teachers is a crucial step in evaluating the current dissemination of GIS knowledge and skills in Taiwan's educational system. The primary contribution of this research is to further our understanding of the factors that affect teachers' GIS usage. The structural…

  8. The Significant Surface-Water Connectivity of “Geographically Isolated Wetlands”

    Science.gov (United States)

    We evaluated the current literature, coupled with our collective research expertise, on surface-water connectivity of wetlands considered to be “geographically isolated” (sensu Tiner Wetlands 23:494–516, 2003a) to critically assess the scientific foundation of g...

  9. Using Metadata to Build Geographic Information Sharing Environment on Internet

    Directory of Open Access Journals (Sweden)

    Chih-hong Sun

    1999-12-01

    The Internet provides a convenient environment for sharing geographic information. Web GIS (Geographic Information System) even gives users direct access to geographic databases through the Internet. However, the complexity of geographic data makes it difficult for users to understand the real content and the limitations of geographic information. In some cases, users may misuse the geographic data and make wrong decisions. Meanwhile, geographic data are distributed across various government agencies, academic institutes, and private organizations, which makes it even more difficult for users to fully understand the content of these complex data. To overcome these difficulties, this research uses metadata as a guiding mechanism for users to fully understand the content and the limitations of geographic data. We introduce three metadata standards commonly used for geographic data and the metadata authoring tools available in the US. We also review the current development of a geographic metadata standard in Taiwan. Two metadata authoring tools are developed in this research, which will enable users to build their own geographic metadata easily. [Article content in Chinese]

  10. Métodos de imputación para el tratamiento de datos faltantes: aplicación mediante R/Splus = Imputation methods to handle the problem of missing data: an application using R/Splus

    Directory of Open Access Journals (Sweden)

    Muñoz Rosas, Juan Francisco

    2009-01-01

    Missing values are a common problem in many sampling surveys, and imputation is usually employed to compensate for non-response. Most studies of imputation methods focus on estimating the mean and its variance, and they assume simple sampling designs such as simple random sampling without replacement. In this paper we describe some imputation methods and define them under a general sampling design. Different response mechanisms are also discussed. Assuming some populations based upon real data extracted from the context of the economy and business, Monte Carlo simulations are carried out to analyze the properties of the various imputation methods in the estimation of parameters such as distribution functions and quantiles. The various imputation methods are implemented
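
    Why distribution functions and quantiles deserve separate attention can be seen in a small sketch. It compares two generic textbook methods, mean imputation and random hot-deck imputation, which are assumed here for illustration and are not necessarily the methods studied in the paper. The function names and the toy data are hypothetical, and the paper's own code is in R and Splus, not Python.

```python
import random
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def hot_deck_impute(values, rng):
    """Replace each missing value with a randomly drawn observed value."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def empirical_cdf(values, t):
    """Share of values less than or equal to t."""
    return sum(v <= t for v in values) / len(values)


if __name__ == "__main__":
    # Made-up survey variable with non-response marked as None.
    data = [1.0, None, 3.0, None, 5.0, 2.0, None, 9.0]
    rng = random.Random(0)
    for name, filled in (("mean", mean_impute(data)),
                         ("hot deck", hot_deck_impute(data, rng))):
        print(name, statistics.fmean(filled), empirical_cdf(filled, 2.0))
```

    Mean imputation reproduces the observed mean exactly but piles all imputed values at one point, which distorts the estimated distribution function and quantiles. Hot-deck imputation draws from the observed values and so keeps the shape of the observed distribution, at the price of extra random variation.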

  11. Enhancing Phylogeography by Improving Geographical Information from GenBank

    Science.gov (United States)

    Scotch, Matthew; Sarkar, Indra Neil; Mei, Changjiang; Leaman, Robert; Cheung, Kei-Hoi; Ortiz, Pierina; Singraur, Ashutosh; Gonzalez, Graciela

    2011-01-01

    Phylogeography is a field that focuses on the geographical lineages of species such as vertebrates or viruses. Here, geographical data, such as the location of a species or viral host, is as important as the sequence information extracted from the species. Together, this information can help illustrate the migration of the species over time within a geographical area, the impact of geography on the evolutionary history, or the expected population of the species within the area. Molecular sequence data from NCBI, specifically GenBank, provide an abundance of available sequence data for phylogeography. However, geographical data is inconsistently represented and sparse across GenBank entries. This can impede analysis and, in situations where the geographical information is inferred, potentially lead to erroneous results. In this paper, we describe the current state of geographical data in GenBank, and illustrate how automated processing techniques, such as named entity recognition, can enhance the geographical data available for phylogeographic studies. PMID:21723960

  12. Using Geographic Information System (GIS) to Improve Fourth Graders' Geographic Content Knowledge and Map Skills

    Science.gov (United States)

    Shin, Eui-kyung

    2006-01-01

    This research was based on an instructional module developed and used to investigate whether GIS can be used to enhance fourth grade students' geographic knowledge and map skills. Another goal was to identify challenges the teacher and the students face using GIS. Findings from the study suggest that using GIS in the classroom helps students…

  13. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems (GIS). The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre," which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. At the outset, I was given goals and expectations from my branch and from my mentor with regard to the further implementation of GIS. Those goals are as follows: (1) Continue the development of GIS for the underground structures. (2) Extract and export annotated data from AutoCAD drawing files and construct a database (to serve as a prototype for future work). (3) Examine existing underground record drawings to determine existing and non-existing underground tanks. Once this data was collected and analyzed, I set out on the task of creating a user-friendly database that could be accessed by all members of the branch. It was important that the database be built using programs that most employees already possess, ruling out most AutoCAD-based viewers. Therefore, I set out to create an Access database that translated onto the web using Internet

  14. Comprehensive Monitoring for Heterogeneous Geographically Distributed Storage

    Energy Technology Data Exchange (ETDEWEB)

    Ratnikova, N. [Fermilab; Karavakis, E. [CERN; Lammel, S. [Fermilab; Wildish, T. [Princeton U.

    2015-12-23

    Storage capacity at CMS Tier-1 and Tier-2 sites reached over 100 Petabytes in 2014, and will be substantially increased during Run 2 data taking. The allocation of storage for individual users' analysis data, which is not accounted for as centrally managed storage space, will be increased to up to 40%. For comprehensive tracking and monitoring of storage utilization across all participating sites, CMS developed a space monitoring system, which provides a central view of the geographically dispersed heterogeneous storage systems. The first prototype was deployed at pilot sites in summer 2014, and has been substantially reworked since then. In this paper we discuss the functionality of the system and our experience of deploying and operating it at the full CMS scale.

  15. Studying the making of geographical knowledge

    DEFF Research Database (Denmark)

    Adriansen, Hanne Kirstine; Madsen, Lene Møller

    2009-01-01

    The article addresses the issue of being a ‘double’ insider when conducting interviews. Double insider means being an insider both in relation to one's research matter - in the authors' case the making of geographical knowledge - and in relation to one's interviewees - our colleagues. The article...... is a reflection paper in the sense that we reflect upon experiences drawn from a previous research project carried out in Danish academia. It is important that the project was situated in a Scandinavian workplace culture because this has a bearing on the social, cultural, and economic situation in which knowledge...... was constructed. The authors show that being a double insider affects both the interview situation and how interviews are planned, located, and analysed. Being an insider in relation to one's interviewees gives the advantage of having a shared history and a close knowledge of the context, and these benefits...

  16. Host specificity in phylogenetic and geographic space.

    Science.gov (United States)

    Poulin, Robert; Krasnov, Boris R; Mouillot, David

    2011-08-01

    The measurement of host specificity goes well beyond counting how many host species can successfully be used by a parasite. In particular, specificity can be assessed with respect to how closely related the host species are, or whether a parasite exploits the same or different hosts across its entire geographic range. Recent developments in the measurement of biodiversity offer a new set of analytical tools that can be used to quantify the many aspects of host specificity. We describe here the multifaceted nature of host specificity, summarize the indices available to measure its different facets one at a time or in combination, and discuss their implications for parasite evolution and disease epidemiology. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. Clean Air Corridors: A Geographic and.

    Science.gov (United States)

    Green, Mark C; Gebhart, Kristi A

    1997-03-01

    Meteorological factors, pollutant emissions, and geographic regions related to transport of low optical extinction coefficient air to Grand Canyon National Park were examined. Back trajectories were generated by two models, the Atmospheric Transport and Dispersion Model (ATAD) and an approach using the Nested Grid Model output for a Lagrangian particle transport model (NGM/CAPITA). Meteorological information along the trajectories was analyzed for its relationship to visibility at the Grand Canyon. Case studies considered days with anomalously clean air from the southwest and dirty air from the northwest. Clean air was most frequently from the north and northwest, rarely from the south. Low emissions, high ventilation and washout by precipitation were associated with clean air. All clean days with transport from the Los Angeles area had upper-level low pressure over the region with high ventilation and usually abundant precipitation. The dirtiest days with transport from the northwest were affected by forest fires.

  18. A situated knowledge representation of geographical information

    Energy Technology Data Exchange (ETDEWEB)

    Gahegan, Mark N.; Pike, William A.

    2006-11-01

    In this paper we present an approach to conceiving of, constructing and comparing the concepts developed and used by geographers, environmental scientists and other earth science researchers to help describe, analyze and ultimately understand their subject of study. Our approach is informed by the situations under which concepts are conceived and applied, captures details of their construction, use and evolution and supports their ultimate sharing along with the means for deep exploration of conceptual similarities and differences that may arise among a distributed network of researchers. The intent here is to support different perspectives onto GIS resources that researchers may legitimately take, and to capture and compute with aspects of epistemology, to complement the ontologies that are currently receiving much attention in the GIScience community.

  19. The geographical distribution of Q fever.

    Science.gov (United States)

    KAPLAN, M M; BERTAGNA, P

    1955-01-01

    The results of a WHO-assisted survey of the distribution of Q fever in 32 countries and an analysis of reports published to date indicate that Q fever exists in 51 countries on five continents. Q-fever infection was most often reported in man and the domestic ruminants, such as cattle, sheep, and goats. The disease was found to exist in most countries where investigations were carried out. Notable exceptions were Ireland, the Netherlands, New Zealand, Poland, and the Scandinavian countries. With the exception of Poland, where the results were inconclusive, all these countries import relatively few domestic ruminants, the most important animal reservoirs of human Q-fever infection. It seems, therefore, that the traffic of infected ruminants may be one of the most important, if not the most important, means for the geographical spread of Q fever. The importance, if any, of ticks associated with such traffic needs to be defined.

  20. Evaluering

    DEFF Research Database (Denmark)

    Wahlgren, Bjarne; Andersen, Michael; Wandall, Jakob

    The idea for this book arose in connection with teaching the evaluation of education at Aarhus University. We found that, although there was a good deal of literature on evaluation, both in Danish and especially in English, there was not much literature that could serve, in a clear and comprehensive way, as an introduction to...... the broad field that the book covers. Some of the Danish-language evaluation literature concerns programme evaluation at a more general level. Much of this literature focuses on public-sector activity, but only to a lesser extent on education. Some of the literature deals with the evaluation of teaching, and some...... focuses on pupils' learning. We wanted to write a book that covers the whole field: evaluation of learning, teaching and education. With this book we wanted to provide an overview of this extensive field, which is developing in many directions. This rests on the view that evaluation is becoming ever more important...