WorldWideScience

Sample records for genotype imputation methods

  1. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes in both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ general-purpose machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The results show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods. The Extreme Learning Machine (ELM) was always the fastest algorithm. As the sample size increased, the Radial Basis Function (RBF) method required long imputation times. The tested methods can be an alternative for imputation of un-typed SNPs when the missing rate of the data is low. However, it is recommended that other machine learning methods also be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

    Abbas Mikhchi

    2016-01-01

    Background: Genotype imputation is an important process of predicting unknown genotypes, which uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variations at a low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared, from a lower-density SNP panel (5 K) to a high-density (10 K) SNP panel, using three different boosting methods, namely TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were applied to simulated data to impute the un-typed SNPs in parent-offspring trios. Four different datasets were simulated: G1 (100 trios with 5 k SNPs), G2 (100 trios with 10 k SNPs), G3 (500 trios with 5 k SNPs), and G4 (500 trios with 10 k SNPs). In all four datasets, all parents were genotyped completely, and offspring were genotyped with a lower-density panel. Results: Comparison of the three methods showed that LB outperformed AB and TB for imputation accuracy. Computation times differed between methods. AB was the fastest algorithm. Higher SNP densities increased the accuracy of imputation. A larger number of trios (i.e. 500) was better for the performance of LB and TB. Conclusions: The three methods do well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.

  3. LinkImputeR: user-guided genotype calling and imputation for non-model organisms.

    Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean

    2017-07-10

    Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from

  4. Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

    Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

    2013-01-01

    This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence...... coefficient was lower when the minor allele frequency was lower. The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation accuracies, but considering computing time and memory usage, FImpute is another alternative method....

  5. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Ariel W Chan

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and

  6. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

    Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

    2016-01-01

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

  7. Assessing accuracy of genotype imputation in American Indians.

    Alka Malhotra

    Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed the accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians. Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of SNPs available for a given chromosome). The masked genotypes were then imputed using the Markov Chain Haplotyping Algorithm software. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%. Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.
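
The mask-then-impute evaluation described in this abstract can be sketched as follows. This is a minimal illustration, not the study's pipeline: genotypes are assumed to be coded 0/1/2, whole SNP columns are masked, and `impute_by_reference_mode` is a hypothetical placeholder imputer (it fills each masked genotype with the most common genotype at that SNP in the reference panel) standing in for the actual imputation software.

```python
import random
from collections import Counter

def mask_snps(genotypes, fraction, seed=0):
    """Hide a fraction of SNP columns; return (masked copy, hidden positions)."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    n_hidden = max(1, round(fraction * n_snps))
    hidden_snps = rng.sample(range(n_snps), n_hidden)
    masked = [row[:] for row in genotypes]
    hidden = []
    for i in range(len(genotypes)):
        for j in hidden_snps:
            masked[i][j] = None          # None marks a masked genotype
            hidden.append((i, j))
    return masked, hidden

def impute_by_reference_mode(masked, reference):
    """Placeholder imputer (an assumption): use the modal reference genotype per SNP."""
    modes = [Counter(column).most_common(1)[0][0] for column in zip(*reference)]
    return [[modes[j] if g is None else g for j, g in enumerate(row)]
            for row in masked]

def error_rate(truth, imputed, hidden):
    """Fraction of masked genotypes that were imputed incorrectly."""
    wrong = sum(truth[i][j] != imputed[i][j] for i, j in hidden)
    return wrong / len(hidden)

# Toy data, invented for illustration only.
study = [[0, 1, 2, 0], [0, 1, 2, 1], [0, 2, 2, 0]]
reference = [[0, 1, 2, 0], [0, 1, 2, 0], [0, 1, 1, 0]]
masked, hidden = mask_snps(study, fraction=0.5)
imputed = impute_by_reference_mode(masked, reference)
print(error_rate(study, imputed, hidden))
```

Swapping the reference panel while keeping the masked positions fixed is what allows error rates to be compared across reference populations.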

  8. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Ward Judson A

    2013-01-01

    Background: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

  9. R package imputeTestbench to compare imputation methods for univariate time series

    Bokde, Neeraj; Kulat, Kishore; Beck, Marcus W; Asencio-Cortés, Gualberto

    2016-01-01

    This paper describes the R package imputeTestbench that provides a testbench for comparing imputation methods for missing data in univariate time series. The imputeTestbench package can be used to simulate the amount and type of missing data in a complete dataset and compare filled data using different imputation methods. The user has the option to simulate missing data by removing observations completely at random or in blocks of different sizes. Several default imputation methods are includ...

  10. A spatial haplotype copying model with applications to genotype imputation.

    Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

    2015-05-01

    Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy as well as to a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.
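
The copying idea in this abstract can be sketched as a small hidden Markov model. The code below is an illustrative simplification, not the authors' model: haplotypes are haploid 0/1 sequences, `switch` and `error` are made-up parameter names and values, and the optional `prior` weights are a hypothetical stand-in for the geography-aware prior (the standard model corresponds to equal weights).

```python
def impute_haplotype(target, panel, prior=None, switch=0.1, error=0.01):
    """Return P(allele = 1) per site for a target haplotype (None = missing)."""
    k, n_sites = len(panel), len(target)
    # Standard copy model: every reference haplotype is equally likely a priori.
    weights = [1.0] * k if prior is None else list(prior)
    total = sum(weights)
    weights = [w / total for w in weights]

    def emit(h, s):
        # Probability of the observed target allele if copying haplotype h.
        if target[s] is None:
            return 1.0
        return 1.0 - error if panel[h][s] == target[s] else error

    def normalize(values):
        z = sum(values)
        return [v / z for v in values]

    # Forward pass: stay on the same haplotype, or jump to one drawn from the prior.
    fwd = [normalize([weights[h] * emit(h, 0) for h in range(k)])]
    for s in range(1, n_sites):
        prev = fwd[-1]
        fwd.append(normalize([((1 - switch) * prev[h] + switch * weights[h])
                              * emit(h, s) for h in range(k)]))

    # Backward pass, rescaled at each site for numerical stability.
    bwd = [[1.0] * k for _ in range(n_sites)]
    for s in range(n_sites - 2, -1, -1):
        nxt = [emit(h, s + 1) * bwd[s + 1][h] for h in range(k)]
        jump = switch * sum(weights[h] * nxt[h] for h in range(k))
        bwd[s] = normalize([(1 - switch) * nxt[h] + jump for h in range(k)])

    # Posterior over the copied haplotype gives the expected allele at each site.
    dosages = []
    for s in range(n_sites):
        post = normalize([fwd[s][h] * bwd[s][h] for h in range(k)])
        dosages.append(sum(post[h] * panel[h][s] for h in range(k)))
    return dosages

panel = [[0, 0, 0], [1, 1, 1]]               # toy reference haplotypes
print(impute_haplotype([1, None, 1], panel))  # the middle site is imputed
```

Passing larger `prior` weights for reference haplotypes judged closer to the target shifts the posterior toward them, which is the intuition behind making nearby haplotypes more likely to contribute.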

  11. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  12. Imputation of genotypes in Danish two-way crossbred pigs using low density panels

    Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

    Genotype imputation is commonly used as an initial step of genomic selection. Studies on humans, plants and ruminants suggested many factors would affect the performance of imputation. However, studies rarely investigated pigs, especially crossbred pigs. In this study, different scenarios...... of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation on crossbreds performs equally well as in purebreds, when parental breeds are used as the reference panel. When the size of reference is considerably large...... SNPs. This dataset will be analyzed for genomic selection in a future study...

  13. Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

    Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

    2015-01-01

    Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most of the imputations have been run using HapMap samples as reference, imputation of low frequency and rare variants (minor allele frequency (MAF) 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variants imputation, we imputed 153 individuals, each of whom had 3 different genotype array data including 317k, 610k and 1 million SNPs, to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, at higher concordance rate, higher proportion of well-imputed variants (info>0.4) and higher mean info score in each MAF bin. Similarly, 1M chip array outperformed 610K and 317K. However for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
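
A per-bin summary of the kind reported here (proportion of well-imputed variants and mean info score in each MAF bin) can be sketched as below. The info > 0.4 cut-off and the MAF ≤ 0.3% boundary come from the abstract; the other bin edges and the function name `summarize_by_maf_bin` are assumptions chosen for illustration.

```python
from bisect import bisect_left

def summarize_by_maf_bin(variants, bin_edges=(0.003, 0.01, 0.05), info_threshold=0.4):
    """variants: iterable of (maf, info) pairs.

    Bin i holds variants with bin_edges[i-1] < MAF <= bin_edges[i]; the last
    bin holds everything above the final edge. Returns, per non-empty bin
    index, the variant count, the proportion with info above the threshold,
    and the mean info score.
    """
    grouped = {}
    for maf, info in variants:
        grouped.setdefault(bisect_left(bin_edges, maf), []).append(info)
    summary = {}
    for index, infos in sorted(grouped.items()):
        well_imputed = sum(info > info_threshold for info in infos)
        summary[index] = {
            "n": len(infos),
            "prop_well_imputed": well_imputed / len(infos),
            "mean_info": sum(infos) / len(infos),
        }
    return summary

# Invented (MAF, info) pairs, for illustration only.
toy = [(0.002, 0.1), (0.003, 0.5), (0.02, 0.9), (0.2, 0.95)]
for index, stats in summarize_by_maf_bin(toy).items():
    print(index, stats)
```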

  14. Imputation of microsatellite alleles from dense SNP genotypes for parental verification

    Matthew McClure

    2012-08-01

    Microsatellite (MS) markers have recently been used for parental verification and are still the international standard despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphism (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing 4 dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics (ISAG) recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS- to SNP-based parental verification.

  15. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure.

    Kabisch, Maria; Hamann, Ute; Lorenzo Bermejo, Justo

    2017-10-17

    Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and permits explicit accounting for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation. Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not practically improve accuracy. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals. To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate

  16. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Xiaoyi Gao

    2012-06-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  17. Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

    Toghiani, S; Aggrey, S E; Rekaya, R

    2016-07-01

    Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated, which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval

  18. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de-regressed EBV was slightly smaller (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and they were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  19. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

    Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P

    2013-05-01

    A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
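
The union and intersection input sets contrasted in this abstract can be illustrated with plain set operations. The array names and SNP identifiers below are invented placeholders, and `snp_input_sets` is a hypothetical helper rather than anything from the study.

```python
def snp_input_sets(arrays):
    """arrays: dict mapping array name -> set of genotyped SNP IDs.

    Returns (union, intersection, share), where share is the fraction of all
    genotyped SNPs that are present on every array.
    """
    snp_sets = [set(snps) for snps in arrays.values()]
    union = set().union(*snp_sets)                 # on one or more arrays
    intersection = set.intersection(*snp_sets)     # on all arrays
    return union, intersection, len(intersection) / len(union)

# Placeholder content; real arrays carry hundreds of thousands of SNPs.
arrays = {
    "array_A": {"rs1", "rs2", "rs3", "rs4"},
    "array_B": {"rs2", "rs3", "rs5"},
}
union, intersection, share = snp_input_sets(arrays)
print(sorted(union), sorted(intersection), share)

# The reported rate of spurious associations, expressed per million imputed SNPs.
false_positives_per_million = round(0.002 * 1_000_000)
print(false_positives_per_million)
```

Using only the intersection as imputation input means every sample is imputed from the same genotyped SNPs, which is the condition under which the abstract reports that the bias disappeared.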

  20. The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

    Minica, C.C.; Dolan, C.V.; Willemsen, G.; Vink, J.M.; Boomsma, D.I.

    2013-01-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of

  1. The multiple imputation method: a case study involving secondary data analysis.

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate with the example of a secondary data analysis study the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiple imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiple imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.

  2. Which DTW Method Applied to Marine Univariate Time Series Imputation

    Phan, Thi-Thu-Hong; Caillault, Émilie; Lefebvre, Alain; Bigand, André

    2017-01-01

    Missing data are ubiquitous in all domains of applied science. Processing datasets containing missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Therefore, the aim of this paper is to build a framework for filling missing values in univariate time series and to perform a comparison of different similarity metrics used for the imputation task. This allows us to suggest the most suitable methods for the imp...

  3. Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

    The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...

  4. Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data

    Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...

  5. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

    Rosa Aghdam

    2017-12-01

    Full Text Available Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% genes are considered in random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/.

  6. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

    Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% genes are considered in random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.

  7. Traffic Speed Data Imputation Method Based on Tensor Completion

    Bin Ran

    2015-01-01

    Full Text Available Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  8. Traffic speed data imputation method based on tensor completion.

    Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

    2015-01-01

    Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is employed to estimate the missing traffic speed data. This proposed method is able to recover missing entries from given entries, which may be noisy, considering severe fluctuation of traffic speed data compared with traffic volume. The proposed method is evaluated on Performance Measurement System (PeMS) database, and the experimental results show the superiority of the proposed approach over state-of-the-art baseline approaches.

  9. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organisation maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.

  10. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  11. Imputation methods for filling missing data in urban air pollution data for Malaysia

    Nur Afiqah Zakaria

    2018-06-01

    Full Text Available The air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing data. The missing observations usually occur due to machine failure, routine maintenance and human error. In this study, the hourly monitoring data of CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods (Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour). Missing data were simulated in the air pollutant observations at four percentages, i.e. 5%, 10%, 15% and 20%. Performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. From the results of the performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in the missing values in air pollutant data.
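    The four performance measures named in this abstract can be sketched in plain Python as follows. The abstract does not give its formulas, so the definitions here are assumptions: the coefficient of determination is taken as 1 - SS_res/SS_tot (other variants exist, such as the squared correlation), and the index of agreement follows Willmott's d. All function names are illustrative.

```python
import math


def mae(obs, pred):
    """Mean Absolute Error between observed and imputed values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (1 means perfect agreement)."""
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


# Made-up values: the "true" readings that were masked, and their imputations.
observed = [1.0, 2.0, 3.0, 4.0]
imputed = [2.0, 3.0, 4.0, 5.0]
for metric in (mae, rmse, r_squared, index_of_agreement):
    print(metric.__name__, round(metric(observed, imputed), 4))
```

    Lower MAE and RMSE and higher coefficient of determination and index of agreement indicate a better imputation, which is how a study of this kind would rank the competing methods.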

  12. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria including accuracy, robustness, precision, and efficiency for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas multilayer perceptron type neural network and multiple imputation strategy adopted by Monte Carlo Markov Chain based on expectation-maximization (EM-MCMC) are computationally intensive ones. In addition, we propose a modification on the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis which takes spatio-temporal dependencies into account for evaluating imputation performances. Depending on the detailed graphical and quantitative analysis, it can be said that although computational methods, particularly EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation particularly with computational methods since it gives more precise results in meteorological time series.
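    Two of the simple methods mentioned above, the arithmetic average and the normal ratio (NR), can be sketched as below. This is a minimal illustration under the usual textbook definition of NR, in which each neighbouring station's value is scaled by the ratio of the target station's long-term normal to the neighbour's normal; the abstract does not spell out the formula, and the station values and function names are made up.

```python
def arithmetic_average(neighbour_values):
    """Estimate the missing value as the plain mean of neighbouring stations."""
    return sum(neighbour_values) / len(neighbour_values)


def normal_ratio(target_normal, neighbour_normals, neighbour_values):
    """Normal ratio estimate: mean of neighbour values, each rescaled by
    target_normal / neighbour_normal (long-term means of the stations)."""
    scaled = [
        (target_normal / normal) * value
        for normal, value in zip(neighbour_normals, neighbour_values)
    ]
    return sum(scaled) / len(scaled)


# Made-up monthly precipitation (mm) at three neighbours for the missing month,
# and made-up long-term normals for the neighbours and the target station.
values = [50.0, 40.0, 20.0]
normals = [1000.0, 800.0, 400.0]
target = 800.0

print(round(arithmetic_average(values), 2))
print(round(normal_ratio(target, normals, values), 2))
```

    The rescaling matters when the stations have very different climatologies: a dry neighbour contributes a proportionally larger value once it is adjusted to the wetter target station.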

  13. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

    Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

    2009-01-01

    Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging single-nucleotide polymorphism (SNP) sets. To maximize information gleaned from......, and rs3212891; CDK2 rs2069391, rs2069414, and rs17528736; and CCNE1 rs3218036. These results exemplify the utility of imputation in candidate gene studies and lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls....

  14. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.

  15. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

    Stuart Heather

    2006-12-01

    Full Text Available Abstract Background Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). Methods 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The sum of the responses is calculated and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Additionally, a missing at random and missing not at random simulation were completed. Six imputation methods were then considered: (1) multiple imputation, (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80 indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percent of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression method produced Kappas in the 'substantial agreement' range
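    The scoring rule and the agreement measure described in this abstract can be sketched in a few lines of Python. Only the 20-item sum and the "over 40" cutoff come from the abstract; the function names are hypothetical, and the individual-mean rule shown (fill a participant's missing items with the mean of that participant's answered items) is an assumed reading of method 3. Cohen's kappa is computed from observed versus chance agreement between the classifications obtained from complete and imputed data.

```python
def individual_mean_impute(responses):
    """Replace None items with the mean of the participant's answered items."""
    answered = [r for r in responses if r is not None]
    mean = sum(answered) / len(answered)
    return [mean if r is None else r for r in responses]


def has_depressive_symptoms(responses, cutoff=40):
    """SDS classification: total score over the cutoff."""
    return sum(responses) > cutoff


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of category labels."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_chance = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if p_chance == 1.0:  # both raters used a single identical category
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up example: one participant with two deleted items.
complete = [3] * 10 + [2] * 10                       # total 50
with_gaps = [None, 3, 3, 3, 3, 3, 3, 3, 3, 3] + [2] * 9 + [None]
filled = individual_mean_impute(with_gaps)
print(has_depressive_symptoms(complete), has_depressive_symptoms(filled))

# Made-up classifications from complete data versus imputed data.
truth = [True, True, False, False]
after_imputation = [True, False, False, False]
print(cohen_kappa(truth, after_imputation))
```

    Kappa corrects raw agreement for the agreement expected by chance, which is why the study reports it alongside the percentage misclassified.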

  16. Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

    Pryce, J E; Johnston, J; Hayes, B J

    2014-01-01

    detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from...... reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were...... information exploited. The UK animals were also included in the North American data set (n = 1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele...

  17. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  18. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  19. Missing value imputation in DNA microarrays based on conjugate gradient method.

    Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

    2012-02-01

    Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k nearest neighbors is labeled as the most similar ones. The CG algorithm with this subset as its input is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
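    The neighbour-selection step described above (ranking genes by the absolute value of their Pearson correlation with the gene that has a missing entry) might look like the following sketch. It covers only that first step, not the conjugate gradient estimation, and it is not the authors' implementation; the gene names, the toy expression values and the function names are hypothetical, and candidate genes are assumed to be complete.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def nearest_genes(target, candidates, k):
    """Return the names of the k candidate genes with the largest |r| to the
    target gene, using only the columns where the target is observed."""
    observed = [j for j, value in enumerate(target) if value is not None]
    target_obs = [target[j] for j in observed]
    scored = []
    for name, row in candidates.items():
        r = pearson(target_obs, [row[j] for j in observed])
        scored.append((abs(r), name))
    scored.sort(reverse=True)  # strongest absolute correlation first
    return [name for _, name in scored[:k]]


# Toy expression matrix: the target gene is missing its last array value.
target_gene = [1.0, 2.0, 3.0, None]
other_genes = {
    "g1": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated
    "g2": [3.0, 2.0, 1.0, 0.0],   # perfectly anti-correlated
    "g3": [1.0, 3.0, 2.0, 5.0],   # weakly correlated
}
print(nearest_genes(target_gene, other_genes, k=2))
```

    Using the absolute correlation means strongly anti-correlated genes count as informative neighbours too, which matches the abstract's wording.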

  20. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
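    Pooling a single parameter estimate with Rubin's Rules, as mentioned above, can be sketched as follows. This shows the standard pooling of a scalar estimate only (pooled estimate, within-imputation variance, between-imputation variance, total variance); it does not cover the multi-parameter tests for categorical covariates that the record goes on to discuss. The function name and the example numbers are made up.

```python
def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's Rules.

    Returns (pooled_estimate, total_variance). Requires m >= 2.
    """
    m = len(estimates)
    pooled = sum(estimates) / m                                   # mean estimate
    within = sum(variances) / m                                   # average variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # spread of estimates
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up log-odds estimates from three imputed datasets.
estimate, variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, round(variance, 4), round(variance ** 0.5, 4))
```

    The between-imputation term is what carries the extra uncertainty caused by the missing data, so the pooled standard error is larger than the average standard error of the individual analyses.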

  1. Missing data imputation: focusing on single imputation.

    Zhang, Zhongheng

    2016-01-01

    Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from the analysis. Therefore, many imputation methods have been developed to fill the gap. The present article focuses on single imputation. Imputations with mean, median and mode are simple but, like complete case analysis, can introduce bias in the mean and deviation. Furthermore, they ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
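    The article itself works in R; the sketch below uses Python's standard library instead, purely to illustrate mean, median and mode single imputation and the shrinkage in spread that the abstract warns about. The function name `impute_single` and the data are hypothetical.

```python
import statistics


def impute_single(values, method="mean"):
    """Fill None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill_functions = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill = fill_functions[method](observed)
    return [fill if v is None else v for v in values]


data = [2, 4, None, 4, 10, None]
observed_only = [v for v in data if v is not None]

for method in ("mean", "median", "mode"):
    print(method, impute_single(data, method))

# Filling every gap with one constant leaves the mean unchanged (for mean
# imputation) but pulls the sample variance down.
print(statistics.variance(observed_only))
print(statistics.variance(impute_single(data, "mean")))
```

    Because every gap receives the same value, the imputed variable looks less variable than it really is, and its relationship with other variables is ignored; regression imputation addresses the second point by predicting each gap from the other variables.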

  2. Using imputed genotype data in the joint score tests for genetic association and gene-environment interactions in case-control studies.

    Song, Minsun; Wheeler, William; Caporaso, Neil E; Landi, Maria Teresa; Chatterjee, Nilanjan

    2018-03-01

    Genome-wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how to best handle imputed SNPs in various modern complex tests for genetic associations incorporating gene-environment interactions. We focus on case-control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large-scale GWAS are being performed through consortia efforts where it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene-environment interactions under the assumption of gene-environment independence. Methods are made available for public use through the CGEN R software package. © 2017 WILEY PERIODICALS, INC.
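    The "predicted allele count" used as the dosage variable is the expected number of copies of the coded allele under the imputed genotype probabilities. A minimal Python sketch, assuming the imputation output gives three posterior probabilities per SNP and person (0, 1 or 2 copies); the function name is hypothetical and this is not the CGEN interface:

```python
def expected_allele_count(genotype_probs, tolerance=1e-6):
    """Dosage = 0 * P(0 copies) + 1 * P(1 copy) + 2 * P(2 copies)."""
    p0, p1, p2 = genotype_probs
    if abs(p0 + p1 + p2 - 1.0) > tolerance:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


# A confidently imputed homozygote and an uncertain call (made-up probabilities).
print(expected_allele_count((0.0, 0.0, 1.0)))
print(expected_allele_count((0.1, 0.3, 0.6)))
```

    Unlike a hard genotype call, the dosage keeps the imputation uncertainty in the covariate, so poorly imputed SNPs contribute values that sit between the integer genotypes.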

  3. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31900 km2). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix

  4. Comparison of different Methods for Univariate Time Series Imputation in R

    Moritz, Steffen; Sardá, Alexis; Bartz-Beielstein, Thomas; Zaefferer, Martin; Stork, Jörg

    2015-01-01

    Missing values in datasets are a well-known problem and there are quite a lot of R packages offering imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputation of univariate time series. The problem is, most standard imputation techniques can not be applied directly. Most algorithms rely on inter-attribute correlations, while univariate time series imputation needs to employ time dependencies. This paper provides an overview of ...

  5. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    Poyatos, Rafael; Sus, Oliver; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km2). We simulated gaps at different missingness levels (10-80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. 
MICE informed by relevant ecological variables allowed us to fill the gaps in
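    The gap-simulation design described in this record can be sketched in a few lines. The following is a minimal illustration, not the authors' workflow: it assumes a toy trait with two hypothetical species ("A" and "B"), removes 30 % of the values at random, and compares overall-mean with species-mean imputation. All function and variable names are invented for the example, and kriging, kNN and MICE are not reproduced.

```python
import math
import random
import statistics
from collections import defaultdict


def simulate_gaps(values, missing_fraction, rng):
    """Return a copy of `values` with a random fraction replaced by None."""
    n_missing = round(len(values) * missing_fraction)
    gap_positions = set(rng.sample(range(len(values)), n_missing))
    return [None if i in gap_positions else v for i, v in enumerate(values)]


def impute_overall_mean(masked):
    """Fill every gap with the mean of all observed values."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def impute_species_mean(masked, species):
    """Fill each gap with the observed mean of the same species.

    Assumes every species keeps at least one observed value.
    """
    by_species = defaultdict(list)
    for v, s in zip(masked, species):
        if v is not None:
            by_species[s].append(v)
    means = {s: sum(vs) / len(vs) for s, vs in by_species.items()}
    return [means[s] if v is None else v for v, s in zip(masked, species)]


def rmse_on_gaps(truth, masked, imputed):
    """Root-mean-square error restricted to the simulated gaps."""
    sq = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(sq) / len(sq))


# Toy complete trait vector: two hypothetical species with different means.
rng = random.Random(0)
species = ["A"] * 200 + ["B"] * 200
truth = [rng.gauss(10.0, 1.0) for _ in range(200)]
truth += [rng.gauss(20.0, 1.0) for _ in range(200)]

masked = simulate_gaps(truth, 0.30, rng)
overall = impute_overall_mean(masked)
per_species = impute_species_mean(masked, species)

# Accuracy on the gaps, and spread of the completed data set.
rmse_overall = rmse_on_gaps(truth, masked, overall)
rmse_species = rmse_on_gaps(truth, masked, per_species)
sd_truth = statistics.pstdev(truth)
sd_overall = statistics.pstdev(overall)
print(rmse_overall, rmse_species, sd_truth, sd_overall)
```

    Comparing `sd_overall` with `sd_truth` shows the shrinkage in spread that mean-based filling introduces, which is the kind of distortion of trait distributions the abstract says accuracy metrics alone can mask.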

  6. Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

    R. Poyatos

    2018-05-01

    The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km2). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables

  7. Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

    Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

    2013-01-01

    Summary. Background and objectives: Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements: From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict likelihood of missing BCr. Propensity scoring identified 6502 patients with highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a “missing” data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI were compared with that of eGFR 75. Results: All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; P ... creatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Conclusions: Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980
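    For readers unfamiliar with the "eGFR 75" surrogate, the sketch below shows how a baseline creatinine can be back-calculated from an assumed eGFR of 75 ml/min per 1.73 m2. The abstract does not say which estimating equation was used; the 4-variable MDRD equation (IDMS-traceable form, coefficient 175) is an assumption made here for illustration, and both function names are hypothetical.

```python
def mdrd_egfr(scr_mg_dl, age_years, female=False, black=False):
    """4-variable MDRD eGFR in ml/min per 1.73 m2 (assumed equation)."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def baseline_creatinine(age_years, female=False, black=False, assumed_egfr=75.0):
    """Invert the MDRD equation for serum creatinine at an assumed eGFR.

    eGFR = k * SCr^-1.154, where k is the eGFR at SCr = 1 mg/dl,
    so SCr = (eGFR / k)^(-1 / 1.154).
    """
    k = mdrd_egfr(1.0, age_years, female, black)
    return (assumed_egfr / k) ** (-1.0 / 1.154)


# Surrogate baseline for a hypothetical 60-year-old man and woman.
scr_male = baseline_creatinine(60)
scr_female = baseline_creatinine(60, female=True)
print(round(scr_male, 2), round(scr_female, 2))
```

    Because the surrogate depends only on age, sex and race, every patient with the same demographics receives the same baseline. That limitation is what the multiple-imputation approaches in this record try to improve on.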

  8. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci.
Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian

  9. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    Nawar Shara

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.

  10. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

    Lotz Meredith J

    2008-01-01

    Abstract. Background: Gene expression data frequently contain missing values, however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remains largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results: We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrixes and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion: Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA)

  11. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

    Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

    2008-01-10

    Gene expression data frequently contain missing values, however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remains largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures x time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrixes and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. 
Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity

  12. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  13. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

    Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  14. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Jun-He Yang

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir’s water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting model summarily has three foci. First, this study uses five imputation methods to directly delete the missing value. Second, we identified the key variable via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir’s water level. This was done to compare with the listing method under the forecasting error. These experimental results indicate that the Random Forest forecasting model when applied to variable selection with full variables has better forecasting performance than the listing model. In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.

  15. Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

    Baker, Jannah; White, Nicole; Mengersen, Kerrie

    2014-11-20

    Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.

  16. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

    McElwee Joshua

    2009-06-01

    Abstract. Background: Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion. Methods: 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release22 SNPs, and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500 K array plus a custom 164 K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. Results: MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms Affx500K or Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations in this population. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximate 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis

  17. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    Riggi, S., E-mail: sriggi@oact.inaf.it [INAF - Osservatorio Astrofisico di Catania (Italy); Riggi, D. [Keras Strategy - Milano (Italy); Riggi, F. [Dipartimento di Fisica e Astronomia - Università di Catania (Italy); INFN, Sezione di Catania (Italy)

    2015-04-21

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixtures’ models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layers’ Silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  18. Multiple imputation strategies for zero-inflated cost data in economic evaluations: which method works best?

    MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

    2016-01-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing

  19. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Kamatani Naoyuki

    2011-05-01

    Abstract. Background: Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results: We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions: ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  20. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

    Guo, Wei-Li; Huang, De-Shuang

    2017-08-22

    Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset to a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches, by assessing the performance of imputation methods against observed ChIP-seq TF binding profiles. Besides, motif analysis shows that TFBSImpute performs better in capturing binding motifs enriched in observed data compared with baselines, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.

  1. Improving accuracy of rare variant imputation with a two-step imputation approach

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

    2015-01-01

    Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning less frequent and rare variants not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage together with new reference panels, as the 1000 Genomes panel, are emerging to facilitate imputation of low frequent single-nucleotide polymorphisms (minor allele frequency (MAF) ... reference sample genotyped on a dense array and hereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by the r(2) using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to 1000 Genomes reference. Similarly...

  2. The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

    Rui Martiniano

    2017-07-01

    We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.

  3. A Maximum-Likelihood Method to Correct for Allelic Dropout in Microsatellite Data with No Replicate Genotypes

    Wang, Chaolong; Schroeder, Kari B.; Rosenberg, Noah A.

    2012-01-01

    Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy–Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. 
Because the data sets
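    The downward bias in observed heterozygosity described in this record can be reproduced with a small simulation. This is a simplified sketch, not the authors' maximum-likelihood model: it assumes every allele copy drops out independently with one shared rate, and it ignores inbreeding as well as sample- and locus-specific effects. Under that assumption a true heterozygote is called heterozygous with probability (1 - g)^2 and a genotype is missing with probability g^2, so the expected observed heterozygosity among called genotypes is H(1 - g)/(1 + g). All names below are illustrative.

```python
import random


def observe(genotype, dropout_rate, rng):
    """Apply independent per-copy dropout to a two-allele genotype.

    Returns None if both copies fail, a homozygous call if one copy
    fails, and the original genotype if both copies amplify.
    """
    amplified = [allele for allele in genotype if rng.random() >= dropout_rate]
    if not amplified:
        return None
    if len(amplified) == 1:
        return (amplified[0], amplified[0])
    return (amplified[0], amplified[1])


def observed_heterozygosity(genotypes):
    """Fraction of heterozygous calls among non-missing genotypes."""
    called = [g for g in genotypes if g is not None]
    return sum(g[0] != g[1] for g in called) / len(called)


rng = random.Random(1)
true_het, dropout = 0.6, 0.2

# True genotypes: heterozygous ("A", "B") with probability true_het.
truth = [("A", "B") if rng.random() < true_het else ("A", "A")
         for _ in range(20000)]
observed = [observe(g, dropout, rng) for g in truth]

h_true = observed_heterozygosity(truth)
h_obs = observed_heterozygosity(observed)
h_expected = true_het * (1 - dropout) / (1 + dropout)
print(h_true, h_obs, h_expected)
```

    The gap between `h_true` and `h_obs` is the bias that the record's method aims to correct without replicate genotyping.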

  4. Multiple imputation and its application

    Carpenter, James

    2013-01-01

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...
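    As background for this record, the pooling step of MI is commonly done with Rubin's rules: the point estimates from the m imputed data sets are averaged, and the total variance adds the between-imputation variance, inflated by (1 + 1/m), to the mean within-imputation variance. The sketch below is a generic illustration with made-up numbers, not an excerpt from the book; the function name `pool_rubin` is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B the between-imputation (sample) variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # uses the m - 1 denominator
    total = within + (1 + 1 / m) * between
    return pooled, total


# Five hypothetical imputed-data estimates of one regression coefficient.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(pooled_estimate, total_variance)
```

    The total variance exceeds the within-imputation variance whenever the imputed data sets disagree, which is how MI carries the uncertainty due to missing data into the final standard error.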

  5. Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

    Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

    2017-07-27

    Imputation is a computational method based on the principle of haplotype sharing allowing enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only South Asian component of the 1000 Genomes phase 3 panel works as good as the merged panel, making it computationally less intensive job. Thus, our study stresses that imputation accuracy using 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.

  6. Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

    Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

    2009-01-01

    Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...

  7. Family-based Association Analyses of Imputed Genotypes Reveal Genome-Wide Significant Association of Alzheimer’s disease with OSBPL6, PTPRG and PDCL3

    Herold, Christine; Hooli, Basavaraj V.; Mullin, Kristina; Liu, Tian; Roehr, Johannes T; Mattheisen, Manuel; Parrado, Antonio R.; Bertram, Lars; Lange, Christoph; Tanzi, Rudolph E.

    2015-01-01

    The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1 and PSEN2 cause a subset of early-onset familial Alzheimer's disease (EOFAD). On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated with the ε4 allele in the gene APOE, and more recently with variants in more than two dozen additional genes identified in large-scale genome-wide association studies (GWAS) and meta-analysis reports. Taken together, however, although the heritability of AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors still remains to be elucidated. In this study we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3,500 subjects from 1,070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value = 3.98 × 10⁻⁸), rs1347297 in the gene OSBPL6 (P-value = 4.53 × 10⁻⁸), and rs1513625 near PDCL3 (P-value = 4.28 × 10⁻⁸). In addition, rs72953347 in OSBPL6 (P-value = 6.36 × 10⁻⁷) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value = 4.76 × 10⁻⁷; rs62400067, P-value = 3.54 × 10⁻⁷). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance. PMID:26830138

  8. Analyzing the Impacts of Alternated Number of Iterations in Multiple Imputation Method on Explanatory Factor Analysis

    Duygu KOÇAK

    2017-11-01

    The study aims to identify the effects of the number of iterations used in the multiple imputation method, one of the methods used to cope with missing values, on the results of factor analysis. To this end, artificial datasets of different sample sizes were created. Values missing at random (MAR) and missing completely at random (MCAR) were created in various ratios by deleting data. For the MAR condition, a second variable was generated at ordinal scale level, and datasets with different ratios of missing values were obtained based on the levels of this variable. The data were generated using the “psych” package in R, while the “dplyr” package was used to write code that deleted values according to the predetermined conditions of each missing-value mechanism. Different datasets were generated by applying different numbers of iterations. Explanatory factor analysis was conducted on the completed datasets, and the factors and total explained variances are presented. These values were evaluated against the number of factors and total variance explained in the complete datasets. The results indicate that multiple imputation performs better when values are missing at random than when they are missing completely at random. Also, increasing the number of iterations for both types of missing-value dataset decreased the difference from the results obtained with the complete datasets.

  9. A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

    Mistler, Stephen A.; Enders, Craig K.

    2017-01-01

    Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…
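
    As a rough illustration of the "one variable at a time" idea behind FCS, the sketch below cycles over two variables and refills each one's missing entries from a simple linear regression on the other, plus random noise. It is a simplified, single-level toy under stated assumptions: the function names (`fit_line`, `fcs_impute`) and the data are hypothetical, regression parameters are fixed at their least-squares estimates rather than drawn from a posterior, and nothing here reproduces the multilevel models compared in the article.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y ~ x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(n - 2, 1)) ** 0.5
    return intercept, slope, sd

def fcs_impute(rows, n_iter=10, seed=0):
    """Toy FCS for two variables; missing entries are marked with None."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Start from column means so every regression has complete predictors.
    for col in (0, 1):
        observed = [row[col] for row in rows if row[col] is not None]
        mean = sum(observed) / len(observed)
        for row in data:
            if row[col] is None:
                row[col] = mean
    # Cycle over the variables, imputing each from its conditional model.
    for _ in range(n_iter):
        for target in (0, 1):
            predictor = 1 - target
            train = [r for r, m in zip(data, missing) if not m[target]]
            a, b, sd = fit_line([r[predictor] for r in train],
                                [r[target] for r in train])
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = a + b * r[predictor] + rng.gauss(0, sd)
    return data

# Hypothetical data: one missing value in each variable.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (None, 8.2), (5.0, 9.8), (4.0, 8.1)]
completed = fcs_impute(rows)
```

    Running the function several times with different seeds would give the multiple completed datasets that multiple imputation then analyses and pools.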

  10. Different methods for analysing and imputation missing values in wind speed series; La problematica de la calidad de la informacion en series de velocidad del viento-metodologias de analisis y imputacion de datos faltantes

    Ferreira, A. M.

    2004-07-01

    This study concerns different methods for analysing and imputing missing values in wind speed series. The EM algorithm and a methodology derived from the sequential hot deck were used. Series with imputed missing values are compared with the original, complete series using several criteria, such as the wind potential, and there appears to be a significantly good fit between the estimates and the real values. (Author)

  11. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  12. Molecular methods for bacterial genotyping and analyzed gene regions

    İbrahim Halil Yıldırım, Seval Cing Yıldırım, Nadir Koçak

    2011-06-01

    Bacterial strain typing is an important process for diagnosis, treatment and epidemiological investigations. Current bacterial strain typing methods may be classified into two main categories: phenotyping and genotyping. Phenotypic characters are the reflection of genetic content. Genotyping, which refers to the discrimination of bacterial strains based on their genetic content, has recently become widely used for bacterial strain typing. The methods already used in genotyping of bacteria are quite different from each other. In this review we summarize the basic principles of DNA-based methods used in genotyping of bacteria and describe some important DNA regions that are used in genotyping of bacteria. J Microbiol Infect Dis 2011;1(1):42-46.

  13. Multiple centroid method to evaluate the adaptability of alfalfa genotypes

    Moysés Nascimento

    2015-02-01

    This study aimed to evaluate the efficiency of multiple centroids for studying the adaptability of alfalfa genotypes (Medicago sativa L.). In this method, the genotypes are compared with ideotypes defined by the bisegmented regression model, according to the researcher's interest. Thus, genotype classification is carried out as determined by the objective of the researcher and the proposed recommendation strategy. Despite the great potential of the method, it needs to be evaluated in a biological context (with real data). To this end, we used data on the dry matter production of 92 alfalfa cultivars, with 20 cuttings, from a randomized block experiment with two repetitions carried out from November 2004 to June 2006. The multiple centroid method proved efficient for classifying alfalfa genotypes. Moreover, it gave no ambiguous indications and allowed ideotypes to be defined according to the researcher's interest, facilitating data interpretation.

  14. Gaussian mixture clustering and imputation of microarray data.

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
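
    The first metric described above, the root mean squared error between true and imputed values, can be sketched as follows. The function name `rmse` and all numbers are hypothetical; they only illustrate how two imputation methods could be compared on entries that were masked on purpose.

```python
import math

def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical expression values that were masked, and two candidate fills.
truth = [0.5, -1.0, 2.0, 0.0]
zero_fill = [0.0, 0.0, 0.0, 0.0]      # naive baseline
model_fill = [0.4, -0.8, 1.5, 0.1]    # output of some imputation model

rmse_baseline = rmse(truth, zero_fill)
rmse_model = rmse(truth, model_fill)
```

    A lower value means the imputed entries sit closer to the true ones; the second metric in the abstract (mis-clustered genes) needs a clustering step and is not shown here.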

  15. Genotype call for chromosomal deletions using read-depth from whole genome sequence variants in cattle

    Mesbah-Uddin, Md; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2018-01-01

    We presented a deletion genotyping (copy-number estimation) method that leverages population-scale whole genome sequence variants data from 1K bull genomes project (1KBGP) to build reference panel for imputation. To estimate deletion-genotype likelihood, we extracted read-depth (RD) data of all...

  16. Comparative analysis of minor histocompatibility antigens genotyping methods

    A. S. Vdovin

    2016-01-01

    A wide range of techniques can be employed to find mismatches in minor histocompatibility antigens between transplant recipients and their donors. In the current study we compared genotyping methods based on polymerase chain reaction (PCR) for four minor antigens. Three of the tested methods (allele-specific PCR, restriction fragment length polymorphism and real-time PCR with TaqMan probes) demonstrated 100% reliability when compared to Sanger sequencing for all of the studied polymorphisms. High resolution melting analysis was unsuitable for genotyping one of the tested minor antigens (HA-1), as it has a linked synonymous polymorphism. The data obtained could be used to select a strategy for large-scale clinical genotyping.

  17. Methods for MHC genotyping in non-model vertebrates.

    Babik, W

    2010-03-01

    Genes of the major histocompatibility complex (MHC) are considered a paradigm of adaptive evolution at the molecular level and as such are frequently investigated by evolutionary biologists and ecologists. Accurate genotyping is essential for understanding of the role that MHC variation plays in natural populations, but may be extremely challenging. Here, I discuss the DNA-based methods currently used for genotyping MHC in non-model vertebrates, as well as techniques likely to find widespread use in the future. I also highlight the aspects of MHC structure that are relevant for genotyping, and detail the challenges posed by the complex genomic organization and high sequence variation of MHC loci. Special emphasis is placed on designing appropriate PCR primers, accounting for artefacts and the problem of genotyping alleles from multiple, co-amplifying loci, a strategy which is frequently necessary due to the structure of the MHC. The suitability of typing techniques is compared in various research situations, strategies for efficient genotyping are discussed and areas of likely progress in future are identified. This review addresses the well established typing methods such as the Single Strand Conformation Polymorphism (SSCP), Denaturing Gradient Gel Electrophoresis (DGGE), Reference Strand Conformational Analysis (RSCA) and cloning of PCR products. In addition, it includes the intriguing possibility of direct amplicon sequencing followed by the computational inference of alleles and also next generation sequencing (NGS) technologies; the latter technique may, in the future, find widespread use in typing complex multilocus MHC systems. © 2009 Blackwell Publishing Ltd.

  18. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  19. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

    Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

    2017-06-01

    Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥ 5% and low-frequency variants (0.5 ≤ MAF < 5%) across diverse populations, but the imputation of rare variation (MAF < 0.5%) is still rather limited. In the current study, we evaluate imputation accuracy achieved with reference panels from diverse populations with a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel, comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.

  20. Imputation and quality control steps for combining multiple genome-wide datasets

    Shefali S Verma

    2014-12-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
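
    To make the two quantities in the evaluation concrete, the sketch below computes a squared correlation between imputed allele dosages and true genotypes for one marker, together with that marker's minor allele frequency. This is an assumption-laden illustration: imputation software estimates allelic R² from posterior probabilities without access to the truth, whereas here the true genotypes are taken as known (as in a masking experiment). Function names and data are hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)

def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)

# Hypothetical marker: true genotypes and imputed dosages for 8 samples.
true_genotypes = [0, 1, 2, 0, 1, 0, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.2, 0.1, 1.9]

r2 = squared_correlation(true_genotypes, imputed_dosages)
maf = minor_allele_frequency(true_genotypes)
```

    Repeating this per marker and plotting `r2` against `maf` gives the kind of relationship the abstract refers to.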

  1. Genomic evaluations with many more genotypes

    Wiggans George R

    2011-03-01

    Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly. Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared. Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50
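
    The mixed-density design (keep every tenth marker, impute the rest, then count correct calls) can be sketched in a few lines. The forward-fill "imputation" below is a deliberately naive stand-in so that the accuracy calculation has something to score; it is not the population and pedigree haplotyping used in the study, and all names and genotypes are hypothetical.

```python
def mask_to_low_density(genotypes, keep_every=10):
    """Keep every keep_every-th marker; mark the others as missing (None)."""
    return [g if idx % keep_every == 0 else None
            for idx, g in enumerate(genotypes)]

def forward_fill(low_density):
    """Naive placeholder imputation: copy the last observed genotype."""
    filled, last = [], None
    for g in low_density:
        if g is not None:
            last = g
        filled.append(last)
    return filled

def call_accuracy(true_genotypes, imputed, low_density):
    """Fraction of masked markers whose imputed genotype is correct."""
    masked = [i for i, g in enumerate(low_density) if g is None]
    correct = sum(1 for i in masked if imputed[i] == true_genotypes[i])
    return correct / len(masked)

# Hypothetical animal with 20 high-density genotypes (0/1/2 coding).
high_density = [0, 1, 2, 1, 0, 0, 2, 1, 1, 0,
                2, 2, 1, 0, 0, 1, 1, 2, 0, 1]
low_density = mask_to_low_density(high_density)
observed = sum(g is not None for g in low_density)
accuracy = call_accuracy(high_density, forward_fill(low_density), low_density)
```

    The low accuracy of the placeholder is expected; the point is only the bookkeeping of which markers are masked and how correct calls are counted.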

  2. Missing value imputation for epistatic MAPs

    Ryan, Colm

    2010-04-20

    Background: Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially
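
    One way a nearest-neighbor scheme might be adapted to a symmetric pairwise matrix is sketched below: a missing score for the pair (i, j) is estimated once from the rows most similar to gene i and once from the rows most similar to gene j, and the average is written to both (i, j) and (j, i). This is a generic sketch under stated assumptions, not the modified algorithms evaluated in the article; the function names, the distance measure and the data are all hypothetical.

```python
import math

def row_distance(matrix, a, b):
    """Mean squared difference over columns observed in both rows."""
    diffs = [(x - y) ** 2 for x, y in zip(matrix[a], matrix[b])
             if x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else math.inf

def neighbour_estimate(matrix, i, j, k):
    """Average score with gene j over the k rows most similar to row i."""
    candidates = []
    for r in range(len(matrix)):
        if r in (i, j) or matrix[r][j] is None:
            continue
        dist = row_distance(matrix, i, r)
        if dist != math.inf:
            candidates.append((dist, matrix[r][j]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(v for _, v in nearest) / len(nearest) if nearest else None

def impute_symmetric(matrix, k=2):
    """Fill missing off-diagonal entries, keeping the matrix symmetric."""
    n = len(matrix)
    filled = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] is None:
                estimates = [e for e in (neighbour_estimate(matrix, i, j, k),
                                         neighbour_estimate(matrix, j, i, k))
                             if e is not None]
                if estimates:
                    value = sum(estimates) / len(estimates)
                    filled[i][j] = filled[j][i] = value
    return filled

N = None  # missing score; the diagonal is left undefined
scores = [
    [N,    1.0, -2.0, N],
    [1.0,  N,   -1.5, 0.5],
    [-2.0, -1.5, N,   0.0],
    [N,    0.5,  0.0, N],
]
filled = impute_symmetric(scores, k=1)
```

    Estimates are always computed from the original matrix rather than from already-imputed values, so the result does not depend on the order in which pairs are visited.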

  3. Public Undertakings and Imputability

    Ølykke, Grith Skovgaard

    2013-01-01

    In this article, the issue of imputability to the State of public undertakings’ decision-making is analysed and discussed in the context of the DSBFirst case. DSBFirst is owned by the independent public undertaking DSB and the private undertaking FirstGroup plc and won the contracts in the 2008...... Oeresund tender for the provision of passenger transport by railway. From the start, the services were provided at a loss, and in the end a part of DSBFirst was wound up. In order to frame the problems illustrated by this case, the jurisprudence-based imputability requirement in the definition of State aid...... in Article 107(1) TFEU is analysed. It is concluded that where the public undertaking transgresses the control system put in place by the State, conditions for imputability are not fulfilled, and it is argued that in the current state of law, there is no conditional link between the level of control...

  4. Estimating the accuracy of geographical imputation

    Boscoe Francis P

    2008-01-01

    Background: To reduce the number of non-geocoded cases, researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed using either a postal centroid or by a geographical imputation method which assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation. Methods: Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address. Results: Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density. Conclusion: Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate
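
    A demographic-based assignment of the kind described could look like the sketch below, which picks a census tract within a postal code with probability proportional to the population of the case's race/ethnicity group in each tract. Probabilistic (rather than most-likely) assignment is an assumption made here for illustration, and the tract labels, group labels and counts are hypothetical.

```python
import random

def geo_impute_tract(case_group, tract_populations, rng):
    """Pick a tract, weighted by the case's demographic group population."""
    tracts = list(tract_populations)
    weights = [tract_populations[t].get(case_group, 0) for t in tracts]
    if sum(weights) == 0:
        # Group absent from every tract: fall back to total population.
        weights = [sum(tract_populations[t].values()) for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]

# Hypothetical postal code covering two tracts.
postal_code_tracts = {
    "tract_A": {"group1": 900, "group2": 100},
    "tract_B": {"group1": 100, "group2": 900, "group3": 50},
}

rng = random.Random(42)
assignments = [geo_impute_tract("group2", postal_code_tracts, rng)
               for _ in range(1000)]
share_b = assignments.count("tract_B") / len(assignments)
only_b = {geo_impute_tract("group3", postal_code_tracts, rng)
          for _ in range(50)}
```

    Adding age, as the abstract reports, would amount to keying the population counts by (group, age band) instead of group alone.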

  5. Cost reduction for web-based data imputation

    Li, Zhixu

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity of these keywords and the data complexity on the Web, different queries may retrieve different answers to the same absent field value. To decide the most probable right answer to each absent field value, existing methods issue quite a few of the available imputation queries for each absent value and then vote to decide the most probable right answer. As a result, a large number of imputation queries must be issued to fill all absent values in an incomplete data set, which brings a large overhead. In this paper, we work on reducing the cost of Web-based Data Imputation in two respects: first, we propose a query execution scheme which can secure the most probable right answer to an absent field value by issuing as few imputation queries as possible; second, we recognize and prune, a priori, queries that will probably fail to return any answers. Our extensive experimental evaluation shows that our proposed techniques substantially reduce the cost of Web-based Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.
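
    The abstract does not spell out the query execution scheme, so the following is only a generic illustration of how voting can stop early: queries are issued one at a time, and the loop ends as soon as the leading answer can no longer be overtaken by the queries that remain. The function `vote_with_early_stop`, the query names and the canned answers are all hypothetical.

```python
from collections import Counter

def vote_with_early_stop(queries, run_query):
    """Majority vote over query answers, stopping once the winner is fixed."""
    votes = Counter()
    issued = 0
    for query in queries:
        answer = run_query(query)
        issued += 1
        if answer is not None:  # a query may fail to return any answer
            votes[answer] += 1
        remaining = len(queries) - issued
        ranked = votes.most_common(2)
        if ranked:
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if ranked[0][1] - runner_up > remaining:
                break  # no remaining query can change the outcome
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, issued

# Canned answers standing in for real Web lookups.
canned = {"q1": "Paris", "q2": "Paris", "q3": "Lyon", "q4": "Paris", "q5": "Paris"}
winner, issued = vote_with_early_stop(list(canned), canned.get)
```

    In this toy run the fifth query is never issued, because after four queries the lead already exceeds the number of queries left.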

  6. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  7. Fully conditional specification in multivariate imputation

    van Buuren, S.; Brand, J. P.L.; Groothuis-Oudshoorn, C. G.M.; Rubin, D. B.

    2006-01-01

    The use of the Gibbs sampler with fully conditionally specified models, where the distribution of each variable given the other variables is the starting point, has become a popular method to create imputations in incomplete multivariate data. The theoretical weakness of this approach is that the

  8. Quick, “Imputation-free” meta-analysis with proxy-SNPs

    Meesters Christian

    2012-09-01

    Background: Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) serve as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in an MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which are ultimately time-consuming. Results: Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from the 1000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions: YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy
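
    Combining effect estimates across studies is commonly done with inverse-variance weighting, sketched below. Whether YAMAS uses exactly this weighting is not stated in the abstract, so treat it as an assumption; the effect sizes are hypothetical, and the proxy-SNP estimate is assumed to have already been aligned to the in-phase allele.

```python
import math

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (beta, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p_value

# Hypothetical per-study results for one SNP.
studies = [
    (0.20, 0.10),  # study 1: SNP genotyped directly
    (0.30, 0.10),  # study 2: SNP genotyped directly
    (0.25, 0.20),  # study 3: SNP missing, estimate taken from a proxy-SNP
]
beta, se, z, p_value = fixed_effect_meta(studies)
```

    Dropping the third study instead of using its proxy would leave only two estimates to pool, which is the loss of information the proxy approach is meant to avoid.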

  9. [Analytic methods for seed models with genotype x environment interactions].

    Zhu, J

    1996-01-01

    Genetic models with genotype effect (G) and genotype x environment interaction effect (GE) are proposed for analyzing generation means of seed quantitative traits in crops. The total genetic effect (G) is partitioned into seed direct genetic effect (G0), cytoplasm genetic effect (C), and maternal plant genetic effect (Gm). Seed direct genetic effect (G0) can be further partitioned into direct additive (A) and direct dominance (D) genetic components. Maternal genetic effect (Gm) can also be partitioned into maternal additive (Am) and maternal dominance (Dm) genetic components. The total genotype x environment interaction effect (GE) can also be partitioned into direct genetic by environment interaction effect (G0E), cytoplasm genetic by environment interaction effect (CE), and maternal genetic by environment interaction effect (GmE). G0E can be partitioned into direct additive by environment interaction (AE) and direct dominance by environment interaction (DE) genetic components. GmE can also be partitioned into maternal additive by environment interaction (AmE) and maternal dominance by environment interaction (DmE) genetic components. Partitions of genetic components are listed for parent, F1, F2 and backcrosses. A set of parents, their reciprocal F1 and F2 seeds is applicable for efficient analysis of seed quantitative traits. The MINQUE(0/1) method can be used for estimating variance and covariance components. Unbiased estimation of covariance components between two traits can also be obtained by the MINQUE(0/1) method. Random genetic effects in seed models are predictable by the Adjusted Unbiased Prediction (AUP) approach with the MINQUE(0/1) method. The jackknife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects, which can be further used in a t-test for parameters. Unbiasedness and efficiency for estimating variance components and predicting genetic effects are tested by
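
    Written out in symbols, the partitions listed in the abstract read as follows. The subscripted notation is only a typeset version of the abbreviations used above (G0, Gm, Am, Dm and so on); no terms beyond those named in the abstract are added.

```latex
\begin{align*}
G  &= G_0 + C + G_m,      & G_0  &= A + D,   & G_m  &= A_m + D_m, \\
GE &= G_0E + CE + G_mE,   & G_0E &= AE + DE, & G_mE &= A_mE + D_mE.
\end{align*}
```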

  10. Multiple Imputation of Predictor Variables Using Generalized Additive Models

    de Jong, Roel; van Buuren, Stef; Spiess, Martin

    2016-01-01

    The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The

  11. Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data

    2012-01-01

    Background Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that the underlying missing data mechanism, given the observed data, is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the method proposed by Carpenter et al. (2007). This approach aims to approximate inferences under a Missing Not At Random (MNAR) mechanism by reweighting estimates obtained after multiple imputation, where the weights depend on the assumed degree of departure from the MAR assumption. Methods The method is illustrated with epidemiological data from a surveillance system of hepatitis C virus (HCV) infection in France during the 2001–2007 period. The subpopulation studied included 4343 HCV infected patients who reported drug use. Risk factors for severe liver disease were assessed. After performing complete-case and multiple imputation analyses, we applied the sensitivity analysis to 3 risk factors of severe liver disease: past excessive alcohol consumption, HIV co-infection and infection with HCV genotype 3. Results In these data, the association between severe liver disease and HIV was underestimated if, given the observed data, the chance of observing HIV status is high when the status is positive. Inferences for the two other risk factors were robust to plausible local departures from the MAR assumption. Conclusions We have demonstrated the practical utility of, and advocate, a pragmatic, widely applicable approach to exploring plausible departures from the MAR assumption after multiple imputation. We have developed guidelines for applying this approach to epidemiological studies. PMID:22681630

  12. BRITS: Bidirectional Recurrent Imputation for Time Series

    Cao, Wei; Wang, Dong; Li, Jian; Zhou, Hao; Li, Lei; Li, Yitan

    2018-01-01

    Time series are widely used as signals in many classification/regression tasks, and they very often contain many missing values. Given multiple correlated time series, how can we fill in the missing values and predict their class labels? Existing imputation methods often impose strong assumptions on the underlying data-generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing va...

  13. A comparison of selected parametric and non-parametric imputation methods for estimating forest biomass and basal area

    Donald Gagliasso; Susan Hummel; Hailemariam Temesgen

    2014-01-01

    Various methods have been used to estimate the amount of above ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers might end up with incorrect biomass estimate maps, which could lead them to make poorer decisions in their future...

  14. Clustering with Missing Values: No Imputation Required

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  15. Imputing amino acid polymorphisms in human leukocyte antigens.

    Xiaoming Jia

    DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.

  16. Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

    Chan, Kin Wai; Meng, Xiao-Li

    2017-01-01

    Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...
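    The "impute m times, analyse each completed dataset, then combine" workflow described in this record can be illustrated for the simplest case, a scalar estimate pooled with Rubin's rules. This is only a sketch of the standard pooling step, not the likelihood ratio test combination that the paper studies; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total

# Hypothetical results from m = 3 completed-data analyses.
pooled_estimate, total_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

    The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations.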

  17. Bootstrap inference when using multiple imputation.

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.

  18. Multiply-Imputed Synthetic Data: Advice to the Imputer

    Loong Bronwyn

    2017-12-01

    Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the confidential synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.

  19. Methods for discovering and validating relationships among genotyped animals

    Genomic selection based on single-nucleotide polymorphisms (SNPs) has led to the collection of genotypes for over 2.2 million animals by the Council on Dairy Cattle Breeding in the United States. To assure that a genotype is assigned to the correct animal and that the animal’s pedigree is correct, t...

  20. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

    Ma, Peipei; Lund, Mogens Sandø; Ding, X

    2015-01-01

    This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows...... was improved slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy...... to increase reference population rather than increasing marker density...

  1. A web-based approach to data imputation

    Li, Zhixu

    2013-10-24

    In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.

  2. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

    Thomas J Hoffmann

    2015-01-01

    An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r^2 = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4 × 10^-12). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8 × 10^-4) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting

  3. Phenotypic and Genotypic Eligible Methods for Salmonella Typhimurium Source Tracking.

    Ferrari, Rafaela G; Panzenhagen, Pedro H N; Conte-Junior, Carlos A

    2017-01-01

    Salmonellosis is one of the most common causes of foodborne infection and a leading cause of human gastroenteritis. Throughout the last decade, Salmonella enterica serotype Typhimurium (ST) has been increasingly reported, with the simultaneous emergence of multidrug-resistant isolates such as phage type DT104. Therefore, to successfully control this microorganism, it is important to attribute salmonellosis to the exact source. Studies of Salmonella source attribution have been performed to determine the main food/food-production animals involved, toward which control efforts should be directed. Hence, the choice of an ST subtyping method depends on the particular problem to be addressed, the resources and the data available. Generally, before choosing a molecular subtyping method, phenotyping approaches such as serotyping, phage typing, and antimicrobial resistance profiling are implemented as a screening step of an investigation, and the results are computed using frequency-matching models (i.e., Dutch, Hald and Asymmetric Island models). Currently, owing to the advancement of molecular tools such as PFGE, MLVA, MLST, CRISPR, and WGS, more precise results have been obtained, but even with these technologies there are still gaps to be elucidated. To address this issue, an important question needs to be answered: what are the currently suitable subtyping methods for source attribution of ST? This review presents the most frequently applied subtyping methods used to characterize ST, analyses the major available microbial subtyping attribution models and considers the use of conventional phenotyping methods, as well as the most applied genotypic tools, in the context of their potential applicability to ST source tracking.

  4. Phenotypic and Genotypic Eligible Methods for Salmonella Typhimurium Source Tracking

    Rafaela G. Ferrari

    2017-12-01

    Salmonellosis is one of the most common causes of foodborne infection and a leading cause of human gastroenteritis. Throughout the last decade, Salmonella enterica serotype Typhimurium (ST) has been increasingly reported, with the simultaneous emergence of multidrug-resistant isolates such as phage type DT104. Therefore, to successfully control this microorganism, it is important to attribute salmonellosis to the exact source. Studies of Salmonella source attribution have been performed to determine the main food/food-production animals involved, toward which control efforts should be directed. Hence, the choice of an ST subtyping method depends on the particular problem to be addressed, the resources and the data available. Generally, before choosing a molecular subtyping method, phenotyping approaches such as serotyping, phage typing, and antimicrobial resistance profiling are implemented as a screening step of an investigation, and the results are computed using frequency-matching models (i.e., Dutch, Hald and Asymmetric Island models). Currently, owing to the advancement of molecular tools such as PFGE, MLVA, MLST, CRISPR, and WGS, more precise results have been obtained, but even with these technologies there are still gaps to be elucidated. To address this issue, an important question needs to be answered: what are the currently suitable subtyping methods for source attribution of ST? This review presents the most frequently applied subtyping methods used to characterize ST, analyses the major available microbial subtyping attribution models and considers the use of conventional phenotyping methods, as well as the most applied genotypic tools, in the context of their potential applicability to ST source tracking.

  5. Flexible Imputation of Missing Data

    van Buuren, Stef

    2012-01-01

    Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde

  6. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern|| Una comparación de métodos de imputación de variables categóricas con patrón univariado

    Torres Munguía, Juan Armando

    2014-06-01

    This paper examines sample proportion estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at missingness rates of 5% and 15%, each for MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were the hot-deck and polytomous regression approaches.
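    Two of the simpler strategies named in this record, mode imputation and random imputation from observed donors, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy `smoker` variable are hypothetical, and the random-donor version ignores the adjustment cells that a full hot-deck procedure would normally use.

```python
import random
from collections import Counter

def mode_impute(values):
    """Replace each missing entry (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def random_donor_impute(values, rng):
    """Replace each missing entry with a category drawn from a random observed donor."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

# Hypothetical categorical variable with two missing responses.
smoker = ["no", "yes", None, "no", "no", None, "yes"]
mode_filled = mode_impute(smoker)
donor_filled = random_donor_impute(smoker, random.Random(0))
```

    Mode imputation always pushes the estimated proportion toward the majority category, whereas random donors preserve the observed proportions on average; this is one reason the two can behave differently when sample proportions are the target.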

  7. A multicenter evaluation of genotypic methods for the epidemiologic typing of Legionella pneumophila serogroup 1

    Fry, Norman K.; Alexiou-Daniel, Stella; Bangsborg, Jette Marie

    1999-01-01

    OBJECTIVES: To compare genotypic methods for epidemiologic typing of Legionella pneumophila serogroup (sg) 1, in order to determine the best available method within Europe for implementation and standardization by members of the European Working Group on Legionella Infections. METHODS: Coded...

  8. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Oren E Livne

    2015-03-01

    Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  9. A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data.

    Miar, Younes; Sargolzaei, Mehdi; Schenkel, Flavio S

    2017-04-01

    Phasing genotypes to haplotypes is becoming increasingly important due to its applications in the study of diseases, population and evolutionary genetics, imputation, and so on. Several studies have focused on the development of computational methods that infer haplotype phase from population genotype data. The aim of this study was to compare phasing algorithms implemented in Beagle, Findhap, FImpute, Impute2, and ShapeIt2 software using 50k and 777k (HD) genotyping data. Six scenarios were considered: no-parents, sire-progeny pairs, sire-dam-progeny trios, each with and without pedigree information in Holstein cattle. Algorithms were compared with respect to their phasing accuracy and computational efficiency. In the studied population, Beagle and FImpute were more accurate than other phasing algorithms. Across scenarios, phasing accuracies for Beagle and FImpute were 99.49-99.90% and 99.44-99.99% for 50k, respectively, and 99.90-99.99% and 99.87-99.99% for HD, respectively. Generally, FImpute resulted in higher accuracy when genotypic information of at least one parent was available. In the absence of parental genotypes and pedigree information, Beagle and Impute2 (with double the default number of states) were slightly more accurate than FImpute. Findhap gave high phasing accuracy when parents' genotypes and pedigree information were available. In terms of computing time, Findhap was the fastest algorithm followed by FImpute. FImpute was 30 to 131, 87 to 786, and 353 to 1,400 times faster across scenarios than Beagle, ShapeIt2, and Impute2, respectively. In summary, FImpute and Beagle were the most accurate phasing algorithms. Moreover, the low computational requirement of FImpute makes it an attractive algorithm for phasing genotypes of large livestock populations. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  10. Data imputation analysis for Cosmic Rays time series

    Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

    2017-05-01

    Missing data in Galactic Cosmic Ray (GCR) time series are inevitable: data are lost through mechanical and human failure, technical problems, and differing periods of operation of GCR stations. The aim of this study was to perform multiple dataset imputation in order to depict the observational dataset. The study used the monthly GCR time series of Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of missing data relative to the observed ROME series, with 50 replicates. The CLMX station was then used as a proxy for allocation of these scenarios. Three different methods for monthly dataset imputation were selected: Amelia II, which runs the bootstrap Expectation Maximization algorithm; MICE, which runs Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization algorithm-based method for imputation of missing values in multivariate normal time series. The synthetic time series were also evaluated against the observed ROME series using several skill measures, such as RMSE, NRMSE, Agreement Index, R, R^2, F-test and t-test. The results showed that for CLMX and ROME, the R^2 and R statistics were equal to 0.98 and 0.96, respectively. Increasing the number of gaps reduced the quality of the time series. Data imputation was most efficient with the MTSDI method, with negligible errors and the best skill coefficients. The results suggest a limit of about 60% missing data for imputation of monthly averages. It is noteworthy that the CLMX, ROME and KIEL stations present no missing data in the target period. This methodology allowed reconstructing 43 time series.
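    Three of the skill measures listed in this record can be sketched as follows. This is an illustration only: the abstract does not say how NRMSE was normalised, so dividing by the observed range is an assumption, the Agreement Index is taken to be Willmott's index of agreement, and the example values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root-mean-square error between observed and imputed values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nrmse(obs, pred):
    """RMSE normalised by the observed range (assumed normalisation)."""
    return rmse(obs, pred) / (max(obs) - min(obs))

def agreement_index(obs, pred):
    """Willmott's index of agreement: 1 is a perfect match, 0 is no agreement."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den

# Hypothetical observed values and their imputed counterparts.
observed = [10.0, 12.0, 14.0, 16.0]
imputed = [11.0, 12.0, 13.0, 16.0]
scores = (rmse(observed, imputed), nrmse(observed, imputed),
          agreement_index(observed, imputed))
```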

  11. Missing value imputation: with application to handwriting data

    Xu, Zhen; Srihari, Sargur N.

    2015-01-01

    Missing values make pattern analysis difficult, particularly with limited available data. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In the task of studying development of individuality of handwriting, we encountered the fact that feature values are missing for several individuals at several time instances. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian network (static Bayesian network, parameter EM, and structural EM), are compared with children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and useful conclusions are given. Specifically, static Bayesian network is used for our data which contain around 5% missing data to provide adequate accuracy and low computational cost.

  12. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

    Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

    2017-11-24

    Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.

  13. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

    Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

    2011-01-01

    The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data...... of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test...... with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected...

  14. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

    Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

    2018-06-01

    A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
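    The evaluation design described in this record (randomly mask some measured results, impute them, then compare predictions with the hidden truth) can be sketched as below. This is not 3D-MICE itself: a plain mean imputer stands in for the imputation method, normalising the error by the standard deviation of the observed values is an assumption, and all names and numbers are hypothetical.

```python
import math
import random

def mean_impute(series):
    """Stand-in imputer: fill every missing entry (None) with the observed mean."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]

def masked_nrmse(series, impute, frac, rng):
    """Hide a fraction of the known values, impute, and score the hidden ones."""
    known = [i for i, x in enumerate(series) if x is not None]
    hidden = rng.sample(known, max(1, int(frac * len(known))))
    masked = list(series)
    for i in hidden:
        masked[i] = None                      # mask a truly measured value
    filled = impute(masked)
    sq_err = sum((filled[i] - series[i]) ** 2 for i in hidden) / len(hidden)
    obs = [series[i] for i in known]
    mu = sum(obs) / len(obs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in obs) / len(obs))
    return math.sqrt(sq_err) / sd             # normalisation by SD is an assumption

# Hypothetical analyte series with two results missing in the original data.
analyte = [4.1, None, 4.4, 4.0, 4.6, None, 4.3, 4.2]
score = masked_nrmse(analyte, mean_impute, 0.25, random.Random(1))
```

    Because only values that were actually measured are masked, the score can be computed without knowing the results that were missing to begin with.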

  15. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.

  16. Using imputation to provide location information for nongeocoded addresses.

    Frank C Curriero

    2010-02-01

    Full Text Available The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map which is predicated by some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses which cannot be converted successfully (nongeocodable). This raises concerns regarding bias since traditionally the practice has been to exclude nongeocoded data records from analysis. In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group level as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on using available population-based age, gender, and race information performed the best overall at the county, tract, and block group levels. The procedure allows for the potentially biased and likely under reported outcome, case enumerations based on only the geocoded records, to be presented with a statistically adjusted count (imputed count) with a measure of uncertainty that are based on all the case data, the geocodes and imputed

  17. Comparison of simple and multiple imputation methods using a risk model for surgical mortality as example

    Luciana Neves Nunes

    2010-12-01

    Full Text Available INTRODUCTION: It is common for studies in health to face problems with missing data. Through imputation, complete data sets are built artificially and can be analyzed by traditional statistical analysis. The objective of this paper is to compare three types of imputation based on real data. METHODS: The data used came from a study on the development of risk models for surgical mortality. The sample size was 450 patients. The imputation methods employed were two single imputations and one multiple imputation (MI), and the assumption about the nonresponse mechanism was MAR (Missing at Random). RESULTS: The variable with missing data was serum albumin, with 27.1% missing. The models obtained by the single imputations were similar to each other, but differed from those obtained with data imputed by MI with respect to the variables included in the models. CONCLUSIONS: The results indicate that taking into account the relationship of albumin with other observed variables makes a difference, since different models were obtained under single and multiple imputation. Single imputation underestimates variability, producing narrower confidence intervals. It is important to consider imputation methods when data are missing, especially MI, which takes the variability between imputations into account in the model estimates.

  18. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Min-Wei Huang

    2018-01-01

    Full Text Available Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find out the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.

  19. Sequencing of the Chlamydophila psittaci ompA Gene Reveals a New Genotype, E/B, and the Need for a Rapid Discriminatory Genotyping Method

    Geens, Tom; Desplanques, Ann; Van Loock, Marnix; Bönner, Brigitte M.; Kaleta, Erhard F.; Magnino, Simone; Andersen, Arthur A.; Everett, Karin D. E.; Vanrompay, Daisy

    2005-01-01

    Twenty-one avian Chlamydophila psittaci isolates from different European countries were characterized using ompA restriction fragment length polymorphism, ompA sequencing, and major outer membrane protein serotyping. Results reveal the presence of a new genotype, E/B, in several European countries and stress the need for a discriminatory rapid genotyping method. PMID:15872282

  20. Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

    Zhou Hanzhi

    2016-03-01

    Full Text Available Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.

  1. Imputation-based analysis of association studies: candidate regions and quantitative traits.

    Bertrand Servin

    2007-07-01

    Full Text Available We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

  2. Methods of developing core collections based on the predicted genotypic value of rice ( Oryza sativa L.).

    Li, C T; Shi, C H; Wu, J G; Xu, H M; Zhang, H Z; Ren, Y L

    2004-04-01

    The selection of an appropriate sampling strategy and a clustering method is important in the construction of core collections based on predicted genotypic values in order to retain the greatest degree of genetic diversity of the initial collection. In this study, methods of developing rice core collections were evaluated based on the predicted genotypic values for 992 rice varieties with 13 quantitative traits. The genotypic values of the traits were predicted by the adjusted unbiased prediction (AUP) method. Based on the predicted genotypic values, Mahalanobis distances were calculated and employed to measure the genetic similarities among the rice varieties. Six hierarchical clustering methods, including the single linkage, median linkage, centroid, unweighted pair-group average, weighted pair-group average and flexible-beta methods, were combined with random, preferred and deviation sampling to develop 18 core collections of rice germplasm. The results show that the deviation sampling strategy in combination with the unweighted pair-group average method of hierarchical clustering retains the greatest degree of genetic diversities of the initial collection. The core collections sampled using predicted genotypic values had more genetic diversity than those based on phenotypic values.

  3. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

    Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won

    2014-08-01

    As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.

  4. Use of a New High Resolution Melting Method for Genotyping Pathogenic Leptospira spp.

    Florence Naze

    Full Text Available Leptospirosis is a worldwide zoonosis that is endemic in tropical areas, such as Reunion Island. The species Leptospira interrogans is the primary agent in human infections, but other pathogenic species, such as L. kirschneri and L. borgpetersenii, are also associated with human leptospirosis. In this study, a melting curve analysis of the products that were amplified with the primer pairs lfb1 F/R and G1/G2 facilitated an accurate species classification of Leptospira reference strains. Next, we combined an unsupervised high resolution melting (HRM) method with a new statistical approach using primers to amplify a two variable-number tandem-repeat (VNTR) for typing at the subspecies level. The HRM analysis, which was performed with ScreenClust Software, enabled the identification of genotypes at the serovar level with high resolution power (Hunter-Gaston index 0.984). This method was also applied to Leptospira DNA from blood samples that were obtained from Reunion Island after 1998. We were able to identify a unique genotype that is identical to that of the L. interrogans serovars Copenhageni and Icterohaemorrhagiae, suggesting that this genotype is the major cause of leptospirosis on Reunion Island. Our simple, rapid, and robust genotyping method enables the identification of Leptospira strains at the species and subspecies levels and supports the direct genotyping of Leptospira in biological samples without requiring cultures.

  5. Helicobacter pylori in dyspepsia: Phenotypic and genotypic methods of diagnosis

    Vignesh Shetty

    2017-01-01

    Full Text Available Background: Helicobacter pylori affects almost half of the world's population and therefore is one of the most frequent and persistent bacterial infections worldwide. H. pylori is associated with chronic gastritis, ulcer disease (gastric and duodenal), mucosa-associated lymphoid tissue lymphoma, and gastric cancer. Several diagnostic methods exist to detect infection, and the choice of one method or another depends on various factors, such as availability, advantages and disadvantages of each method, cost, and the age of patients. Materials and Methods: Patients with complaints of abdominal pain, discomfort, acidity, and loss of appetite were chosen for endoscopy; a detailed history was taken and a physical examination was conducted before endoscopy. Biopsies (antrum + body) were obtained from each patient and subjected to rapid urease test (RUT), histopathological examination (HPE), polymerase chain reaction (PCR), and culture. Results: Of the total 223 biopsy specimens obtained from dyspeptic patients, 122 (54.7%) were positive for H. pylori by HPE, 109 (48.9%) by RUT, 65 (29.1%) by culture, and 117 (52.5%) by PCR. The specificity and sensitivity were as follows: RUT (99% and 88.5%), phosphoglucosamine mutase PCR assay (100% and 95.9%), and culture (100% and 53.3%), respectively. Conclusion: In this study, we compared the various diagnostic methods used to identify H. pylori infection indicating that, in comparison with histology as gold standard for detection of H. pylori infection, culture and PCR showed 100% specificity whereas RUT and PCR showed 99% and 100% sensitivity, respectively.
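
Sensitivity and specificity against a gold standard, as reported in this abstract with histology as the reference, can be computed as in the sketch below. The paired results are hypothetical toy data, not the study's specimens, and the function name is introduced here only for illustration.

```python
def sensitivity_specificity(gold_standard, test_result):
    """Compare a diagnostic test with a gold standard, specimen by specimen.

    Both arguments are equal-length lists of booleans (True = positive).
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(gold_standard, test_result))
    tp = sum(1 for g, t in pairs if g and t)          # true positives
    fn = sum(1 for g, t in pairs if g and not t)      # false negatives
    tn = sum(1 for g, t in pairs if not g and not t)  # true negatives
    fp = sum(1 for g, t in pairs if not g and t)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical results for eight specimens (not data from the study).
histology = [True, True, True, True, False, False, False, False]
rapid_test = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(histology, rapid_test)
```

Sensitivity is the share of gold-standard positives the test detects, and specificity is the share of gold-standard negatives it correctly rules out.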

  6. Efficient genome-wide genotyping strategies and data integration in crop plants.

    Torkamaneh, Davoud; Boyle, Brian; Belzile, François

    2018-03-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time with a low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today are how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used to both fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.

  7. Missing value imputation for microarray gene expression data using histone acetylation information

    Feng Jihua

    2008-05-01

    Full Text Available Abstract Background It is an important pre-processing step to accurately estimate missing values in microarray data, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performances are not satisfactory for datasets with high missing percentages. Results The paper explores the feasibility of doing missing value imputation with the help of gene regulatory mechanism. An imputation framework called histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates the histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve the widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, which is one of the existing related methods that use the functional similarity as the external information. Conclusion We demonstrated that the use of histone acetylation information could greatly improve the performance of the imputation, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, with more knowledge accumulated on gene regulatory mechanism in addition to histone acetylation, the performance of our approach can be further improved and verified.
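
For orientation, the conventional KNN imputation that HAIimpute builds on can be sketched as below. This is plain row-wise KNN with an unweighted neighbor average, an assumption made for brevity; it does not include the histone acetylation information, and the toy expression matrix is hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries using the k most similar rows (genes).

    Distance is the root mean squared difference over columns observed in
    both rows; only rows with the target column observed are candidates.
    """
    filled = [list(row) for row in matrix]
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                if other_index == r or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda item: item[0])
            nearest = candidates[:k]
            if nearest:
                # Unweighted mean of the neighbors' values in the missing column.
                filled[r][c] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical expression matrix: rows are genes, columns are arrays.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]
completed = knn_impute(expression, k=2)
```

In this toy matrix the first gene borrows its missing value from the two genes with nearly identical profiles, while the dissimilar fourth gene is ignored.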

  8. Method: a single nucleotide polymorphism genotyping method for Wheat streak mosaic virus

    2012-01-01

    Background The September 11, 2001 attacks on the World Trade Center and the Pentagon increased the concern about the potential for terrorist attacks on many vulnerable sectors of the US, including agriculture. The concentrated nature of crops, easily obtainable biological agents, and highly detrimental impacts make agroterrorism a potential threat. Although procedures for an effective criminal investigation and attribution following such an attack are available, important enhancements are still needed, one of which is the capability for fine discrimination among pathogen strains. The purpose of this study was to develop a molecular typing assay for use in a forensic investigation, using Wheat streak mosaic virus (WSMV) as a model plant virus. Method This genotyping technique utilizes single base primer extension to generate a genetic fingerprint. Fifteen single nucleotide polymorphisms (SNPs) within the coat protein and helper component-protease genes were selected as the genetic markers for this assay. Assay optimization and sensitivity testing was conducted using synthetic targets. WSMV strains and field isolates were collected from regions around the world and used to evaluate the assay for discrimination. The assay specificity was tested against a panel of near-neighbors consisting of genetic and environmental near-neighbors. Result Each WSMV strain or field isolate tested produced a unique SNP fingerprint, with the exception of three isolates collected within the same geographic location that produced indistinguishable fingerprints. The results were consistent among replicates, demonstrating the reproducibility of the assay. No SNP fingerprints were generated from organisms included in the near-neighbor panel, suggesting the assay is specific for WSMV. Using synthetic targets, a complete profile could be generated from as low as 7.15 fmoles of cDNA. Conclusion The molecular typing method presented is one tool that could be incorporated into the forensic

  9. Discriminative power of Campylobacter phenotypic and genotypic typing methods.

    Duarte, Alexandra; Seliwiorstow, Tomasz; Miller, William G; De Zutter, Lieven; Uyttendaele, Mieke; Dierick, Katelijne; Botteldoorn, Nadine

    2016-06-01

    The aim of this study was to compare different typing methods, individually and combined, for use in the monitoring of Campylobacter in food. Campylobacter jejuni (n=94) and Campylobacter coli (n=52) isolated from different broiler meat carcasses were characterized using multilocus sequence typing (MLST), flagellin gene A restriction fragment length polymorphism typing (flaA-RFLP), antimicrobial resistance profiling (AMRp), the presence/absence of 5 putative virulence genes; and, exclusively for C. jejuni, the determination of lipooligosaccharide (LOS) class. Discriminatory power was calculated by the Simpson's index of diversity (SID) and the congruence was measured by the adjusted Rand index and adjusted Wallace coefficient. MLST was individually the most discriminative typing method for both C. jejuni (SID=0.981) and C. coli (SID=0.957). The most discriminative combination with a SID of 0.992 for both C. jejuni and C. coli was obtained by combining MLST with flaA-RFLP. The combination of MLST with flaA-RFLP is an easy and feasible typing method for short-term monitoring of Campylobacter in broiler meat carcass. Copyright © 2016 Elsevier B.V. All rights reserved.
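
The discriminatory power reported here as Simpson's index of diversity can be computed as in the following sketch. It assumes the Hunter-Gaston formulation, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total; the isolate labels are hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Probability that two isolates drawn without replacement have different types."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing results for four isolates.
mlst = ["ST21", "ST21", "ST45", "ST45"]
fla_rflp = ["a", "b", "a", "a"]

sid_mlst = simpsons_index_of_diversity(mlst)
# Combining two methods: isolates share a type only if both labels agree.
sid_combined = simpsons_index_of_diversity(list(zip(mlst, fla_rflp)))
```

Combining methods can only split existing types further, so the combined index is never lower than that of either method alone.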

  10. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    Danai Jattawa

    2016-04-01

    Full Text Available The objective of this study was to investigate the accuracy of imputation from low density (LDC) to moderate density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above and below average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNP within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.
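
The accuracy measure defined in this abstract, the ratio of correctly imputed SNP markers to all imputed SNP markers, can be sketched as follows. The 0/1/2 genotype coding and the example vectors are assumptions for illustration, not data from the study.

```python
def imputation_accuracy(true_genotypes, imputed_genotypes, imputed_positions):
    """Fraction of imputed markers whose imputed genotype matches the true one.

    Only positions that were actually imputed (masked in the test group)
    enter the ratio; markers genotyped on the low-density chip are excluded.
    """
    if not imputed_positions:
        raise ValueError("no imputed positions to evaluate")
    correct = sum(
        1 for i in imputed_positions
        if true_genotypes[i] == imputed_genotypes[i]
    )
    return correct / len(imputed_positions)


# Hypothetical genotypes coded as 0, 1, 2 copies of the reference allele.
true_calls = [0, 1, 2, 1, 0, 2]
imputed_calls = [0, 1, 1, 1, 0, 2]
masked_positions = [1, 2, 3, 5]  # markers absent from the low-density chip
accuracy = imputation_accuracy(true_calls, imputed_calls, masked_positions)
```

Restricting the ratio to masked positions matters: including markers that were genotyped directly would inflate the accuracy, because those calls are correct by construction.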

  11. Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation.

    Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen

    2014-01-01

    Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally-efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently-conducted GenIMS study.

  12. Stepwise threshold clustering: a new method for genotyping MHC loci using next-generation sequencing technology.

    William E Stutz

    Full Text Available Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms. Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems: 1) a "gray zone" where low frequency alleles and high frequency artifacts can be difficult to disentangle and 2) a similar sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here we present a new method for genotyping MHC loci, Stepwise Threshold Clustering (STC), that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency and similarity based criteria to clusters rather than individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications.

  13. Method: a single nucleotide polymorphism genotyping method for Wheat streak mosaic virus.

    Rogers, Stephanie M; Payton, Mark; Allen, Robert W; Melcher, Ulrich; Carver, Jesse; Fletcher, Jacqueline

    2012-05-17

    The September 11, 2001 attacks on the World Trade Center and the Pentagon increased the concern about the potential for terrorist attacks on many vulnerable sectors of the US, including agriculture. The concentrated nature of crops, easily obtainable biological agents, and highly detrimental impacts make agroterrorism a potential threat. Although procedures for an effective criminal investigation and attribution following such an attack are available, important enhancements are still needed, one of which is the capability for fine discrimination among pathogen strains. The purpose of this study was to develop a molecular typing assay for use in a forensic investigation, using Wheat streak mosaic virus (WSMV) as a model plant virus. This genotyping technique utilizes single base primer extension to generate a genetic fingerprint. Fifteen single nucleotide polymorphisms (SNPs) within the coat protein and helper component-protease genes were selected as the genetic markers for this assay. Assay optimization and sensitivity testing was conducted using synthetic targets. WSMV strains and field isolates were collected from regions around the world and used to evaluate the assay for discrimination. The assay specificity was tested against a panel of near-neighbors consisting of genetic and environmental near-neighbors. Each WSMV strain or field isolate tested produced a unique SNP fingerprint, with the exception of three isolates collected within the same geographic location that produced indistinguishable fingerprints. The results were consistent among replicates, demonstrating the reproducibility of the assay. No SNP fingerprints were generated from organisms included in the near-neighbor panel, suggesting the assay is specific for WSMV. Using synthetic targets, a complete profile could be generated from as low as 7.15 fmoles of cDNA. The molecular typing method presented is one tool that could be incorporated into the forensic science tool box after a thorough

  14. Imputation of missing data in time series for air pollutants

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even under missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.
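
    The EM idea in the abstract above can be illustrated with a deliberately small sketch. It assumes only two series under a bivariate normal model, with one series (x) fully observed and gaps (None) in the other (y), and it ignores the temporal filtering step entirely. The function name and data layout are hypothetical and are not taken from the mtsdi package.

```python
def em_impute_bivariate(x, y, n_iter=500, tol=1e-10):
    """Fill None entries of y by EM under a bivariate normal model.

    Illustrative assumptions: x is complete and not constant,
    and at least one y value is observed.
    """
    n = len(x)
    missing = [v is None for v in y]
    observed_y = [v for v in y if v is not None]

    # Moments of x are fixed because x has no gaps.
    mu_x = sum(x) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n

    # Start from the observed-data moments of y and zero covariance.
    mu_y = sum(observed_y) / len(observed_y)
    var_y = sum((v - mu_y) ** 2 for v in observed_y) / len(observed_y)
    cov = 0.0

    for _ in range(n_iter):
        # E-step: conditional mean and second moment of each missing y given x.
        slope = cov / var_x
        resid_var = max(var_y - cov * slope, 0.0)
        ey = [mu_y + slope * (xi - mu_x) if m else yi
              for xi, yi, m in zip(x, y, missing)]
        ey2 = [e * e + (resid_var if m else 0.0) for e, m in zip(ey, missing)]

        # M-step: update the moments of y from the expected sufficient statistics.
        new_mu_y = sum(ey) / n
        new_var_y = sum(ey2) / n - new_mu_y ** 2
        new_cov = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y

        change = (abs(new_mu_y - mu_y) + abs(new_var_y - var_y)
                  + abs(new_cov - cov))
        mu_y, var_y, cov = new_mu_y, new_var_y, new_cov
        if change < tol:
            break

    # Impute with the conditional mean under the converged parameters.
    slope = cov / var_x
    return [mu_y + slope * (xi - mu_x) if m else yi
            for xi, yi, m in zip(x, y, missing)]


filled = em_impute_bivariate([1, 2, 3, 4, 5, 6], [3, 5, 7, 9, None, None])
```

    Because the observed pairs in this toy input lie exactly on a line, the imputed values converge to that line; real pollutant series would of course carry residual variance and temporal structure that this sketch does not model.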

  15. Towards a more efficient representation of imputation operators in TPOT

    Garciarena, Unai; Mendiburu, Alexander; Santana, Roberto

    2018-01-01

    Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, model assessment, etc.). TPOT, a software for optimizing machine learning pipelines based on genetic programming (GP), is a novel example of this kind of applications. Recently we have proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can prod...

  16. DTW-APPROACH FOR UNCORRELATED MULTIVARIATE TIME SERIES IMPUTATION

    Phan, Thi-Thu-Hong; Poisson Caillault, Emilie; Bigand, André; Lefebvre, Alain

    2017-01-01

    Missing data are inevitable in almost all domains of applied sciences. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low/un-correlated multivariate time series under an assumption of...

  17. Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies.

    Zhao, Ni; Zhan, Xiang; Huang, Yen-Tsung; Almli, Lynn M; Smith, Alicia; Epstein, Michael P; Conneely, Karen; Wu, Michael C

    2018-03-01

    Many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation in addition to genotype in the same subjects. However, integrating information from both data types is challenging. In this paper, we propose a composite kernel machine regression model to test the joint epigenetic and genetic effect. Our approach works at the gene level, which allows for a common unit of analysis across different data types. The model compares the pairwise similarities in the phenotype to the pairwise similarities in the genotype and methylation values; and high correspondence is suggestive of association. A composite kernel is constructed to measure the similarities in the genotype and methylation values between pairs of samples. We demonstrate through simulations and real data applications that the proposed approach can correctly control type I error, and is more robust and powerful than using only the genotype or methylation data in detecting trait-associated genes. We applied our method to investigate the genetic and epigenetic regulation of gene expression in response to stressful life events using data that are collected from the Grady Trauma Project. Within the kernel machine testing framework, our methods allow for heterogeneity in effect sizes, nonlinear, and interactive effects, as well as rapid P-value computation. © 2017 WILEY PERIODICALS, INC.

  18. A nonparametric multiple imputation approach for missing categorical data

    Muhan Zhou

    2017-06-01

    Background: Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods: We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results: The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. Conclusions: We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with
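
    The donor-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two working models have already been fitted and that each record carries one scalar score from each (the keys outcome_score and missingness_score are hypothetical), and it uses a simple weighted Euclidean distance as a stand-in for the paper's weighting scheme.

```python
import random


def nn_multiple_impute(records, k=3, weight=0.5, n_imputations=5, seed=0):
    """Return n_imputations completed outcome lists.

    Each record is a dict with 'outcome' (a category or None) and two
    precomputed scores. 'weight' balances the outcome-model score against
    the missingness-model score (an assumed form of the weighting scheme).
    """
    rng = random.Random(seed)
    donors = [r for r in records if r["outcome"] is not None]
    completed = []
    for _ in range(n_imputations):
        filled = []
        for rec in records:
            if rec["outcome"] is not None:
                filled.append(rec["outcome"])
                continue

            def distance(donor, rec=rec):
                d_out = donor["outcome_score"] - rec["outcome_score"]
                d_mis = donor["missingness_score"] - rec["missingness_score"]
                return (weight * d_out ** 2 + (1 - weight) * d_mis ** 2) ** 0.5

            # Donor set: the k observed records closest in predictive score.
            nearest = sorted(donors, key=distance)[:k]
            # Impute by drawing one donor at random from that set.
            filled.append(rng.choice(nearest)["outcome"])
        completed.append(filled)
    return completed


def category_proportions(completed):
    """Average the per-category proportions across the imputed data sets."""
    categories = sorted({c for data in completed for c in data})
    return {
        c: sum(data.count(c) / len(data) for data in completed) / len(completed)
        for c in categories
    }
```

    Drawing at random from a small donor set, rather than always taking the single closest donor, is what lets repeated imputations differ and so carry imputation uncertainty into the pooled proportions.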

  19. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

    Katya L Masconi

    Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.
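
    The C-statistic used above to compare models can be computed by pair counting. The sketch below is a generic illustration with made-up inputs; it is not the study's code, and the function name is hypothetical.

```python
def c_statistic(predicted_risks, outcomes):
    """Probability that a random case is ranked above a random non-case.

    outcomes are 1 (case) or 0 (non-case); tied risks count as half.
    """
    cases = [r for r, o in zip(predicted_risks, outcomes) if o == 1]
    controls = [r for r, o in zip(predicted_risks, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

    Computing this on the same validation sample after each imputation method (deletion, simple imputation, multiple imputation) gives the kind of side-by-side comparison the abstract reports.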

  20. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

    Sulovari, Arvis; Li, Dawei

    2014-07-19

    Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep

  1. Optimization of a method for the profiling and quantification of saponins in different green asparagus genotypes.

    Vázquez-Castilla, Sara; Jaramillo-Carmona, Sara; Fuentes-Alventosa, Jose María; Jiménez-Araujo, Ana; Rodriguez-Arcos, Rocío; Cermeño-Sacristán, Pedro; Espejo-Calvo, Juan Antonio; Guillén-Bejarano, Rafael

    2013-07-03

    The main goal of this study was the optimization of a HPLC-MS method for the qualitative and quantitative analysis of asparagus saponins. The method includes extraction with aqueous ethanol, cleanup by solid phase extraction, separation by reverse phase chromatography, electrospray ionization, and detection in a single quadrupole mass analyzer. The method was used for the comparison of selected genotypes of Huétor-Tájar asparagus landrace and selected varieties of commercial diploid hybrids of green asparagus. The results showed that while protodioscin was almost the only saponin detected in the commercial hybrids, eight different saponins were detected in the Huétor-Tájar asparagus genotypes. The mass spectra indicated that HT saponins are derived from a furostan type steroidal genin having a single bond between carbons 5 and 6 of the B ring. The total concentration of saponins was found to be higher in triguero asparagus than in commercial hybrids.

  2. Data driven estimation of imputation error-a strategy for imputation with a reject option

    Bak, Nikolaj; Hansen, Lars Kai

    2016-01-01

    Missing data is a common problem in many research fields and is a challenge that always needs careful considerations. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values i...

  3. An accurate and efficient method for large-scale SSR genotyping and applications.

    Li, Lun; Fang, Zhiwei; Zhou, Junfei; Chen, Hong; Hu, Zhangfeng; Gao, Lifen; Chen, Lihong; Ren, Sheng; Ma, Hongyu; Lu, Long; Zhang, Weixiong; Peng, Hai

    2017-06-02

    Accurate and efficient genotyping of simple sequence repeats (SSRs) constitutes the basis of SSRs as an effective genetic marker with various applications. However, the existing methods for SSR genotyping suffer from low sensitivity, low accuracy, low efficiency and high cost. In order to fully exploit the potential of SSRs as genetic marker, we developed a novel method for SSR genotyping, named as AmpSeq-SSR, which combines multiplexing polymerase chain reaction (PCR), targeted deep sequencing and comprehensive analysis. AmpSeq-SSR is able to genotype potentially more than a million SSRs at once using the current sequencing techniques. In the current study, we simultaneously genotyped 3105 SSRs in eight rice varieties, which were further validated experimentally. The results showed that the accuracies of AmpSeq-SSR were nearly 100 and 94% with a single base resolution for homozygous and heterozygous samples, respectively. To demonstrate the power of AmpSeq-SSR, we adopted it in two applications. The first was to construct discriminative fingerprints of the rice varieties using 3105 SSRs, which offer much greater discriminative power than the 48 SSRs commonly used for rice. The second was to map Xa21, a gene that confers persistent resistance to rice bacterial blight. We demonstrated that genome-scale fingerprints of an organism can be efficiently constructed and candidate genes, such as Xa21 in rice, can be accurately and efficiently mapped using an innovative strategy consisting of multiplexing PCR, targeted sequencing and computational analysis. While the work we present focused on rice, AmpSeq-SSR can be readily extended to animals and micro-organisms. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

    Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

    2013-01-01

    Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF [...] 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P [...] 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P [...] 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low frequency variant and phenotype that was previously missed by HapMap based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10^-12). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations.

  5. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low de...

  6. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I.E. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. 
Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. 
Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  7. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

    2014-01-01

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.
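
    As a rough illustration of the concordance measure reported above, the sketch below compares imputed and genotyped allele pairs at 2-digit and 4-digit resolution. It assumes alleles are written in the colon-separated form (for example A*02:01) and that concordance is the share of genotyped alleles matched by an imputed allele regardless of order; the paper's exact definition may differ, and the sample alleles are illustrative only.

```python
def truncate_allele(allele, digits):
    """Reduce an allele name such as 'A*02:01' to 2-digit or 4-digit form."""
    gene, fields = allele.split("*")
    n_fields = digits // 2  # 2-digit -> first field, 4-digit -> first two
    return gene + "*" + ":".join(fields.split(":")[:n_fields])


def concordance_rate(imputed_pairs, genotyped_pairs, digits):
    """Share of genotyped alleles matched by an imputed allele, per person."""
    matched = 0
    total = 0
    for imputed, genotyped in zip(imputed_pairs, genotyped_pairs):
        remaining = [truncate_allele(a, digits) for a in genotyped]
        for allele in (truncate_allele(a, digits) for a in imputed):
            if allele in remaining:
                remaining.remove(allele)  # each typed allele matches once
                matched += 1
        total += len(genotyped)
    return matched / total


# Illustrative data: two individuals, one gene, two alleles each.
imputed = [("A*02:01", "A*24:02"), ("A*02:06", "A*33:03")]
genotyped = [("A*24:02", "A*02:01"), ("A*02:01", "A*33:03")]
rate_2digit = concordance_rate(imputed, genotyped, digits=2)
rate_4digit = concordance_rate(imputed, genotyped, digits=4)
```

    In this toy input the second individual's A*02:06 agrees with the typed A*02:01 only at 2-digit resolution, which is why 4-digit concordance is typically the lower of the two figures.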

  8. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Kwangwoo Kim

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  9. The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting.

    Thomas, A M; Cook, L J; Dean, J M; Olson, L M

    2014-01-01

    To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information. A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status. High probability and imputed matched sets were not significantly different from occupant age, MVC county, and MVC hour in high information settings (p > 0.999). In low information settings, high probability matched sets were significantly different from occupant age and MVC county (p [...] sets were not (p > 0.493). High information settings saw no significant differences in inference of simulated log hospital charges and hospitalization status between the two methods. High probability and imputed matched sets were significantly different from the outcomes in low information settings; however, imputed matched sets were more robust. The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.

  10. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    Chatterjee, Nilanjan

    2009-11-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg-Equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights to existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  11. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    Chatterjee, Nilanjan; Chen, Yi-Hau; Luo, Sheng; Carroll, Raymond J.

    2009-01-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has been recently noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg-Equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights to existing methods by construction of various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  12. A Multipoint Method for Detecting Genotyping Errors and Mutations in Sibling-Pair Linkage Data

    Douglas, Julie A.; Boehnke, Michael; Lange, Kenneth

    2000-01-01

    The identification of genes contributing to complex diseases and quantitative traits requires genetic data of high fidelity, because undetected errors and mutations can profoundly affect linkage information. The recent emphasis on the use of the sibling-pair design eliminates or decreases the likelihood of detection of genotyping errors and marker mutations through apparent Mendelian incompatibilities or close double recombinants. In this article, we describe a hidden Markov method for detect...

  13. Development of a bead-based multiplex genotyping method for diagnostic characterization of HPV infection.

    Mee Young Chung

    The accurate genotyping of human papillomavirus (HPV) is clinically important because the oncogenic potential of HPV is dependent on specific genotypes. Here, we described the development of a bead-based multiplex HPV genotyping (MPG) method which is able to detect 20 types of HPV (15 high-risk HPV types 16, 18, 31, 33, 35, 39, 45, 51, 52, 53, 56, 58, 59, 66, 68 and 5 low-risk HPV types 6, 11, 40, 55, 70) and evaluated its accuracy with sequencing. A total of 890 clinical samples were studied. Among these samples, 484 were HPV positive and 406 were HPV negative by consensus primer (PGMY09/11) directed PCR. The genotyping of 484 HPV positive samples was carried out by the bead-based MPG method. The accuracy was 93.5% (95% CI, 91.0-96.0), 80.1% (95% CI, 72.3-87.9) for single and multiple infections, respectively, while a complete type mismatch was observed only in one sample. The MPG method indiscriminately detected dysplasia of several cytological grades including 71.8% (95% CI, 61.5-82.3) of ASCUS (atypical squamous cells of undetermined significance) and more specific for high grade lesions. For women with HSIL (high grade squamous intraepithelial lesion) and SCC diagnosis, 32 women showed a PPV (positive predictive value) of 77.3% (95% CI, 64.8-89.8). Among women >40 years of age, 22 women with histological cervical cancer lesions showed a PPV of 88% (95% CI, 75.3-100). Of the highest risk HPV types including HPV-16, 18 and 31 positive women of the same age groups, 34 women with histological cervical cancer lesions showed a PPV of 77.3% (95% CI, 65.0-89.6). Taken together, the bead-based MPG method could successfully detect high-grade lesions and high-risk HPV types with a high degree of accuracy in clinical samples.

  14. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is number of terms in a series. The ratio Rmax of amax - amin and Sn - amin*n and that of Rmin of amax - amin and amax*n - Sn are always equal to 2/n, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outliers and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
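
    The 2/n indicator in this abstract can be checked by hand for an evenly spaced series a_i = amin + i*d (i = 0, ..., n-1): Sn - amin*n = d*n*(n-1)/2 and amax - amin = d*(n-1), so their ratio is 2/n. The sketch below only computes the two ratios as the abstract defines them. The function name and the sample series are illustrative assumptions, and the thresholding and transformation steps of the paper are not reproduced.

```python
def linear_fit_ratios(series):
    """Return (Rmax, Rmin, 2/n) using the ratio definitions quoted in the abstract.

    Rmax = (amax - amin) / (Sn - amin * n)
    Rmin = (amax - amin) / (amax * n - Sn)
    """
    n = len(series)
    a_max, a_min, s_n = max(series), min(series), sum(series)
    if a_max == a_min:
        # A constant series makes both denominators zero; the ratios are undefined.
        raise ValueError("ratios are undefined for a constant series")
    r_max = (a_max - a_min) / (s_n - a_min * n)
    r_min = (a_max - a_min) / (a_max * n - s_n)
    return r_max, r_min, 2 / n


# Evenly spaced series: both ratios coincide with 2/n.
print(linear_fit_ratios([3, 5, 7, 9, 11]))

# Hypothetical series with one large value: Rmax rises above 2/n, Rmin does not.
print(linear_fit_ratios([1, 2, 3, 4, 100]))
```

    In the second call only Rmax exceeds 2/n, which matches the abstract's statement that Rmax > 2/n points at the maximum element as the one that disagrees with the fit.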

  15. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    K K L B Adikaram

    Full Text Available Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is number of terms in a series. The ratio Rmax of amax - amin and Sn - amin*n and that of Rmin of amax - amin and amax*n - Sn are always equal to 2/n, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outliers and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.

  16. Design of a bovine low-density SNP array optimized for imputation.

    Didier Boichard

    Full Text Available The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. The BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. The minor allele frequencies were similar across taurine beef and dairy breeds as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle.

  17. Development of a rapid HRM genotyping method for detection of dog-derived Giardia lamblia.

    Tan, Liping; Yu, Xingang; Abdullahi, Auwalu Yusuf; Wu, Sheng; Zheng, Guochao; Hu, Wei; Song, Meiran; Wang, Zhen; Jiang, Biao; Li, Guoqing

    2015-11-01

    Giardia lamblia is a zoonotic flagellate protozoan found in the intestine of humans and many other mammals, including dogs. To assess the threat of dog-derived G. lamblia to humans, the common dog-derived G. lamblia assemblages A, C, and D were genotyped by high-resolution melting (HRM) technology. The qPCR-HRM primers BG5 and BG7 were designed according to the β-giardin gene sequence. A series of experiments was also conducted to test the stability, sensitivity, and accuracy of the HRM method. Results showed that the primers BG5 and BG7 could distinguish among the three assemblages A, C, and D, whose Tm values differed from each other by about 1 °C. The melting curves for intra-assay reproducibility almost coincided, and those for inter-assay reproducibility had much the same shape. The lowest detection concentration was about 5 × 10^-6 ng/μL of sample. The genotyping results from 21 G. lamblia samples by the HRM method were in complete accordance with sequencing results. It is concluded that the HRM genotyping method is rapid, stable, specific, highly sensitive, and suitable for clinical detection and molecular epidemiological survey of dog-derived G. lamblia.

  18. Dry matter genotypes of Cynodon by microwave and conventional oven methods

    Euclides Reuter de Oliveira

    2013-02-01

    Full Text Available The aim of this work was to compare the drying process in a microwave oven and in a forced-air ventilation oven, as well as their effects on the chemical composition of different genotypes of the genus Cynodon (Tifton 85, Jiggs, Russell, Tifton 68 and Vaquero) collected at different cutting ages (28, 48, 63 and 79 days). The experimental design was a randomized block in a split-plot design, with 4 replicates. There was no difference (P>0.05) between the methods analyzed in the chemical composition of the genotypes studied. Increasing cutting age negatively influenced (P<0.05) the crude protein content of the different plant parts. A significant increase (P<0.05) in dry matter, neutral detergent fiber, acid detergent fiber and dry matter production was observed with increasing cutting age. The use of the microwave oven is a quick and precise method to obtain the dry matter content of the fodder, showing efficiency similar to the method of drying in an oven with forced air circulation. The genotypes showed better chemical composition results when handled at age 28 days.

  19. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation

    Adikaram, K. K. L. B.; Becker, T.

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is number of terms in a series. The ratio Rmax of amax − amin and Sn − amin*n and that of Rmin of amax − amin and amax*n − Sn are always equal to 2/n, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outliers and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 − 1. Given this relation and transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^−4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process. PMID:26571035

  20. A rapid genotyping method for an obligate fungal pathogen, Puccinia striiformis f.sp. tritici, based on DNA extraction from infected leaf and Multiplex PCR genotyping

    Enjalbert Jérôme

    2011-07-01

    Full Text Available Abstract Background Puccinia striiformis f.sp. tritici (PST), an obligate fungal pathogen causing wheat yellow/stripe rust, a serious disease, has been used to understand the evolution of crop pathogens using molecular markers. However, numerous questions regarding its evolutionary history and recent migration routes still remain to be addressed, which requires the genotyping of a large number of isolates, a process that is limited by both DNA extraction and genotyping methods. To address the two issues, we developed here a method for direct DNA extraction from infected leaves combined with optimized SSR multiplexing. Findings We report here an efficient protocol for direct fungal DNA extraction from infected leaves, avoiding the costly and time-consuming step of spore multiplication. The genotyping strategy we propose amplified a total of 20 SSRs in three Multiplex PCR reactions, which were highly polymorphic and were able to differentiate different PST populations with high efficiency and accuracy. Conclusion These two developments enabled a genotyping strategy that could contribute to the development of molecular epidemiology of yellow rust disease, at both regional and worldwide scales.

  1. Cost reduction for web-based data imputation

    Li, Zhixu; Shang, Shuo; Xie, Qing; Zhang, Xiangliang

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity

  2. Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

    Resche-Rigon, Matthieu; White, Ian R

    2018-06-01

    In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.

  3. Evaluation of DNA Extraction Methods Suitable for PCR-based Detection and Genotyping of Clostridium botulinum

    Auricchio, Bruna; Anniballi, Fabrizio; Fiore, Alfonsina

    2013-01-01

    Sufficient quality and quantity of extracted DNA is critical to detecting and performing genotyping of Clostridium botulinum by means of PCR-based methods. An ideal extraction method has to optimize DNA yield, minimize DNA degradation, allow multiple samples to be extracted, and be efficient in terms of cost, time, labor, and supplies. Eleven botulinum toxin–producing clostridia strains and 25 samples (10 food, 13 clinical, and 2 environmental samples) naturally contaminated with botulinum toxin–producing clostridia were used to compare 4 DNA extraction procedures: Chelex® 100 matrix, Phenol...

  4. Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

    Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

    2014-01-01

    BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data have to be imputed. Different imputation techniques may give different results and some may lead to bias...... of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate. METHODS: We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI...
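
    Two of the single-imputation rules named in this abstract, LOCF (last observation carried forward) and BOCF (baseline observation carried forward), are simple enough to sketch. The code below is a minimal illustration of the general rules, not the trial's analysis code. The function names and the body-weight series are hypothetical, `None` marks a missing measurement, and the baseline value is assumed to be observed.

```python
def locf(values):
    """Fill each missing value with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


def bocf(values):
    """Fill each missing value with the baseline (first) value."""
    baseline = values[0]  # assumed to be observed
    return [baseline if v is None else v for v in values]


# Hypothetical body-weight measurements (kg) for one participant who drops out.
weights = [100.0, 97.5, None, 95.0, None, None]
print(locf(weights))  # [100.0, 97.5, 97.5, 95.0, 95.0, 95.0]
print(bocf(weights))  # [100.0, 97.5, 100.0, 95.0, 100.0, 100.0]
```

    In this made-up series LOCF keeps the last recorded weight loss after dropout, while BOCF returns the participant to the starting weight, which illustrates how the choice of rule can change an estimated treatment effect.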

  5. iVAR: a program for imputing missing data in multivariate time series using vector autoregressive models.

    Liu, Siwei; Molenaar, Peter C M

    2014-12-01

    This article introduces iVAR, an R program for imputing missing data in multivariate time series on the basis of vector autoregressive (VAR) models. We conducted a simulation study to compare iVAR with three methods for handling missing data: listwise deletion, imputation with sample means and variances, and multiple imputation ignoring time dependency. The results showed that iVAR produces better estimates for the cross-lagged coefficients than do the other three methods. We demonstrate the use of iVAR with an empirical example of time series electrodermal activity data and discuss the advantages and limitations of the program.

  6. HRM and SNaPshot as alternative forensic SNP genotyping methods.

    Mehta, Bhavik; Daniel, Runa; McNevin, Dennis

    2017-09-01

    Single nucleotide polymorphisms (SNPs) have been widely used in forensics for prediction of identity, biogeographical ancestry (BGA) and externally visible characteristics (EVCs). Single base extension (SBE) assays, most notably SNaPshot® (Thermo Fisher Scientific), are commonly used for forensic SNP genotyping as they can be employed on standard instrumentation in forensic laboratories (e.g. capillary electrophoresis). High resolution melt (HRM) analysis is an alternative method and is a simple, fast, single tube assay for low throughput SNP typing. This study compares HRM and SNaPshot®. HRM produced reproducible and concordant genotypes at 500 pg, however, difficulties were encountered when genotyping SNPs with high GC content in flanking regions and differentiating variants of symmetrical SNPs. SNaPshot® was reproducible at 100 pg and is less dependent on SNP choice. HRM has a shorter processing time in comparison to SNaPshot®, avoids post PCR contamination risk and has potential as a screening tool for many forensic applications.

  7. Pyroprinting: a rapid and flexible genotypic fingerprinting method for typing bacterial strains.

    Black, Michael W; VanderKelen, Jennifer; Montana, Aldrin; Dekhtyar, Alexander; Neal, Emily; Goodman, Anya; Kitts, Christopher L

    2014-10-01

    Bacterial strain typing is commonly employed in studies involving epidemiology, population ecology, and microbial source tracking to identify sources of fecal contamination. Methods for differentiating strains generally use either a collection of phenotypic traits or rely on some interrogation of the bacterial genotype. This report introduces pyroprinting, a novel genotypic strain typing method that is rapid, inexpensive, and discriminating compared to the most sensitive methods already in use. Pyroprinting relies on the simultaneous pyrosequencing of polymorphic multicopy loci, such as the intergenic transcribed spacer regions of rRNA operons in bacterial genomes. Data generated by sequencing combinations of variable templates are reproducible and intrinsically digitized. The theory and development of pyroprinting in Escherichia coli, including the selection of similarity thresholds to define matches between isolates, are presented. The pyroprint-based strain differentiation limits and phylogenetic relevance compared to other typing methods are also explored. Pyroprinting is unique in its simplicity and, paradoxically, in its intrinsic complexity. This new approach serves as an excellent alternative to more cumbersome or less phylogenetically relevant strain typing methods. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

    Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal

    Missing values in time series data is a well-known and important problem which many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA which is robust against outliers. The performance of the new imputation method has been compared with many other established methods. The comparison is done by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA can provide better imputation in comparison to other methods.

  9. Impact of enumeration method on diversity of Escherichia coli genotypes isolated from surface water.

    Martin, E C; Gentry, T J

    2016-11-01

    There are numerous regulatory-approved Escherichia coli enumeration methods, but it is not known whether differences in media composition and incubation conditions impact the diversity of E. coli populations detected by these methods. A study was conducted to determine if three standard water quality assessments, Colilert®, USEPA Method 1603 (modified mTEC) and USEPA Method 1604 (MI), detect different populations of E. coli. Samples were collected from six watersheds and analysed using the three enumeration approaches followed by E. coli isolation and genotyping. Results indicated that the three methods generally produced similar enumeration data across the sites, although there were some differences on a site-by-site basis. The Colilert® method consistently generated the least diverse collection of E. coli genotypes as compared to modified mTEC and MI, with those two methods being roughly equal to each other. Although the three media assessed in this study were designed to enumerate E. coli, the differences in the media composition, incubation temperature, and growth platform appear to have a strong selective influence on the populations of E. coli isolated. This study suggests that standardized methods of enumeration and isolation may be warranted if researchers intend to obtain individual E. coli isolates for further characterization. This study characterized the impact of three USEPA-approved Escherichia coli enumeration methods on observed E. coli population diversity in surface water samples. Results indicated that these methods produced similar E. coli enumeration data but were more variable in the diversity of E. coli genotypes observed. Although the three methods enumerate the same species, differences in media composition, growth platform, and incubation temperature likely contribute to the selection of different cultivable populations of E. coli, and thus caution should be used when implementing these methods interchangeably for

  10. UniFIeD Univariate Frequency-based Imputation for Time Series Data

    Friese, Martina; Stork, Jörg; Ramos Guerra, Ricardo; Bartz-Beielstein, Thomas; Thaker, Soham; Flasch, Oliver; Zaefferer, Martin

    2013-01-01

    This paper introduces UniFIeD, a new data preprocessing method for time series. UniFIeD can cope with large intervals of missing data. A scalable test function generator, which allows the simulation of time series with different gap sizes, is also presented. An experimental study demonstrates that (i) UniFIeD shows significantly better performance than simple imputation methods and (ii) UniFIeD is able to handle situations where advanced imputation methods fail. The results are indep...

  11. A Simple PCR Method for Rapid Genotype Analysis of Mycobacterium ulcerans

    Stinear, Timothy; Davies, John K.; Jenkin, Grant A.; Portaels, Françoise; Ross, Bruce C.; Oppedisano, Frances; Purcell, Maria; Hayman, John A.; Johnson, Paul D. R.

    2000-01-01

    Two high-copy-number insertion sequences, IS2404 and IS2606, were recently identified in Mycobacterium ulcerans and were shown by Southern hybridization to possess restriction fragment length polymorphism between strains from different geographic origins. We have designed a simple genotyping method that captures these differences by PCR amplification of the region between adjacent copies of IS2404 and IS2606. We have called this system 2426 PCR. The method is rapid, reproducible, sensitive, and specific for M. ulcerans, and it has confirmed previous studies suggesting a clonal population structure of M. ulcerans within a geographic region. M. ulcerans isolates from Australia, Papua New Guinea, Malaysia, Surinam, Mexico, Japan, China, and several countries in Africa were easily differentiated based on an array of 4 to 14 PCR products ranging in size from 200 to 900 bp. Numerical analysis of the banding patterns suggested a close evolutionary link between M. ulcerans isolates from Africa and southeast Asia. The application of 2426 PCR to total DNA, extracted directly from M. ulcerans-infected tissue specimens without culture, demonstrated the sensitivity and specificity of this method and confirmed for the first time that both animal and human isolates from areas of endemicity in southeast Australia have the same genotype. PMID:10747130

  12. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  13. VIGAN: Missing View Imputation with Generative Adversarial Networks.

    Shang, Chao; Palmer, Aaron; Sun, Jiangwen; Chen, Ko-Shin; Lu, Jin; Bi, Jinbo

    2017-01-01

    In an era when big data are becoming the norm, there is less concern with the quantity but more with the quality and completeness of the data. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of data, the missing view problem arises. Classic multiple imputation or matrix completion methods are hardly effective here because the specific view offers no information on which to base imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model enables the knowledge integration for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further proves the effectiveness and usability of this approach in life science.

  14. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    Concepción Crespo Turrado

    2015-12-01

    Full Text Available Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data of any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be accomplished in order to substitute the data that is missing for estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm.

  15. Genotyping common and rare variation using overlapping pool sequencing

    Pasaniuc Bogdan

    2011-07-01

    Full Text Available Abstract Background Recent advances in sequencing technologies set the stage for large, population based studies, in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, recently a few multiplexing schemes have been suggested, in which a small number of DNA pools are sequenced, and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results In this paper we provide a new algorithm for the deconvolution of DNA pools multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions Particularly, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

  16. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    Hardouin, Jean-Benoit

    2011-07-14

    Abstract Background Nowadays, more and more clinical scales consisting of responses given by the patients to some items (Patient Reported Outcomes - PRO) are validated with models based on Item Response Theory, and more specifically, with a Rasch model. In the validation sample, presence of missing data is frequent. The aim of this paper is to compare sixteen methods for handling the missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods A simulation study was performed to consider several cases, notably the possibility for the missing values to be informative or not and the rate of missing data. Results Several imputation methods produce bias on psychometrical indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions Several imputation methods should be avoided, in particular PMS imputation. From a general point of view, it is important to use an imputation method that considers both the ability of the patient (measured for example by his/her score), and the difficulty of the item (measured for example by its rate of favourable responses). Another recommendation is to always consider the addition of a random process in the imputation method, because such a process allows reducing the bias. Last, the analysis performed without imputation of the missing data (available case analyses) is an interesting alternative to the simple imputation in this context.

  17. A suggested approach for imputation of missing dietary data for young children in daycare.

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P; Zeng, Donglin; Vaughn, Amber E; Pratt, Charlotte; Ward, Dianne S

    2015-01-01

    Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  18. A suggested approach for imputation of missing dietary data for young children in daycare

    June Stevens

    2015-12-01

    Full Text Available Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results: The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion: This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.

  19. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

    Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each software includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the amount of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.

  20. Assessment of soy genotype and processing method on quality of soybean tofu.

    Stanojevic, Sladjana P; Barac, Miroljub B; Pesic, Mirjana B; Vucelic-Radovic, Biljana V

    2011-07-13

    Protein quality in six soybean varieties, based on subunit composition of their protein, was correlated with quality of the produced tofu. Also, protein changes due to a pilot plant processing method involving high temperature/pressure and commercial rennet as coagulant were assessed. In each soybean variety, glycinin (11S) and β-conglycinin (7S) as well as 11S/7S ratio significantly changed from beans to tofu. Between varieties, the 11S/7S protein ratio in seed indicated genotypic influence on tofu yield and gel hardness (r = 0.91 and r = 0.99, respectively; p soybean β'-subunit of 7S protein negatively influenced tofu hardness (r = -0.91, p Seed protein composition and proportion of 7S protein subunits under the applied production method had an important role in defining tofu quality.

  1. An NS5A single optimized method to determine genotype, subtype and resistance profiles of Hepatitis C strains.

    Elisabeth Andre-Garnier

    Full Text Available The objective was to develop a method of HCV genome sequencing that allowed simultaneous genotyping and NS5A inhibitor resistance profiling. In order to validate the use of a unique RT-PCR for genotypes 1-5, 142 plasma samples from patients infected with HCV were analysed. The NS4B-NS5A partial region was successfully amplified and sequenced in all samples. In parallel, partial NS3 sequences were obtained and analyzed for genotyping. Phylogenetic analysis showed concordance of genotypes and subtypes with a bootstrap >95% for each type cluster. NS5A resistance mutations were analyzed using the Geno2pheno [hcv] v0.92 tool and compared to the recently published list of known Resistance Associated Substitutions. In conclusion, this tool allows determination of HCV genotypes, subtypes and identification of NS5A resistance mutations. This single method can be used to detect pre-existing resistance mutations in NS5A before treatment and to check the emergence of resistant viruses while undergoing treatment in major HCV genotypes (G1-5) in the EU and the US.

  2. Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

    Minkyung Kim

    2017-10-01

    Full Text Available This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the pattern that appears in the collected data. Here, in order to capture the patterns from past power data, we newly model a feature vector by using past data and its variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, which are significant hyperparameters of the proposed method, by utilizing intentional missing data. Based on a weighted distance between feature vectors representing a missing situation and past situation, missing power data are estimated by referring to the k most similar past situations in the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects a method between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, from a simulation under various energy consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system, compared to the existing imputation methods.
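
    The core idea above (compare a feature vector built from recent data and its variations against past situations, then average the k most similar ones) can be sketched as below. This is a simplified illustration, not the LAI method itself: the distance is an unweighted Euclidean distance, `k` and `feature_len` are fixed rather than learned, and the function names are hypothetical.

```python
import numpy as np

def make_feature(window):
    """Feature vector: past values followed by their first differences."""
    window = np.asarray(window, dtype=float)
    return np.concatenate([window, np.diff(window)])

def knn_impute_next(history, k=3, feature_len=4):
    """Estimate the missing value that follows the last `feature_len` readings."""
    history = np.asarray(history, dtype=float)
    query = make_feature(history[-feature_len:])
    candidates = []
    # Every earlier window whose following value is known is a "past situation".
    for start in range(len(history) - feature_len):
        window = history[start:start + feature_len]
        target = history[start + feature_len]
        distance = np.linalg.norm(make_feature(window) - query)
        candidates.append((distance, target))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return float(np.mean([target for _, target in nearest]))

# Hypothetical periodic load profile; the next reading is the one to impute.
load = [1.0, 2.0, 3.0, 4.0] * 5
estimate = knn_impute_next(load, k=3, feature_len=4)
```

    In the paper the feature length and historical length are tuned on intentionally masked data; here they are plain arguments.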

  3. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species.

    Brant K Peterson

    Full Text Available The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via "SNP chip" microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below that of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to both genotyping in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.

  4. An efficient genotyping method for genome-modified animals and human cells generated with CRISPR/Cas9 system.

    Zhu, Xiaoxiao; Xu, Yajie; Yu, Shanshan; Lu, Lu; Ding, Mingqin; Cheng, Jing; Song, Guoxu; Gao, Xing; Yao, Liangming; Fan, Dongdong; Meng, Shu; Zhang, Xuewen; Hu, Shengdi; Tian, Yong

    2014-09-19

    The rapid generation of various species and strains of laboratory animals using CRISPR/Cas9 technology has dramatically accelerated the interrogation of gene function in vivo. So far, the dominant approach for genotyping of genome-modified animals has been the T7E1 endonuclease cleavage assay. Here, we present a polyacrylamide gel electrophoresis-based (PAGE) method to genotype mice harboring different types of indel mutations. We developed 6 strains of genome-modified mice using CRISPR/Cas9 system, and utilized this approach to genotype mice from F0 to F2 generation, which included single and multiplexed genome-modified mice. We also determined the maximal detection sensitivity for detecting mosaic DNA using PAGE-based assay as 0.5%. We further applied PAGE-based genotyping approach to detect CRISPR/Cas9-mediated on- and off-target effect in human 293T and induced pluripotent stem cells (iPSCs). Thus, PAGE-based genotyping approach meets the rapidly increasing demand for genotyping of the fast-growing number of genome-modified animals and human cell lines created using CRISPR/Cas9 system or other nuclease systems such as TALEN or ZFN.

  5. Yield and nutritional quality of greenhouse lettuce (Lactuca sativa L.) as affected by genotype and production methods

    Govedarica-Lučić Aleksandra

    2014-01-01

    Full Text Available Greenhouse experiments were conducted in winter growing seasons in order to evaluate the effects of genotype and production methods on yield and nutritional quality of lettuce (Lactuca sativa L.). A three-year (2009-2011) study was conducted by randomized block system in a greenhouse without additional heating. The trial included three genotypes of lettuce (Archimedes RZ, Santoro RZ, Kibou RZ). Each row with these genotypes was exposed to the following variants of covering: control (planting on bare soil), mulching before sowing with black PE foil, agro textile (covering plants after planting with agro textile, 17 g), and a combination of mulching + agro textile. Throughout all three years of the trial, the genotype "Santoro RZ" consistently had the biggest heads and the highest yield (15.33 kg 10 m-2), which leads to the conclusion that lettuce yield is a genotype characteristic. Moreover, the nutritional value (ascorbic acid concentration) depended on the production method: on average, the combination of mulching + agro textile had the highest content (26.77 mg 100 g-1), while the control variant had significantly lower vitamin C content (21.10 mg 100 g-1). The three-year study showed that the production method and genotype significantly affect the nitrate content. The average nitrate content was 2196.33 mg kg-1 on the control variant and 2526.24 mg kg-1 on agro textile. Leafy lettuce of genotype "Kibou RZ" had lower nitrate content (2176.85 mg kg-1) compared to "Archimedes RZ" (2843.05 mg kg-1) and "Santoro RZ" (2221.37 mg kg-1). However, nitrate concentration in all treatments remained within the European Union's permissible levels.

  6. Improved Correction of Misclassification Bias With Bootstrap Imputation.

    van Walraven, Carl

    2018-07-01

    Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias, requires only code sensitivity and specificity, but may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates was determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can importantly decrease misclassification bias when accurate disease probability estimates are used.
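
    Why a correction that needs only sensitivity and specificity can return invalid results is easy to see from the standard misclassification formula. The sketch below uses that textbook back-calculation with the code accuracy reported above (sensitivity 71.3%, specificity 96.2%); whether the study's QBA used exactly this form is an assumption, and the function name is hypothetical.

```python
def corrected_prevalence(observed, sensitivity, specificity):
    """Back-calculate true prevalence from code-based (observed) prevalence.

    Derived from: observed = sens * p + (1 - spec) * (1 - p).
    The result falls outside [0, 1] (an invalid estimate) whenever the
    observed prevalence is below the false-positive rate (1 - spec).
    """
    return (observed + specificity - 1.0) / (sensitivity + specificity - 1.0)

SENS, SPEC = 0.713, 0.962

# Code-based prevalence expected when the true prevalence is 7.4%.
observed_overall = SENS * 0.074 + (1.0 - SPEC) * (1.0 - 0.074)
recovered = corrected_prevalence(observed_overall, SENS, SPEC)

# A hypothetical stratum where codes flag only 2% of patients,
# which is below the 3.8% false-positive rate.
invalid = corrected_prevalence(0.02, SENS, SPEC)
```

    In the second case the corrected estimate is negative, which is the kind of invalid result the abstract reports for QBA.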

  7. Comparing simple root phenotyping methods on a core set of rice genotypes.

    Shrestha, R; Al-Shugeairy, Z; Al-Ogaidi, F; Munasinghe, M; Radermacher, M; Vandenhirtz, J; Price, A H

    2014-05-01

    Interest in belowground plant growth is increasing, especially in relation to arguments that shallow-rooted cultivars are efficient at exploiting soil phosphorus while deep-rooted ones will access water at depth. However, methods for assessing roots in large numbers of plants are diverse and direct comparisons of methods are rare. Three methods for measuring root growth traits were evaluated for utility in discriminating rice cultivars: soil-filled rhizotrons, hydroponics and soil-filled pots whose bottom was sealed with a non-woven fabric (a potential method for assessing root penetration ability). A set of 38 rice genotypes including the OryzaSNP set of 20 cultivars, additional parents of mapping populations and products of marker-assisted selection for root QTLs were assessed. A novel method of image analysis for assessing rooting angles from rhizotron photographs was employed. The non-woven fabric was the easiest yet least discriminatory method, while the rhizotron was highly discriminatory and allowed the most traits to be measured but required more than three times the labour of the other methods. The hydroponics was both easy and discriminatory, allowed temporal measurements, but is most likely to suffer from artefacts. Image analysis of rhizotrons compared favourably to manual methods for discriminating between cultivars. Previous observations that cultivars from the indica subpopulation have shallower rooting angles than aus or japonica cultivars were confirmed in the rhizotrons, and indica and temperate japonicas had lower maximum root lengths in rhizotrons and hydroponics. It is concluded that rhizotrons are the preferred method for root screening, particularly since root angles can be assessed. © 2013 German Botanical Society and The Royal Botanical Society of the Netherlands.

  8. Refining QTL with high-density SNP genotyping and whole genome sequence in three cattle breeds

    Sahana, Goutam; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2012-01-01

    Genome-wide association study was carried out in Nordic Holsteins, Nordic Red and Jersey breeds for functional traits using BovineHD Genotyping BeadChip (Illumina, San Diego, CA). The association analyses were carried out using both linear mixed model approach and a Bayesian variable selection...... method. Principal components were used to account for population structure. The QTL segregating in all three breeds were selected and a few of the most significant ones were followed in further analyses. The polymorphisms in the identified QTL regions were imputed using 90 whole genome sequences...

  9. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be

  10. Molecular Diagnosis of Brettanomyces bruxellensis’ Sulfur Dioxide Sensitivity Through Genotype Specific Method

    Marta Avramova

    2018-06-01

    Full Text Available The yeast species Brettanomyces bruxellensis is associated with important economic losses due to red wine spoilage. The most common method to prevent and/or control B. bruxellensis spoilage in winemaking is the addition of sulfur dioxide into must and wine. However, recently, it was reported that some B. bruxellensis strains could be tolerant to commonly used doses of SO2. In this work, B. bruxellensis response to SO2 was assessed in order to explore the relationship between SO2 tolerance and genotype. We selected 145 isolates representative of the genetic diversity of the species, and from different fermentation niches (roughly 70% from grape wine fermentation environment, and 30% from beer, ethanol, tequila, kombucha, etc.). These isolates were grown in media harboring increasing sulfite concentrations, from 0 to 0.6 mg.L-1 of molecular SO2. Three behaviors were defined: sensitive strains showed longer lag phase and slower growth rate and/or lower maximum population size in presence of increasing concentrations of SO2. Tolerant strains displayed increased lag phase, but maximal growth rate and maximal population size remained unchanged. Finally, resistant strains showed no growth variation whatever the SO2 concentrations. 36% (52/145) of B. bruxellensis isolates were resistant or tolerant to sulfite, and up to 43% (46/107) when considering only wine isolates. Moreover, most of the resistant/tolerant strains belonged to two specific genetic groups, allowing the use of microsatellite genotyping to predict the risk of sulfur dioxide resistance/tolerance with high reliability (>90%). Such molecular diagnosis could help the winemakers to adjust antimicrobial techniques and efficient spoilage prevention with minimal intervention.

  11. Comparison of Four Human Papillomavirus Genotyping Methods: Next-generation Sequencing, INNO-LiPA, Electrochemical DNA Chip, and Nested-PCR.

    Nilyanimit, Pornjarim; Chansaenroj, Jira; Poomipak, Witthaya; Praianantathavorn, Kesmanee; Payungporn, Sunchai; Poovorawan, Yong

    2018-03-01

    Human papillomavirus (HPV) infection causes cervical cancer, thus necessitating early detection by screening. Rapid and accurate HPV genotyping is crucial both for the assessment of patients with HPV infection and for surveillance studies. Fifty-eight cervicovaginal samples were tested for HPV genotypes using four methods in parallel: nested-PCR followed by conventional sequencing, INNO-LiPA, electrochemical DNA chip, and next-generation sequencing (NGS). Seven HPV genotypes (16, 18, 31, 33, 45, 56, and 58) were identified by all four methods. Nineteen HPV genotypes were detected by NGS, but not by nested-PCR, INNO-LiPA, or electrochemical DNA chip. Although NGS is relatively expensive and complex, it may serve as a sensitive HPV genotyping method. Because of its highly sensitive detection of multiple HPV genotypes, NGS may serve as an alternative for diagnostic HPV genotyping in certain situations. © The Korean Society for Laboratory Medicine

  12. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

    Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

    2015-12-01

    Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Detection of HCV genotypes using molecular and radio-isotopic methods

    Ahmad, N.; Baig, S.M.; Shah, W.A.; Khattak, K.F.; Khan, B.; Qureshi, J.A.

    2004-01-01

    Hepatitis C virus (HCV) accounts for most cases of acute and chronic non-A and non-B liver diseases. Persistent HCV infection may lead to liver cirrhosis and hepatocellular carcinoma. Six major HCV genotypes have been recognized. Infection with different genotypes results in different clinical pictures and responses to antiviral therapy. In the area of Faisalabad (Punjab province of Pakistan), the prevalence and molecular epidemiology of Hepatitis C virus infection had never been investigated before. In this study, we have made an attempt to determine the prevalence, distribution and clinical significance of HCV infection in 1100 suspected patients of liver disease by nested reverse transcriptase polymerase chain reaction (RTPCR) over a period of four years. HCV genotypes of isolates were determined by dot-blot hybridization with genotype specific radiolabeled probes in 337 subjects. The proportion of patients with HCV genotypes 1,2,3 and 4 were 37.38%, 1.86%, 16.16% and 0.29% respectively. Mixed infection of HCV genotype was detected in 120 (35.6%) patients, whereas 31 (9.1%) samples remained unclassified. This study revealed changing epidemiology of hepatitis C virus genotype 1 and 3 in the patients. Multiple infection of HCV genotype in the same patient may be of great clinical and pathological importance and interest. (author)

  14. Similar predictions of etravirine sensitivity regardless of genotypic testing method used: comparison of available scoring systems.

    Vingerhoets, Johan; Nijs, Steven; Tambuyzer, Lotke; Hoogstoel, Annemie; Anderson, David; Picchio, Gaston

    2012-01-01

    The aims of this study were to compare various genotypic scoring systems commonly used to predict virological outcome to etravirine, and examine their concordance with etravirine phenotypic susceptibility. Six etravirine genotypic scoring systems were assessed: Tibotec 2010 (based on 20 mutations; TBT 20), Monogram, Stanford HIVdb, ANRS, Rega (based on 37, 30, 27 and 49 mutations, respectively) and virco®TYPE HIV-1 (predicted fold change based on genotype). Samples from treatment-experienced patients who participated in the DUET trials and with both genotypic and phenotypic data (n=403) were assessed using each scoring system. Results were retrospectively correlated with virological response in DUET. κ coefficients were calculated to estimate the degree of correlation between the different scoring systems. Correlation between the five scoring systems and the TBT 20 system was approximately 90%. Virological response by etravirine susceptibility was comparable regardless of which scoring system was utilized, with 70-74% of DUET patients determined as susceptible to etravirine by the different scoring systems achieving plasma viral load <50 HIV-1 RNA copies/ml. In samples classed as phenotypically susceptible to etravirine (fold change in 50% effective concentration ≤3), correlations with genotypic score were consistently high across scoring systems (≥70%). In general, the etravirine genotypic scoring systems produced similar results, and genotype-phenotype concordance was high. As such, phenotypic interpretations, and in their absence all genotypic scoring systems investigated, may be used to reliably predict the activity of etravirine.

  15. Mendel Meets CSI: Forensic Genotyping as a Method to Teach Genetics & DNA Science

    Kurowski, Scotia; Reiss, Rebecca

    2007-01-01

    This article describes a forensic DNA science laboratory exercise for advanced high school and introductory college level biology courses. Students use a commercial genotyping kit and genetic analyzer or gene sequencer to analyze DNA recovered from a fictitious crime scene. DNA profiling and STR genotyping are outlined. DNA extraction, PCR, and…

  16. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.

  17. The use of multiple imputation for the accurate measurements of individual feed intake by electronic feeders.

    Jiao, S; Tiezzi, F; Huang, Y; Gray, K A; Maltecca, C

    2016-02-01

    Obtaining accurate individual feed intake records is the key first step in achieving genetic progress toward more efficient nutrient utilization in pigs. Feed intake records collected by electronic feeding systems contain errors (erroneous and abnormal values exceeding certain cutoff criteria), which are due to feeder malfunction or animal-feeder interaction. In this study, we examined the use of a novel data-editing strategy involving multiple imputation to minimize the impact of errors and missing values on the quality of feed intake data collected by an electronic feeding system. Accuracy of feed intake data adjustment obtained from the conventional linear mixed model (LMM) approach was compared with 2 alternative implementations of multiple imputation by chained equation, denoted as MI (multiple imputation) and MICE (multiple imputation by chained equation). The 3 methods were compared under 3 scenarios, where 5, 10, and 20% feed intake error rates were simulated. Each of the scenarios was replicated 5 times. Accuracy of the alternative error adjustment was measured as the correlation between the true daily feed intake (DFI; daily feed intake in the testing period) or true ADFI (the mean DFI across testing period) and the adjusted DFI or adjusted ADFI. In the editing process, error cutoff criteria are used to define if a feed intake visit contains errors. To investigate the possibility that the error cutoff criteria may affect any of the 3 methods, the simulation was repeated with 2 alternative error cutoff values. Multiple imputation methods outperformed the LMM approach in all scenarios with mean accuracies of 96.7, 93.5, and 90.2% obtained with MI and 96.8, 94.4, and 90.1% obtained with MICE compared with 91.0, 82.6, and 68.7% using LMM for DFI. Similar results were obtained for ADFI. Furthermore, multiple imputation methods consistently performed better than LMM regardless of the cutoff criteria applied to define errors. 
In conclusion, multiple imputation

  18. Evaluation of a high resolution genotyping method for Chlamydia trachomatis using routine clinical samples.

    Yibing Wang

    2011-02-01

    Genital chlamydia infection is the most commonly diagnosed sexually transmitted infection in the UK. C. trachomatis genital infections are usually caused by strains which fall into two pathovars: lymphogranuloma venereum (LGV) and the genitourinary genotypes D-K. Although these genotypes can be discriminated by outer membrane protein gene (ompA) sequencing or multi-locus sequence typing (MLST), neither protocol affords the high-resolution genotyping required for local epidemiology and accurate contact-tracing. We evaluated variable number tandem repeat (VNTR) and ompA sequencing (now called multi-locus VNTR analysis and ompA, or "MLVA-ompA") to study local epidemiology in Southampton over a period of six months. One hundred and fifty seven endocervical swabs that tested positive for C. trachomatis from both the Southampton genitourinary medicine (GUM) clinic and local GP surgeries were tested by COBAS Taqman 48 (Roche) PCR for the presence of C. trachomatis. Samples tested as positive by the commercial NAATs test were genotyped, where possible, by a MLVA-ompA sequencing technique. Attempts were made to isolate C. trachomatis from all 157 samples in cell culture, and 68 (43%) were successfully recovered by repeatable passage in culture. Of the 157 samples, 93 (i.e. 59%) were fully genotyped by MLVA-ompA. Only one mixed infection (E & D) in a single sample was confirmed. There were two distinct D genotypes for the ompA gene. Most frequent ompA genotypes were D, E and F, comprising 20%, 41% and 16% of the type-able samples respectively. Within all genotypes we detected numerous MLVA sub-types. Amongst the common genotypes, there are a significant number of defined MLVA sub-types, which may reflect particular background demographics including age group, geography, high-risk sexual behavior, and sexual networks.

  19. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.

  20. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation capability of missing values compared with other methods for both series types of data, for the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE

  1. Numerical taxonomy of the genus Pestivirus: new software for genotyping based on the palindromic nucleotide substitutions method.

    Giangaspero, Massimo; Apicella, Claudio; Harasawa, Ryô

    2013-09-01

    The genus Pestivirus from the family Flaviviridae is represented by four established species: Bovine viral diarrhea virus 1 (BVDV-1), Bovine viral diarrhea virus 2 (BVDV-2), Border disease virus (BDV), and Classical swine fever virus (CSFV), as well as a tentative species from a giraffe. Analysis of palindromic nucleotide substitutions (PNS) in the 5' untranslated region (UTR) of Pestivirus RNA has been described as a new, simple and practical method for genotyping. New software is described, also named PNS, that was prepared specifically for this PNS genotyping procedure. Pestivirus identification using PNS was evaluated on five hundred and forty-three sequences at genus, species and genotype level using this software. The software is freely available at www.pns-software.com. Copyright © 2013 Elsevier B.V. All rights reserved.

  2. Factors associated with low birth weight in Nepal using multiple imputation

    Usha Singh

    2017-02-01

    Background: Survey data from low income countries on birth weight usually pose a persistent problem. The studies conducted on birth weight have acknowledged missing data on birth weight, but they are not included in the analysis. Furthermore, other missing data presented on determinants of birth weight are not addressed. Thus, this study tries to identify determinants that are associated with low birth weight (LBW) using multiple imputation to handle missing data on birth weight and its determinants. Methods: The child dataset from the Nepal Demographic and Health Survey (NDHS), 2011, was utilized in this study. A total of 5,240 children were born between 2006 and 2011, out of which 87% had at least one measured variable missing and 21% had no recorded birth weight. All the analyses were carried out in R version 3.1.3. The transform-then-impute method was applied to check for interaction between explanatory variables and imputed missing data. The survey package was applied to each imputed dataset to account for survey design and sampling method. Survey logistic regression was applied to identify the determinants associated with LBW. Results: The prevalence of LBW was 15.4% after imputation. Women with the highest autonomy on their own health compared to those with health decisions involving husband or others (adjusted odds ratio (OR) 1.87, 95% confidence interval (95% CI) = 1.31, 2.67), and husband and women together (adjusted OR 1.57, 95% CI = 1.05, 2.35) were less likely to give birth to LBW infants. Mothers using highly polluting cooking fuels (adjusted OR 1.49, 95% CI = 1.03, 2.22) were more likely to give birth to LBW infants than mothers using non-polluting cooking fuels. Conclusion: The findings of this study suggested that obtaining the prevalence of LBW from only the sample of measured birth weight and ignoring missing data results in underestimation.

  3. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis.

    Kmetic, Andrew; Joseph, Lawrence; Berger, Claudie; Tenenhouse, Alan

    2002-07-01

    Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. We reviewed multiple imputation, a widely applicable and easy to implement Bayesian methodology to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians, designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for those invitees who declined further participation in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status for nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models are considered. Our results suggest that selection bias in the study is of concern, but only slightly, in very elderly (age 80+ years), both women and men. Epidemiologists should consider using multiple imputation more often than is current practice.
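The abstract above does not give formulas, but estimates from multiply imputed datasets are conventionally combined with Rubin's rules. The following is a minimal sketch under that assumption; the function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: point estimates (e.g. a prevalence) from m imputed datasets
    variances: the squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Illustrative (made-up) prevalence estimates from three imputed datasets
pooled, total_var = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
```

The between-imputation term is what carries the extra uncertainty due to nonresponse; analysing a single imputed dataset would omit it.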

  4. Two-pass imputation algorithm for missing value estimation in gene expression time series.

    Tsiporkova, Elena; Boeva, Veselka

    2007-10-01

    Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. 
The software also provides for a choice between three different

  5. Differential network analysis with multiply imputed lipidomic data.

    Maiju Kujala

    The importance of lipids for cell function and health has been widely recognized, e.g., a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, yet not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in network structures of the lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe for conducting a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, particularly paying attention to the left-censored missing values typical for a wide range of data sets in life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: the patients who had a fatal CVD event and the ones who remained stable during a two-year follow-up.

  6. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
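For orientation, the complete-data partial F statistic can be written purely in terms of the coefficients of determination of the full and reduced models. The sketch below shows only that textbook identity; how the paper combines such quantities across imputed datasets is not stated in the abstract and is not reproduced here. The function and parameter names are hypothetical.

```python
def partial_f_from_r2(r2_full, r2_reduced, n, p_full, q):
    """Complete-data partial F statistic computed from R-squared values.

    r2_full:    R^2 of the full model
    r2_reduced: R^2 of the reduced model (q coefficients set to zero)
    n:          number of observations
    p_full:     number of predictors in the full model (intercept excluded)
    q:          number of coefficients being tested
    Under the null hypothesis the statistic follows F(q, n - p_full - 1).
    """
    df_resid = n - p_full - 1
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / df_resid
    return numerator / denominator

# Example: two tested predictors, 103 observations, full-model R^2 of 0.5
f_stat = partial_f_from_r2(r2_full=0.5, r2_reduced=0.4, n=103, p_full=2, q=2)
```

Setting `r2_reduced` to 0 and `q` equal to `p_full` gives the global F-test as a special case.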

  7. Optimized Use of Low-Depth Genotyping-by-Sequencing for Genomic Prediction Among Multi-Parental Family Pools and Single Plants in Perennial Ryegrass (Lolium perenne L.)

    Fabio Cericola

    2018-03-01

    Ryegrass single plants, bi-parental family pools, and multi-parental family pools are often genotyped, based on allele-frequencies using genotyping-by-sequencing (GBS) assays. GBS assays can be performed at low-coverage depth to reduce costs. However, reducing the coverage depth leads to a higher proportion of missing data, and leads to a reduction in accuracy when identifying the allele-frequency at each locus. As a consequence of the latter, genomic relationship matrices (GRMs) will be biased. This bias in GRMs affects variance estimates and the accuracy of GBLUP for genomic prediction (GBLUP-GP). We derived equations that describe the bias from low-coverage sequencing as an effect of binomial sampling of sequence reads, and allowed for any ploidy level of the sample considered. This allowed us to combine individual and pool genotypes in one GRM, treating pool-genotypes as a polyploid genotype, equal to the total ploidy-level of the parents of the pool. Using simulated data, we verified the magnitude of the GRM bias at different coverage depths for three different kinds of ryegrass breeding material: individual genotypes from single plants, pool-genotypes from F2 families, and pool-genotypes from synthetic varieties. To better handle missing data, we also tested imputation procedures, which are suited for analyzing allele-frequency genomic data. The relative advantages of the bias-correction and the imputation of missing data were evaluated using real data. We examined a large dataset, including single plants, F2 families, and synthetic varieties genotyped in three GBS assays, each with a different coverage depth, and evaluated them for heading date, crown rust resistance, and seed yield. Cross validations were used to test the accuracy using GBLUP approaches, demonstrating the feasibility of predicting among different breeding material. Bias-corrected GRMs proved to increase predictive accuracies when compared with standard approaches to

  8. Multiple imputation in the presence of non-normal data.

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
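To make the idea of PMM with type 1 matching concrete, here is a minimal single-predictor sketch. In type 1 matching, predictions for observed cases use the fitted parameters while predictions for missing cases use perturbed parameters. As a labelled simplification, the perturbation here is a bootstrap refit rather than a draw from the Bayesian posterior, and the names `fit_line` and `pmm_impute` and the default `k=3` are hypothetical; this is not the implementation evaluated in the paper.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a flat line if all x values coincide
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def pmm_impute(x, y, k=3, rng=None):
    """Impute None entries of y by predictive mean matching (type 1)."""
    rng = rng or random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    # Observed cases: predictions from the parameters fitted to all observed data
    a_hat, b_hat = fit_line([x[i] for i in obs], [y[i] for i in obs])
    pred_obs = {i: a_hat + b_hat * x[i] for i in obs}
    # Missing cases: predictions from perturbed parameters
    # (assumption: bootstrap refit instead of a posterior draw)
    boot = [rng.choice(obs) for _ in obs]
    a_star, b_star = fit_line([x[i] for i in boot], [y[i] for i in boot])
    out = list(y)
    for j in mis:
        target = a_star + b_star * x[j]
        # k donors whose predicted means are closest to the target
        donors = sorted(obs, key=lambda i: abs(pred_obs[i] - target))[:k]
        out[j] = y[rng.choice(donors)]   # copy a donor's observed value
    return out

completed = pmm_impute([1, 2, 3, 4, 5, 6], [2.1, 3.9, None, 8.2, None, 12.0],
                       rng=random.Random(0))
```

Because every imputed value is copied from an observed donor, the imputations stay within the observed range and shape of the variable, which is why PMM is attractive for non-normal data.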

  9. Missing data methods for dealing with missing items in quality of life questionnaires. A comparison by simulation of personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques applied to the SF-36 in the French 2003 decennial health survey.

    Peyre, Hugo; Leplège, Alain; Coste, Joël

    2011-03-01

    Missing items are common in quality of life (QoL) questionnaires and present a challenge for research in this field. It remains unclear which of the various methods proposed to deal with missing data performs best in this context. We compared personal mean score, full information maximum likelihood, multiple imputation, and hot deck techniques using various realistic simulation scenarios of item missingness in QoL questionnaires constructed within the framework of classical test theory. Samples of 300 and 1,000 subjects were randomly drawn from the 2003 INSEE Decennial Health Survey (of 23,018 subjects representative of the French population and having completed the SF-36) and various patterns of missing data were generated according to three different item non-response rates (3, 6, and 9%) and three types of missing data (Little and Rubin's "missing completely at random," "missing at random," and "missing not at random"). The missing data methods were evaluated in terms of accuracy and precision for the analysis of one descriptive and one association parameter for three different scales of the SF-36. For all item non-response rates and types of missing data, multiple imputation and full information maximum likelihood appeared superior to the personal mean score and especially to hot deck in terms of accuracy and precision; however, the use of personal mean score was associated with insignificant bias (relative bias personal mean score appears nonetheless appropriate for dealing with items missing from completed SF-36 questionnaires in most situations of routine use. These results can reasonably be extended to other questionnaires constructed according to classical test theory.
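Personal mean score imputation replaces a respondent's missing items with the mean of that respondent's answered items on the same scale. A minimal sketch follows; the function name and the requirement that at least half of the items be answered are assumptions made for illustration, not details taken from the abstract.

```python
def personal_mean_impute(items, min_answered_fraction=0.5):
    """Fill missing items (None) with the respondent's own mean on the scale.

    items: responses of one person to the items of one scale
    min_answered_fraction: assumed threshold below which nothing is imputed
    Returns a new list; the input is left unchanged.
    """
    answered = [v for v in items if v is not None]
    if not items or len(answered) / len(items) < min_answered_fraction:
        return list(items)            # too few answers: leave the scale missing
    person_mean = sum(answered) / len(answered)
    return [person_mean if v is None else v for v in items]

# One missing item out of four is replaced by the mean of the other three
filled = personal_mean_impute([3, None, 5, 4])
```

The method uses no information from other respondents, which keeps it simple for routine use but ignores between-person structure that multiple imputation or full information maximum likelihood can exploit.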

  10. A genotypic method for determining HIV-2 coreceptor usage enables epidemiological studies and clinical decision support.

    Döring, Matthias; Borrego, Pedro; Büch, Joachim; Martins, Andreia; Friedrich, Georg; Camacho, Ricardo Jorge; Eberle, Josef; Kaiser, Rolf; Lengauer, Thomas; Taveira, Nuno; Pfeifer, Nico

    2016-12-20

    CCR5-coreceptor antagonists can be used for treating HIV-2 infected individuals. Before initiating treatment with coreceptor antagonists, viral coreceptor usage should be determined to ensure that the virus can use only the CCR5 coreceptor (R5) and cannot evade the drug by using the CXCR4 coreceptor (X4-capable). However, until now, no online tool for the genotypic identification of HIV-2 coreceptor usage had been available. Furthermore, there is a lack of knowledge on the determinants of HIV-2 coreceptor usage. Therefore, we developed a data-driven web service for the prediction of HIV-2 coreceptor usage from the V3 loop of the HIV-2 glycoprotein and used the tool to identify novel discriminatory features of X4-capable variants. Using 10 runs of tenfold cross validation, we selected a linear support vector machine (SVM) as the model for geno2pheno[coreceptor-hiv2], because it outperformed the other SVMs with an area under the ROC curve (AUC) of 0.95. We found that SVMs were highly accurate in identifying HIV-2 coreceptor usage, attaining sensitivities of 73.5% and specificities of 96% during tenfold nested cross validation. The predictive performance of SVMs was not significantly different (p value 0.37) from an existing rules-based approach. Moreover, geno2pheno[coreceptor-hiv2] achieved a predictive accuracy of 100% and outperformed the existing approach on an independent data set containing nine new isolates with corresponding phenotypic measurements of coreceptor usage. geno2pheno[coreceptor-hiv2] could not only reproduce the established markers of CXCR4-usage, but also revealed novel markers: the substitutions 27K, 15G, and 8S were significantly predictive of CXCR4 usage. Furthermore, SVMs trained on the amino-acid sequences of the V1 and V2 loops were also quite accurate in predicting coreceptor usage (AUCs of 0.84 and 0.65, respectively). In this study, we developed geno2pheno[coreceptor-hiv2], the first online tool for the prediction of HIV-2 coreceptor

  11. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

    2014-01-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW. PMID:25356644

  12. Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

    Concepción Crespo Turrado

    2014-10-01

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.

  13. Missing data imputation of solar radiation data under different atmospheric conditions.

    Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

    2014-10-29

    Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37% while that for the MLR it was 28.19%, and 31.68% for the IDW.
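Of the baseline methods named in the abstract above, Inverse Distance Weighting is the simplest to illustrate: a missing reading at one station is estimated as a weighted average of simultaneous readings at the other stations, with weights that decay with distance. The sketch below assumes planar coordinates, Euclidean distance and a power of 2; none of these settings is given in the abstract, and the function name and values are hypothetical.

```python
import math

def idw_estimate(target_xy, stations, power=2.0):
    """Estimate a value at target_xy from neighbouring stations by IDW.

    target_xy: (x, y) position of the sensor with the missing reading
    stations:  list of ((x, y), value) for sensors with valid readings
    power:     distance exponent (assumed to be 2 here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(sx - target_xy[0], sy - target_xy[1])
        if d == 0.0:
            return value              # co-located station: use its reading
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Two neighbouring stations at distances 1 and 2 from the target
estimate = idw_estimate((0.0, 0.0), [((1.0, 0.0), 10.0), ((2.0, 0.0), 40.0)])
```

Unlike MICE, this baseline uses only station geometry and ignores how strongly the sensors' readings are actually correlated.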

  14. Criteria of GenCall score to edit marker data and methods to handle missing markers have an influence on accuracy of genomic predictions

    Edriss, Vahid; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2013-01-01

    The aim of this study was to investigate the effect of different strategies for handling low-quality or missing data on prediction accuracy for direct genomic values of protein yield, mastitis and fertility using a Bayesian variable model and a GBLUP model in the Danish Jersey population. The data...... contained 1071 Jersey bulls that were genotyped with the Illumina Bovine 50K chip. After preliminary editing, 39227 SNP remained in the dataset. Four methods to handle missing genotypes were: 1) BEAGLE: missing markers were imputed using Beagle 3.3 software, 2) COMMON: missing genotypes at a locus were...

  15. Data Editing and Imputation in Business Surveys Using “R”

    Elena Romascanu

    2014-06-01

    Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison between two statistical software packages, R and SPSS, in order to take full advantage of the existing automated methods for data editing and imputation in business surveys (with a proper design of consistency rules) as a partial alternative to the manual editing of data. Approach – The paper compares different methods for editing survey data in R with the ‘editrules’ and ‘survey’ packages, which contain transformations commonly used in official statistics, the visualization of missing-value patterns using the ‘Amelia’ and ‘VIM’ packages, and imputation approaches for longitudinal data using ‘VIMGUI’, and compares the performance of another statistical software package, SPSS, on the same features. Findings – Data on business statistics received by the NIS (National Institute of Statistics) are not ready to be used for direct analysis due to in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages offer the ability to flag the erroneous fields in edit-violating records and to verify the results after the imputation of missing values, providing users with a flexible, less time-consuming approach that is easier to automate in R than with SPSS macro syntax in situations where macros are very handy.

  16. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.

  17. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  18. Evaluation of a cost effective in-house method for HIV-1 drug resistance genotyping using plasma samples.

    Devidas N Chaturbhuj

    Full Text Available OBJECTIVES: Validation of a cost effective in-house method for HIV-1 drug resistance genotyping using plasma samples. DESIGN: The validation includes the establishment of analytical performance characteristics such as accuracy, reproducibility, precision and sensitivity. METHODS: The accuracy was assessed by comparing 26 paired Virological Quality Assessment (VQA) proficiency testing panel sequences generated by the in-house method and by the ViroSeq Genotyping System 2.0 (Celera Diagnostics, US) as a gold standard. The reproducibility and precision studies were carried out on five samples with five replicates representing multiple HIV-1 subtypes (A, B, C) and resistance patterns. The amplification sensitivity was evaluated on HIV-1 positive plasma samples (n = 88) with known viral loads ranging from 1000 to 1.8 million RNA copies/ml. RESULTS: Comparison of the nucleotide sequences generated by ViroSeq and the in-house method showed 99.41±0.46 and 99.68±0.35% mean nucleotide and amino acid identity respectively. Out of 135 Stanford HIVdb listed HIV-1 drug resistance mutations, partial discordance was observed at 15 positions and complete discordance was absent. The reproducibility and precision study showed high nucleotide sequence identities, i.e. 99.88±0.10 and 99.82±0.20 respectively. The in-house method showed 100% analytical sensitivity on the samples with HIV-1 viral load >1000 RNA copies/ml. The cost of running the in-house method is only 50% of that for the ViroSeq method (112$ vs 300$), thus making it cost effective. CONCLUSIONS: The validated cost effective in-house method may be used to collect surveillance data on the emergence and transmission of HIV-1 drug resistance in resource limited countries. Moreover, the wide application of a cost effective and validated in-house method for HIV-1 drug resistance testing will facilitate decision making for the appropriate management of HIV infected patients.

  19. Genetic evaluation with major genes and polygenic inheritance when some animals are not genotyped using gene content multiple-trait BLUP.

    Legarra, Andrés; Vitezica, Zulma G

    2015-11-17

    In pedigreed populations with a major gene segregating for a quantitative trait, it is not clear how to use pedigree, genotype and phenotype information when some individuals are not genotyped. We propose to consider gene content at the major gene as a second trait correlated to the quantitative trait, in a gene content multiple-trait best linear unbiased prediction (GCMTBLUP) method. The genetic covariance between the trait and gene content at the major gene is a function of the substitution effect of the gene. This genetic covariance can be written in a multiple-trait form that accommodates any pattern of missing values for either genotype or phenotype data. Effects of major gene alleles and the genetic covariance between genotype at the major gene and the phenotype can be estimated using standard EM-REML or Gibbs sampling. Prediction of breeding values with genotypes at the major gene can use multiple-trait BLUP software. Major genes with more than two alleles can be considered by including negative covariances between gene contents at each different allele. We simulated two scenarios: a selected and an unselected trait with heritabilities of 0.05 and 0.5, respectively. In both cases, the major gene explained half the genetic variation. Competing methods used imputed gene contents derived by the method of Gengler et al. or by iterative peeling. Imputed gene contents, in contrast to GCMTBLUP, do not consider information on the quantitative trait for genotype prediction. GCMTBLUP gave unbiased estimates of the gene effect, in contrast to the other methods, with less bias and better or equal accuracy of prediction. GCMTBLUP improved estimation of genotypes in non-genotyped individuals, in particular if these individuals had their own phenotype records and the trait had a high heritability. Ignoring the major gene in genetic evaluation led to serious biases and decreased prediction accuracy. GCMTBLUP is the best linear predictor of additive genetic merit including

  20. TRIP: An interactive retrieving-inferring data imputation approach

    Li, Zhixu

    2016-06-25

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to nonquantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall, but on the other hand issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  1. TRIP: An interactive retrieving-inferring data imputation approach

    Li, Zhixu; Qin, Lu; Cheng, Hong; Zhang, Xiangliang; Zhou, Xiaofang

    2016-01-01

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to nonquantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall, but on the other hand issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  2. Imputed prices of greenhouse gases and land forests

    Uzawa, Hirofumi

    1993-01-01

    The theory of dynamic optimum formulated by Maeler gives us the basic theoretical framework within which it is possible to analyse the economic and, possibly, political circumstances under which the phenomenon of global warming occurs, and to search for the policy and institutional arrangements whereby it would be effectively arrested. The analysis developed here is an application of Maeler's theory to atmospheric quality. In the analysis a central role is played by the concept of imputed price in the dynamic context. Our determination of imputed prices of atmospheric carbon dioxide and land forests takes into account the difference in the stages of economic development. Indeed, the ratios of the imputed prices of atmospheric carbon dioxide and land forests over the per capita level of real national income are identical for all countries involved. (3 figures, 2 tables) (Author)

  3. A suggested approach for imputation of missing dietary data for young children in daycare

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

  4. Multiple imputation of missing passenger boarding data in the national census of ferry operators

    2008-08-01

    This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corresp...

  5. A Novel Method for Assessing Sex-Specific and Genotype-Specific Response to Injury in Astrocyte Culture

    Liu, Mingyue; Oyarzabal, Esteban; Yang, Rui; Murphy, Stephanie J; Hurn, Patricia D.

    2008-01-01

    Female astrocytes sustain less cell death from oxygen-glucose deprivation (OGD) than male astrocytes. Arimidex, an aromatase inhibitor, abolishes these sex differences. To verify sex-dependent differences in P450 aromatase function in astrocyte cell death following OGD, we developed a novel method that uses sex-specific and genotype-specific single pup primary astrocyte cultures from wild-type (WT) and aromatase-knockout (ArKO) mice. After determining sex by external and internal examination as well as PCR and genotype by PCR amplification of tail cDNA, we established cultures from 1−3 day-old male and female, WT and ArKO mice pups and grew them to confluence in estrogen-free media. Cell death was measured by lactate dehydrogenase (LDH) assay. Our study shows that, while WT female astrocytes are more resistant to OGD than WT male cells, sex differences disappear in ArKO cells. Cell death is significantly increased in ArKO compared to WT in female astrocytes but not male cells. Therefore, P450 aromatase appears to be essential in endogenous neuroprotection in females, and this finding may have clinical implications. This innovative technique may also be applied to other in vitro studies of sex-related functional differences. PMID:18436308

  6. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    Rodriguez, Jesse M.

    2013-01-01

    Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously-unknown shared familial descent by detecting genomic segments that are shared between them, which are considered to be identical by descent (IBD). Alternatively, elevated frequencies of IBD segments near a particular locus among affected individuals can be indicative of a disease-associated gene. As genotyping studies grow to use increasingly large sample sizes and meta-analyses begin to include many data sets, accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting related pairs of individuals and shared haplotypic segments within these pairs. PARENTE is a computationally-efficient method based on an embedded likelihood ratio test. As demonstrated by the results of our simulations, our method exhibits better accuracy than the current state of the art, and can be used for the analysis of large genotyped cohorts. PARENTE's higher accuracy becomes even more significant in more challenging scenarios, such as detecting shorter IBD segments or when an extremely low false-positive rate is required. PARENTE is publicly and freely available at http://parente.stanford.edu/. © 2013 Springer-Verlag.

  7. Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

    Puett Robin C

    2009-10-01

    Full Text Available Abstract Background There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). Methods We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value Conclusion Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies
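
    The contrast between fixed (deterministic) and random (stochastic) allocation described in this record can be sketched as follows. This is an illustration only, not the authors' code: the tract names and population weights are hypothetical, and the sketch assumes that fixed allocation assigns every case in a ZIP code to the tract with the largest weight while random allocation draws a tract with probability proportional to its weight.

```python
import random


def fixed_allocation(tract_weights):
    """Deterministic: always return the tract with the largest weight (assumed rule)."""
    return max(tract_weights, key=tract_weights.get)


def random_allocation(tract_weights, rng):
    """Stochastic: draw one tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    weights = [tract_weights[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]


# Hypothetical ZIP code overlapping three census tracts, weighted by total population.
zip_tracts = {"tract_A": 5000, "tract_B": 3000, "tract_C": 2000}

rng = random.Random(0)  # seeded so the sketch is reproducible
fixed = [fixed_allocation(zip_tracts) for _ in range(10)]
randomized = [random_allocation(zip_tracts, rng) for _ in range(1000)]

print(set(fixed))  # every case lands in the same tract
print({t: randomized.count(t) for t in zip_tracts})  # cases spread across tracts
```

    Under these assumptions the fixed rule piles all cases into a single tract, which is consistent with the artificial clustering the abstract reports, while the random rule spreads cases roughly in proportion to the weights.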

  8. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Hardt Jochen

    2012-12-01

    Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate (r=.50) correlations with the X's and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
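
    The rule of thumb at the end of this record can be written as a small check. This sketch rests on an interpretation that the abstract does not spell out: it assumes the 1 : 3 ratio means at least three complete cases per variable in the imputation model, and that the variable count includes the response, the predictors and the auxiliary variables. The function name and the example counts are hypothetical.

```python
def meets_rule_of_thumb(n_variables, n_complete_cases, min_cases_per_variable=3):
    """Return True if there are at least `min_cases_per_variable` complete cases
    per variable in the imputation model (assumed reading of the 1 : 3 rule)."""
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return n_complete_cases >= min_cases_per_variable * n_variables


# Hypothetical example: Y, X1, X2 plus auxiliary variables, with 50 complete cases.
print(meets_rule_of_thumb(3 + 10, 50))  # 13 variables need 39 cases
print(meets_rule_of_thumb(3 + 20, 50))  # 23 variables need 69 cases
```

    With 50 complete cases, ten auxiliary variables pass the check and twenty do not, which matches the abstract's warning about small samples.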

  9. Age at menopause: imputing age at menopause for women with a hysterectomy with application to risk of postmenopausal breast cancer

    Rosner, Bernard; Colditz, Graham A.

    2011-01-01

    Purpose Age at menopause, a major marker in the reproductive life, may bias results for evaluation of breast cancer risk after menopause. Methods We follow 38,948 premenopausal women in 1980 and identify 2,586 who reported hysterectomy without bilateral oophorectomy, and 31,626 who reported natural menopause during 22 years of follow-up. We evaluate risk factors for natural menopause, impute age at natural menopause for women reporting hysterectomy without bilateral oophorectomy and estimate the hazard of reaching natural menopause in the next 2 years. We apply this imputed age at menopause to both increase sample size and to evaluate the relation between postmenopausal exposures and risk of breast cancer. Results Age, cigarette smoking, age at menarche, pregnancy history, body mass index, history of benign breast disease, and history of breast cancer were each significantly related to age at natural menopause; duration of oral contraceptive use and family history of breast cancer were not. The imputation increased sample size substantially and although some risk factors after menopause were weaker in the expanded model (height, and alcohol use), use of hormone therapy is less biased. Conclusions Imputing age at menopause increases sample size, broadens generalizability making it applicable to women with hysterectomy, and reduces bias. PMID:21441037

  10. A method for genotyping elite breeding stocks of leaf chicory (Cichorium intybus L.) by assaying mapped microsatellite marker loci.

    Ghedina, Andrea; Galla, Giulio; Cadalen, Thierry; Hilbert, Jean-Louis; Caenazzo, Silvano Tiozzo; Barcaccia, Gianni

    2015-12-30

    Leaf chicory (Cichorium intybus subsp. intybus var. foliosum L.) is a diploid plant species (2n = 18) of the Asteraceae family. The term "chicory" specifies at least two types of cultivated plants: a leafy vegetable, which is highly differentiated with respect to several cultural types, and a root crop, whose current industrial utilization primarily addresses the extraction of inulin or the production of a coffee substitute. The populations grown are generally represented by local varieties (i.e., landraces) with high variation and adaptation to the natural and anthropological environment where they originated, and have been yearly selected and multiplied by farmers. Currently, molecular genetics and biotechnology are widely utilized in marker-assisted breeding programs in this species. In particular, molecular markers are becoming essential tools for developing parental lines with traits of interest and for assessing the specific combining ability of these lines to breed F1 hybrids. The present research deals with the implementation of an efficient method for genotyping elite breeding stocks developed from old landraces of leaf chicory, Radicchio of Chioggia, which are locally dominant in the Veneto region, using 27 microsatellite (SSR) marker loci scattered throughout the linkage groups. Information on the genetic diversity across molecular markers and plant accessions was successfully assessed along with descriptive statistics over all marker loci and inbred lines. Our overall data support an efficient method for assessing a multi-locus genotype of plant individuals and lineages that is useful for the selection of new varieties and the certification of local products derived from Radicchio of Chioggia. This method proved to be useful for assessing the observed degree of homozygosity of the inbred lines as a measure of their genetic stability; plus it allowed an estimate of the specific combining ability (SCA) between maternal and paternal inbred lines on the

  11. Improved Methods of Carnivore Faecal Sample Preservation, DNA Extraction and Quantification for Accurate Genotyping of Wild Tigers

    Harika, Katakam; Mahla, Ranjeet Singh; Shivaji, Sisinthy

    2012-01-01

    Background Non-invasively collected samples allow a variety of genetic studies on endangered and elusive species. However due to low amplification success and high genotyping error rates fewer samples can be identified up to the individual level. Number of PCRs needed to obtain reliable genotypes also noticeably increase. Methods We developed a quantitative PCR assay to measure and grade amplifiable nuclear DNA in feline faecal extracts. We determined DNA degradation in experimentally aged faecal samples and tested a suite of pre-PCR protocols to considerably improve DNA retrieval. Results Average DNA concentrations of Grade I, II and III extracts were 982pg/µl, 9.5pg/µl and 0.4pg/µl respectively. Nearly 10% of extracts had no amplifiable DNA. Microsatellite PCR success and allelic dropout rates were 92% and 1.5% in Grade I, 79% and 5% in Grade II, and 54% and 16% in Grade III respectively. Our results on experimentally aged faecal samples showed that ageing has a significant effect on quantity and quality of amplifiable DNA (pDNA degradation occurs within 3 days of exposure to direct sunlight. DNA concentrations of Day 1 samples stored by ethanol and silica methods for a month varied significantly from fresh Day 1 extracts (p0.05). DNA concentrations of fresh tiger and leopard faecal extracts without addition of carrier RNA were 816.5pg/µl (±115.5) and 690.1pg/µl (±207.1), while concentrations with addition of carrier RNA were 49414.5pg/µl (±9370.6) and 20982.7pg/µl (±6835.8) respectively. Conclusions Our results indicate that carnivore faecal samples should be collected as freshly as possible, are better preserved by two-step method and should be extracted with addition of carrier RNA. We recommend quantification of template DNA as this facilitates several downstream protocols. PMID:23071624

  12. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of missing data being at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random data (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations to the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's version of the HAQ could significantly improve the predictive value of the routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.

  13. Common genotypes of hepatitis B virus

    Idrees, M.; Khan, S.; Riazuddin, S.

    2004-01-01

    Objective: To find out the frequency of common genotypes of hepatitis-B virus (HBV). Subjects and Methods: HBV genotypes were determined in 112 HBV DNA positive sera by a simple and precise molecular genotyping system based on PCR using type-specific primers for the determination of HBV genotypes A through H. Results: Four genotypes (A, B, C and D) out of the eight genotypes reported so far were identified. Genotypes A, B and C were predominant. HBV genotype C was the most predominant in this collection, appearing in 46 samples (41.7%). However, the genotypes of a total of 5 (4.46%) samples could not be determined with the present genotyping system. Mixed genotypes were seen in 8 (7.14%) HBV isolates. Five of these were infected with genotypes A/D whereas two were with genotypes C/D. One patient was infected with 4 genotypes (A/B/C/D). Genotype A (68%) was predominant in Sindh; genotype C was most predominant in North West Frontier Province (NWFP) (68.96%), whereas genotypes C and B were dominant in Punjab (39.65% and 25.86% respectively). Conclusion: All the four common genotypes of HBV found worldwide (A, B, C and D) were isolated. Genotype C is the predominant genotype. Genotypes B and C are predominant in Punjab and N.W.F.P. whereas genotype A is predominant in Sindh. (author)

  14. ALIS-FLP: Amplified ligation selected fragment-length polymorphism method for microbial genotyping

    Brillowska-Dabrowska, A.; Wianecka, M.; Dabrowski, Slawomir

    2008-01-01

    A DNA fingerprinting method known as ALIS-FLP (amplified ligation selected fragment-length polymorphism) has been developed for selective and specific amplification of restriction fragments from TspRI restriction endonuclease digested genomic DNA. The method is similar to AFLP, but differs...

  15. Sequence imputation of HPV16 genomes for genetic association studies.

    Benjamin Smith

    Full Text Available Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16
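
    The per-SNP odds ratio mentioned in this record can be illustrated with the standard 2x2 case-control calculation. This is a generic sketch, not the study's pipeline: the function name and counts are hypothetical, and the 0.5 continuity correction applied when a cell is zero is an added assumption that the abstract does not mention.

```python
def snp_odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a SNP variant in cases versus controls.

    Adds 0.5 to every cell when any cell is zero (a common continuity
    correction, assumed here so the ratio stays finite).
    """
    cells = [case_with, case_without, control_with, control_without]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Hypothetical counts: 30 of 100 cases and 10 of 100 controls carry the variant.
print(round(snp_odds_ratio(30, 70, 10, 90), 2))
```

    An odds ratio above 1 indicates that the variant is more common among cases. As the abstract notes, an elevated ratio for a lineage-linked SNP does not by itself identify the causal SNP.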

  16. A fast and easy real-time PCR genotyping method for the HLA-G 14-bp insertion/deletion polymorphism in the 3' untranslated region

    Djurisic, S; Sørensen, A E; Hviid, T V F

    2012-01-01

    and reliable method to screen for the HLA-G 14-bp insertion/deletion polymorphism using an optimized real-time polymerase chain reaction protocol. The genotyping assay has been validated by comparison with conventional methods. As results can be obtained within a few hours, the assay will have a potential...

  17. Comparative study using phenotypic, genotypic and proteomics methods for identification of coagulase-negative staphylococci

    Dr. P.F.G. Wolffs; Ing M. Valkenburg; Dr. A.J.C. van den Brule; M.Sc. A. Jansz; Drs A.J.M. Loonen; Ing J.N.B. Bergland

    2012-01-01

    Five methods were compared to determine the best technique for accurate identification of coagulase-negative staphylococci (CoNS) (n=142 strains). MALDI-TOF MS showed the best results for rapid and accurate CoNS differentiation (correct identity in 99.3%). An alternative to this approach could be

  18. Identification of Candida albicans and Candida dubliniensis Species Isolated from Bronchoalveolar Lavage Samples Using Genotypic and Phenotypic Methods

    Sahar Kianipour

    2018-01-01

    Full Text Available Background: Candida dubliniensis is a newly diagnosed species very similar to Candida albicans phenotypically and first discovered in the mouth of people with AIDS in 1995. Among the different phenotypic and genotypic methods, a cost-effective method should be selected which makes it possible to differentiate these similar species. Materials and Methods: Polymerase chain reaction (PCR)-restriction fragment length polymorphism with MspI enzyme and the Duplex-PCR method were performed on DNA extracted by boiling. The sequencing of the amplified ribosomal region was used to confirm the C. dubliniensis species. Direct examination and colony count of the yeasts were applied for bronchoalveolar lavage (BAL) samples and the growth rate of the yeasts was studied at 45°C. To assess the ability of the yeast isolates to form chlamydoconidia, they were separately cultured on sunflower seed agar, wheat flour agar, and corn meal agar media. Results: Fifty-nine (49.2%) yeast colonies were identified from the total of 120 BAL specimens. Twenty-nine isolated yeasts, including 17 (58.6%) of the C. albicans/dubliniensis complex and 12 (41.4%) nonalbicans isolates, produced pseudohypha or blastoconidia in direct smear with a mean colony count of 42000 CFU/mL. C. albicans, with a frequency of 15 (42.9%), was the most common isolated yeast, whereas C. dubliniensis was identified in two non-HIV patients. Conclusion: Sequencing of the replicated gene fragment is the best method for identifying the yeasts, but the determination of the species by phenotypic methods such as the creation of chlamydoconidia in sunflower seed agar and wheat flour agar media can be cost-effective, with acceptable sensitivity and quality.

  19. Multiple imputation to account for measurement error in marginal structural models

    Edwards, Jessie K.; Cole, Stephen R.; Westreich, Daniel; Crane, Heidi; Eron, Joseph J.; Mathews, W. Christopher; Moore, Richard; Boswell, Stephen L.; Lesko, Catherine R.; Mugavero, Michael J.

    2015-01-01

    Background Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and non-differential measurement error in a marginal structural model. Methods We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. Results In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality [hazard ratio (HR): 1.2 (95% CI: 0.6, 2.3)]. The HR for current smoking and therapy (0.4 (95% CI: 0.2, 0.7)) was similar to the HR for no smoking and therapy (0.4; 95% CI: 0.2, 0.6). Conclusions Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies. PMID:26214338
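
    The abstract does not say how estimates from the imputed datasets were combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the authors' procedure. The function name and the log hazard ratios and variances in the example are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)          # average of the per-imputation variances
    between = variance(estimates)     # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log hazard ratios and their variances from three imputed datasets.
log_hrs = [0.10, 0.20, 0.30]
log_hr_variances = [0.04, 0.04, 0.04]

pooled_log_hr, total_var = pool_rubin(log_hrs, log_hr_variances)
print(math.exp(pooled_log_hr), math.sqrt(total_var))  # pooled HR and standard error of log HR
```

    Pooling is done on the log scale because hazard ratios are approximately normal after a log transform. The between-imputation term is what carries the extra uncertainty due to the misclassified or missing values.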

  20. Multiplexed genome engineering and genotyping methods applications for synthetic biology and metabolic engineering.

    Wang, Harris H; Church, George M

    2011-01-01

    Engineering at the scale of whole genomes requires fundamentally new molecular biology tools. Recent advances in recombineering using synthetic oligonucleotides enable the rapid generation of mutants at high efficiency and specificity and can be implemented at the genome scale. With these techniques, libraries of mutants can be generated, from which individuals with functionally useful phenotypes can be isolated. Furthermore, populations of cells can be evolved in situ by directed evolution using complex pools of oligonucleotides. Here, we discuss ways to utilize these multiplexed genome engineering methods, with special emphasis on experimental design and implementation. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. Native nucleic acid electrophoresis as an efficient alternative for genotyping method of influenza virus.

    Pajak, Beata; Lepek, Krzysztof

    2014-01-01

    Influenza viruses are the major causative agents of acute respiratory infections in humans and animals worldwide. Some influenza subtypes have caused epidemics and pandemics among humans. A variety of methods is available for the rapid isolation and identification of influenza viruses in clinical and environmental samples. Since nucleic acid amplification techniques such as RT-PCR have been adapted, fast and sensitive determination of influenza type and subtype has become possible. However, in some ambiguous cases a more detailed assay might be desired. The genetic material of influenza virus is highly unstable and constantly mutates. It is known that single nucleotide polymorphisms (SNPs) result in resistance to commercially available antiviral drugs. The genetic drift of the virus could also result in a weakened immune response to infection. Finally, in a substantial number of patients co-infection with various virus strains or types has been confirmed. Although the detection of co-infection or of minor genetic variants in flu-infected patients is not a routine procedure, rapid, wide-spectrum diagnostics of influenza virus infections could reveal an accurate picture of the disease and, more importantly, is crucial for choosing appropriate therapeutics and for virus monitoring. Herein we present evidence that native gel electrophoresis and MSSCP, a method based on multitemperature single-strand conformation polymorphism, could furnish a useful technique for detecting minor variants that escape discovery by conventional diagnostic assays.

  2. A PCR-based genotyping method to distinguish between wild-type and ornamental varieties of Imperata cylindrica.

    Cseke, Leland J; Talley, Sharon M

    2012-02-20

    Wild-type I. cylindrica (cogongrass) is one of the top ten worst invasive plants in the world, negatively impacting agricultural and natural resources in 73 different countries throughout Africa, Asia, Europe, New Zealand, Oceania and the Americas (1-2). Cogongrass forms rapidly-spreading, monodominant stands that displace a large variety of native plant species and in turn threaten the native animals that depend on the displaced native plant species for forage and shelter. To add to the problem, an ornamental variety [I. cylindrica var. koenigii (Retzius)] is widely marketed under the names of Imperata cylindrica 'Rubra', Red Baron, and Japanese blood grass (JBG). This variety is putatively sterile and noninvasive and is considered a desirable ornamental for its red-colored leaves. However, under the correct conditions, JBG can produce viable seed (Carol Holko, 2009 personal communication) and can revert to a green invasive form that is often indistinguishable from cogongrass as it takes on the distinguishing characteristics of the wild-type invasive variety (4) (Figure 1). This makes identification using morphology a difficult task even for well-trained plant taxonomists. Reversion of JBG to an aggressive green phenotype is also not a rare occurrence. Using sequence comparisons of coding and variable regions in both nuclear and chloroplast DNA, we have confirmed that JBG has reverted to the green invasive within the states of Maryland, South Carolina, and Missouri. JBG has been sold and planted in just about every state in the continental U.S. where there is not an active cogongrass infestation. The extent of the revert problem is not well understood because reverted plants are undocumented and often destroyed. Application of this molecular protocol provides a method to identify JBG reverts and can help keep these varieties from co-occurring and possibly hybridizing. Cogongrass is an obligate outcrosser and, when crossed with a different genotype, can produce

  3. Identification of Candida albicans and Candida dubliniensis Species Isolated from Bronchoalveolar Lavage Samples Using Genotypic and Phenotypic Methods.

    Kianipour, Sahar; Ardestani, Mohammad Emami; Dehghan, Parvin

    2018-01-01

    Candida dubliniensis is a newly diagnosed species very similar to Candida albicans phenotypically and first discovered in the mouth of people with AIDS in 1995. Among the different phenotypic and genotypic methods, a cost-effective method should be selected which makes it possible to differentiate these similar species. Polymerase chain reaction (PCR)-restriction fragment length polymorphism with MspI enzyme and the Duplex-PCR method were done by DNA extraction using boiling. The sequencing of the amplified ribosomal region was used to confirm the C. dubliniensis species. Direct examination and colony count of the yeasts were applied for bronchoalveolar lavage (BAL) samples and the growth rate of the yeasts were studied at 45°C. To understand the ability formation of chlamydoconidia in yeast isolates, they were separately cultured on the sunflower seed agar, wheat flour agar, and corn meal agar media. Fifty-nine (49.2%) yeast colonies were identified from the total of 120 BAL specimens. Twenty-nine isolated yeasts; including 17 (58.6%) of C. albicans / dubliniensis complex and 12 (41.4%) of nonalbicans isolates produced pseudohypha or blastoconidia in direct smear with a mean colony count of 42000 CFU/mL. C. albicans with the frequency of 15 (42.9%) were the most common isolated yeasts, whereas C. dubliniensis was identified in two nonHIV patients. Sequencing of the replicated gene fragment is the best method for identifying the yeasts, but the determination of the species by phenotypic methods such as the creation of chlamydoconidia in sunflower seeds agar and wheat flour agar media can be cost-effective, have sensitivity and acceptable quality.

  4. Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions

    Balding David J

    2008-12-01

    Full Text Available Abstract Background The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. Results In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. Conclusion With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.

  5. Combining item response theory with multiple imputation to equate health assessment questionnaires.

    Gu, Chenyang; Gutman, Roee

    2017-09-01

    The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.

  6. Multiple Imputation to Account for Measurement Error in Marginal Structural Models.

    Edwards, Jessie K; Cole, Stephen R; Westreich, Daniel; Crane, Heidi; Eron, Joseph J; Mathews, W Christopher; Moore, Richard; Boswell, Stephen L; Lesko, Catherine R; Mugavero, Michael J

    2015-09-01

    Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and nondifferential measurement error in a marginal structural model. We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3,686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2 [95% confidence interval [CI] = 0.6, 2.3]). The HR for current smoking and therapy [0.4 (95% CI = 0.2, 0.7)] was similar to the HR for no smoking and therapy (0.4; 95% CI = 0.2, 0.6). Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies.

  7. An Imputation Model for Dropouts in Unemployment Data

    Nilsson Petra

    2016-09-01

    Full Text Available Incomplete unemployment data is a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service’s register as many as 20 percent. This leads to an ambiguity regarding destination states (employment, unemployment, retired, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 the employment rate has dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions. The models have relatively high predictive power.

  8. First detection of multiple knockdown resistance (kdr)-like mutations in voltage-gated sodium channel using three new genotyping methods in Anopheles sinensis from Guangxi Province, China.

    Tan, Wei L; Li, Chun X; Wang, Zhong M; Liu, Mei D; Dong, Yan D; Feng, Xiang Y; Wu, Zhi M; Guo, Xiao X; Xing, Dan; Zhang, Ying M; Wang, Zhong C; Zhao, Tong Y

    2012-09-01

    To investigate knockdown resistance (kdr)-like mutations associated with pyrethroid resistance in Anopheles sinensis (Wiedemann, 1828), from Guangxi province, southwest China, a segment of a sodium channel gene was sequenced and genotyped using three new genotyping assays. Direct sequencing revealed the presence of TTG-to-TCG and TG-to-TTT mutations at allele position L1014, which led to L1014S and L1014F substitutions in a few individuals and two novel substitutions of N1013S and L1014W in two DNA templates. A low frequency of the kdr allele, mostly in the heterozygous state of L1014S and L1014F, was observed in this mosquito population. In this study, the genotyping of An. sinensis using three polymerase chain reaction-based methods generated consistent results, which agreed with the results of DNA sequencing. In total, 52 mosquitoes were genotyped using a direct sequencing assay. The number of mosquitoes and their genotypes were as follows: L/L = 24, L/S = 19, L/F = 8, and F/W = 1. The allelic frequencies of L1014, 1014S, and 1014F were 72, 18, and 9%, respectively.
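
    The reported allelic frequencies follow from the genotype counts by simple allele counting (each mosquito carries two alleles, so 52 mosquitoes give 104 alleles). The sketch below is only an illustration of that arithmetic; the function name and data layout are assumptions, not part of the study.

```python
from collections import Counter

# Genotype counts as reported in the abstract (52 mosquitoes in total).
genotype_counts = {("L", "L"): 24, ("L", "S"): 19, ("L", "F"): 8, ("F", "W"): 1}


def allele_frequencies(counts):
    """Return a mapping allele -> frequency from genotype -> count (hypothetical helper)."""
    alleles = Counter()
    for genotype, n in counts.items():
        for allele in genotype:
            alleles[allele] += n  # each individual contributes two alleles
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


freqs = allele_frequencies(genotype_counts)
for allele in sorted(freqs):
    print(f"{allele}: {100 * freqs[allele]:.1f}%")
```

    Rounded to whole percentages, the L, S, and F frequencies agree with the 72, 18, and 9% given in the abstract; the single W allele accounts for the remainder.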

  9. Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

    Hopke, P K; Liu, C; Rubin, D B

    2001-03-01

    Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.
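
    The abstract does not say how results from the multiply imputed data sets are combined. A common choice is Rubin's combining rules, sketched below under that assumption: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and all numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m completed-data estimates using Rubin's rules (illustrative helper)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance of the pooled estimate
    return q_bar, t


# Hypothetical estimates and variances from m = 5 imputed data sets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
within = [0.04, 0.05, 0.04, 0.05, 0.04]
q, t = pool_rubin(estimates, within)
print(f"pooled estimate = {q:.3f}, total variance = {t:.3f}")
```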

  10. Detection of Fetomaternal Genotype Associations in Early-Onset Disorders: Evaluation of Different Methods and Their Application to Childhood Leukemia

    Jasmine Healy

    2010-01-01

    Full Text Available Several designs and analytical approaches have been proposed to dissect offspring from maternal genetic contributions to early-onset diseases. However, lack of parental controls halts the direct verification of the assumption of mating symmetry (MS) required to assess maternally-mediated effects. In this study, we used simulations to investigate the performance of existing methods under mating asymmetry (MA) when parents of controls are missing. Our results show that the log-linear, likelihood-based framework using a case-triad/case-control hybrid design provides valid tests for maternal genetic effects even under MA. Using this approach, we examined fetomaternal associations between 29 SNPs in 12 cell-cycle genes and childhood pre-B acute lymphoblastic leukemia (ALL). We identified putative fetomaternal effects at loci CDKN2A rs36228834 (P=.017) and CDKN2B rs36229158 (P=.022) that modulate the risk of childhood ALL. These data further corroborate the importance of the mother's genotype on the susceptibility to early-onset diseases.

  11. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Assaf Gottlieb

    2017-11-01

    Full Text Available Abstract Background Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into “gene level” effects. Methods Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression—on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  12. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)^{-1}X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)^{-1}X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β) = E[β] - β, Bias(θ) = E[θ] - Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial.
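
    As a minimal illustration of the two linear imputation rules named in the abstract, the sketch below fills post-dropout gaps in one subject's visit sequence. The function names, the use of None for missing visits, and the pain scores are assumptions made for the example; the bias definitions developed in the paper are not reproduced here.

```python
def locf(values):
    """Last Observation Carried Forward: replace None with the latest observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(values):
    """Baseline Observation Carried Forward: replace None with the first (baseline) value.

    Assumes the baseline visit itself was observed.
    """
    baseline = values[0]
    return [baseline if v is None else v for v in values]


# Hypothetical pain scores for one subject who drops out after the third visit.
scores = [7.0, 5.5, 4.0, None, None]
print(locf(scores))  # [7.0, 5.5, 4.0, 4.0, 4.0]
print(bocf(scores))  # [7.0, 5.5, 4.0, 7.0, 7.0]
```

    In both rules each imputed value is a copy of an earlier observation from the same subject, which is the sense in which the abstract calls them linear.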

  13. Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

    Ritz, Cecilia; Edén, Patrik

    2008-01-19

    For 2-dye microarray platforms, some missing values may arise from an un-measurably low RNA expression in one channel only. Information of such "one-channel depletion" is so far not included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates were reduced up to 51%, depending on dataset. By including more information in the imputation step, we more accurately estimate missing expression values.

  14. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to uncover hidden knowledge in clinical research is becoming highly influential in medicine. Heart disease is a major cause of death around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results. Nevertheless, the remaining complete features can still provide information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. The completed dataset is then used to train a classification algorithm, Decision Tree. The experiment uses a Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of Decision Tree improves after the application of FCMPSO for imputation.

  16. Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution.

    Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

    2018-06-01

    Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  17. Improving genetic evaluation of litter size and piglet mortality for both genotyped and nongenotyped individuals using a single-step method.

    Guo, X; Christensen, O F; Ostersen, T; Wang, Y; Lund, M S; Su, G

    2015-02-01

    A single-step method allows genetic evaluation using information of phenotypes, pedigree, and markers from genotyped and nongenotyped individuals simultaneously. This paper compared genomic predictions obtained from a single-step BLUP (SSBLUP) method, a genomic BLUP (GBLUP) method, a selection index blending (SELIND) method, and a traditional pedigree-based method (BLUP) for total number of piglets born (TNB), litter size at d 5 after birth (LS5), and mortality rate before d 5 (Mort; including stillbirth) in Danish Landrace and Yorkshire pigs. Data sets of 778,095 litters from 309,362 Landrace sows and 472,001 litters from 190,760 Yorkshire sows were used for the analysis. There were 332,795 Landrace and 207,255 Yorkshire animals in the pedigree data, among which 3,445 Landrace pigs (1,366 boars and 2,079 sows) and 3,372 Yorkshire pigs (1,241 boars and 2,131 sows) were genotyped with the Illumina PorcineSNP60 BeadChip. The results showed that the 3 methods with marker information (SSBLUP, GBLUP, and SELIND) produced more accurate predictions for genotyped animals than the pedigree-based method. For genotyped animals, the average of reliabilities for all traits in both breeds using traditional BLUP was 0.091, which increased to 0.171 when using GBLUP and to 0.179 when using SELIND and further increased to 0.209 when using SSBLUP. Furthermore, the average reliability of EBV for nongenotyped animals was increased from 0.091 for traditional BLUP to 0.105 for the SSBLUP. The results indicate that the SSBLUP is a good approach to practical genomic prediction of litter size and piglet mortality in Danish Landrace and Yorkshire populations.

  18. Desmanthus GENOTYPES

    JOSÉ HENRIQUE DE ALBUQUERQUE RANGEL

    2015-01-01

    Full Text Available Desmanthus is a genus of forage legumes with potential to improve pastures and livestock production on clay soils of dry tropical and subtropical regions such as those existing in Brazil and Australia. Despite this, patterns of natural or enforced after-ripening of Desmanthus seeds have not been well established. Four-year-old seed banks of nine Desmanthus genotypes at James Cook University were assessed for their patterns of seed softening in response to a range of temperatures. Persistent seed banks were found to exist under all of the studied genotypes. The largest seed banks were found in the genotypes CPI 78373 and CPI 78382 and the smallest in the genotypes CPI 37143, CPI 67643, and CPI 83563. An increase in the percentage of softened seeds was correlated with higher temperatures, in two patterns of response: in some accessions seeds were not significantly affected by temperatures below 80 °C; and in others, seeds became soft when temperature rose to as little as 60 °C. At 80 °C the heat started to depress germination. High seed production of Desmanthus, associated with dependence of seeds on elevated temperatures for softening, can be a very important strategy for plants to survive in dry tropical regions.

  19. Development of a rapid multiplex SSR genotyping method to study populations of the fungal plant pathogen Zymoseptoria tritici

    Gautier, A.; Marcel, T.C.; Confais, J.; Crane, C.; Kema, G.H.J.; Suffert, F.; Walker, A.S.

    2014-01-01

    Background Zymoseptoria tritici is a hemibiotrophic ascomycete fungus causing leaf blotch of wheat that often decreases yield severely. Populations of the fungus are known to be highly diverse and poorly differentiated from each other. However, a genotyping tool is needed to address further

  20. Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals.

    Stram, Daniel O; Leigh Pearce, Celeste; Bretsky, Phillip; Freedman, Matthew; Hirschhorn, Joel N; Altshuler, David; Kolonel, Laurence N; Henderson, Brian E; Thomas, Duncan C

    2003-01-01

    The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risk estimates for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals, and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach, to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection, and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study) we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both
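
    The E-M step mentioned in this abstract can be illustrated with a minimal sketch. It covers only the classical haplotype-frequency E-M for two biallelic SNPs and does not reproduce the authors' extension to odds ratios or case-control sampling. The function name, the 3x3 genotype-count layout and the allele labels A/a and B/b are assumptions made for illustration.

```python
import numpy as np

def em_haplotype_freqs(geno_counts, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes (AB, Ab, aB, ab) from unphased genotypes.

    geno_counts[i][j] is the number of individuals carrying i copies of allele A
    at SNP 1 and j copies of allele B at SNP 2 (hypothetical layout).
    """
    n = np.asarray(geno_counts, dtype=float)
    total_haplotypes = 2.0 * n.sum()
    # Haplotype counts that are unambiguous given the genotype.
    fixed = np.array([
        2 * n[2, 2] + n[2, 1] + n[1, 2],  # AB
        2 * n[2, 0] + n[2, 1] + n[1, 0],  # Ab
        2 * n[0, 2] + n[1, 2] + n[0, 1],  # aB
        2 * n[0, 0] + n[1, 0] + n[0, 1],  # ab
    ])
    p = np.full(4, 0.25)  # start from equal haplotype frequencies
    for _ in range(n_iter):
        # E-step: a double heterozygote is either AB/ab or Ab/aB.
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = fixed + n[1, 1] * np.array([w, 1 - w, 1 - w, w])
        # M-step: frequencies are expected counts over total haplotypes.
        new_p = expected / total_haplotypes
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p
```

    Only the double heterozygotes are phase-ambiguous in this two-SNP setting, which is why a single weight w is enough for the E-step.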

  1. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

    2016-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the

  2. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  3. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...

  4. [Rapid, simple genotyping method by the variable numbers of tandem repeats (VNTR) for Mycobacterium tuberculosis isolates in Japan--analytical procedure of JATA (12)-VNTR].

    Maeda, Shinji; Murase, Yoshiro; Mitarai, Satoshi; Sugawara, Isamu; Kato, Seiya

    2008-10-01

    The discriminatory power of each locus in variable numbers of tandem repeats (VNTR) analysis was evaluated in order to develop a genotyping method for Mycobacterium tuberculosis (TB) in Japan. Using 325 TB strains collected from across Japan and 24 mass infection cases (74 isolates), IS6110 restriction fragment length polymorphism (RFLP), spoligotyping and VNTR (35 loci) were analyzed. We excluded 4 loci (VNTRs 2163a, 3232, 3820, and 4120) and selected the top 12 loci (VNTRs 0424, 0960, 1955, 2074, 2163b, 2372, 2996, 3155, 3192, 3336, 4052, and 4156). The cluster rate of IS6110 RFLP was higher than that of the 12-locus [Japan Anti-Tuberculosis Association (JATA)] VNTR. In a comparison of discriminatory power between the 12-locus JATA VNTR and the Supply (15)-VNTR, the JATA (12)-VNTR was superior even though it analyzes fewer loci. Therefore, this JATA (12)-VNTR could be used for TB genotyping in areas where Beijing strains are prevalent.

  5. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    Hernández, Gilma; Moriña, David; Navarro, Albert

    The presence of missing data in collected variables is common in health surveys, but the subsequent imputation thereof at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still little understood by many non-statisticians, who view this process as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths. All of this is presented in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example in which missing data is handled by means of simple and multiple imputation is introduced. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
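
    The contrast between simple and multiple imputation of a dichotomous variable can be sketched as follows. This is an illustrative example and not the worked example from the note: it assumes the data are missing completely at random, uses a Beta(1, 1) prior for the proportion, and pools the results with Rubin's rules. The function names are hypothetical.

```python
import numpy as np

def simple_imputation(y):
    """Fill each missing value (NaN) with the most frequent observed category."""
    y = np.asarray(y, dtype=float)
    observed = y[~np.isnan(y)]
    mode = 1.0 if observed.mean() >= 0.5 else 0.0
    return np.where(np.isnan(y), mode, y)

def multiple_imputation_proportion(y, m=20, seed=0):
    """Estimate P(y = 1) from m imputed data sets pooled with Rubin's rules.

    Assumes values are missing completely at random (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    observed = y[~missing]
    n, n_obs, ones = y.size, observed.size, observed.sum()
    estimates, variances = [], []
    for _ in range(m):
        # Draw the proportion from its posterior so that each imputed data set
        # reflects uncertainty about the parameter, not only sampling noise.
        p_draw = rng.beta(1 + ones, 1 + n_obs - ones)
        completed = y.copy()
        completed[missing] = rng.binomial(1, p_draw, size=missing.sum())
        p_hat = completed.mean()
        estimates.append(p_hat)
        variances.append(p_hat * (1 - p_hat) / n)
    estimates, variances = np.array(estimates), np.array(variances)
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # between-imputation variance
    total_variance = within + (1 + 1 / m) * between
    return estimates.mean(), total_variance
```

    Filling with the mode pulls the estimated proportion toward the majority category and ignores imputation uncertainty, whereas the pooled variance from multiple imputation adds a between-imputation term.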

  6. Imputing data that are missing at high rates using a boosting algorithm

    Cauthen, Katherine Regina [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]; Lambert, Gregory [Apple Inc., Cupertino, CA (United States)]; Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States)]; Lefantzi, Sophia [Sandia National Lab. (SNL-CA), Livermore, CA (United States)]

    2016-09-01

    Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless many m imputations are used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.

  7. Stem Cuttings as a Quick In Vitro Screening Method of Sodium Chloride Tolerance in Potato (Solanum) Genotypes

    Elhag, A. Z.; Mix-Wagnar, G.; Elbassam, N.; Horst, W.

    2008-01-01

    This study was conducted to find how far the in vitro explant stem cutting technique could be suitable for quick screening of NaCl tolerance in Solanum genotypes and to identify some aspects of their NaCl tolerance. Fifteen Solanum genotypes were tested on four NaCl concentrations both in vitro and in vivo. Two-node stem cuttings of in vitro produced explants were grown on Murashige and Skoog (MS) salts supplemented with four NaCl concentrations (0, 40, 80 and 120 mM) for six weeks in vitro. The other part of the in vitro grown explants was transplanted into Kick-Brauckmann pots containing sandy loam soil, also supplemented with four NaCl concentrations (0, 0.1, 0.2 and 0.3% NaCl, w/w), and grown further either for eight weeks or until harvest in a greenhouse. Both experiments were in a completely randomized design with four replicates. The main stem length, shoot dry matter and tuber yield as well as mineral elements (Na+, K+, Ca2+ and Cl-) were measured. The growth of all genotypes was affected by increasing NaCl. There was a close correlation between growth response (length of explant main stem) in vitro and shoot dry matter and tuber yield in vivo (r=0.81** for dry matter and 0.72** for tuber yield). Na+ and Cl- concentrations in shoots were inversely correlated with vegetative growth (r=-0.73** for both in vitro and r=-0.89** and r=-0.88** in vivo, respectively). The genotypes showed varied ability to reduce the transport of Na+ and Cl- to the shoots, whereby NaCl-tolerant genotypes showed a lower content of both elements than the sensitive ones. K+ and Ca2+ concentrations decreased with increasing NaCl concentration. The responses for mineral element (Na+ and Cl-) accumulation or restriction of explants in vitro and intact plants in vivo were also closely correlated (r=0.79** and 0.71**, respectively), especially at the medium NaCl concentrations (80 mM and 0.2% NaCl). The similar response of the explant and the intact plant

  8. Genotyping-by-Sequencing and Its Exploitation for Forage and Cool-Season Grain Legume Breeding

    Annicchiarico, Paolo; Nazzicari, Nelson; Wei, Yanling; Pecetti, Luciano; Brummer, Edward C.

    2017-01-01

    Genotyping-by-Sequencing (GBS) may drastically reduce genotyping costs compared with single nucleotide polymorphism (SNP) array platforms. However, it may require optimization for specific crops to maximize the number of available markers. Exploiting GBS-generated markers may require optimization, too (e.g., to cope with missing data). This study aimed (i) to compare elements of GBS protocols on legume species that differ for genome size, ploidy, and breeding system, and (ii) to show successful applications and challenges of GBS data on legume species. Preliminary work on alfalfa and Medicago truncatula suggested the greater interest of ApeKI over PstI:MspI DNA digestion. We compared KAPA and NEB Taq polymerases in combination with primer extensions that were progressively more selective on restriction sites, and found greater number of polymorphic SNP loci in pea, white lupin and diploid alfalfa when adopting KAPA with a non-selective primer. This protocol displayed a slight advantage also for tetraploid alfalfa (where SNP calling requires higher read depth). KAPA offered the further advantage of more uniform amplification than NEB over fragment sizes and GC contents. The number of GBS-generated polymorphic markers exceeded 6,500 in two tetraploid alfalfa reference populations and a world collection of lupin genotypes, and 2,000 in different sets of pea or lupin recombinant inbred lines. The predictive ability of GBS-based genomic selection was influenced by the genotype missing data threshold and imputation, as well as by the genomic selection model, with the best model depending on traits and data sets. We devised a simple method for comparing phenotypic vs. genomic selection in terms of predicted yield gain per year for same evaluation costs, whose application to preliminary data for alfalfa and pea in a hypothetical selection scenario for each crop indicated a distinct advantage of genomic selection. PMID:28536584

  9. Genotyping-by-Sequencing and Its Exploitation for Forage and Cool-Season Grain Legume Breeding

    Paolo Annicchiarico

    2017-05-01

    Genotyping-by-Sequencing (GBS) may drastically reduce genotyping costs compared with single nucleotide polymorphism (SNP) array platforms. However, it may require optimization for specific crops to maximize the number of available markers. Exploiting GBS-generated markers may require optimization, too (e.g., to cope with missing data). This study aimed (i) to compare elements of GBS protocols on legume species that differ for genome size, ploidy, and breeding system, and (ii) to show successful applications and challenges of GBS data on legume species. Preliminary work on alfalfa and Medicago truncatula suggested the greater interest of ApeKI over PstI:MspI DNA digestion. We compared KAPA and NEB Taq polymerases in combination with primer extensions that were progressively more selective on restriction sites, and found greater number of polymorphic SNP loci in pea, white lupin and diploid alfalfa when adopting KAPA with a non-selective primer. This protocol displayed a slight advantage also for tetraploid alfalfa (where SNP calling requires higher read depth). KAPA offered the further advantage of more uniform amplification than NEB over fragment sizes and GC contents. The number of GBS-generated polymorphic markers exceeded 6,500 in two tetraploid alfalfa reference populations and a world collection of lupin genotypes, and 2,000 in different sets of pea or lupin recombinant inbred lines. The predictive ability of GBS-based genomic selection was influenced by the genotype missing data threshold and imputation, as well as by the genomic selection model, with the best model depending on traits and data sets. We devised a simple method for comparing phenotypic vs. genomic selection in terms of predicted yield gain per year for same evaluation costs, whose application to preliminary data for alfalfa and pea in a hypothetical selection scenario for each crop indicated a distinct advantage of genomic selection.

  10. Rapid and sensitive method to identify Mycobacterium avium subsp. paratuberculosis in cow's milk by DNA methylase genotyping.

    Mundo, Silvia Leonor; Gilardoni, Liliana Rosa; Hoffman, Federico José; Lopez, Osvaldo Jorge

    2013-03-01

    Paratuberculosis is an infectious, chronic, and incurable disease that affects ruminants, caused by Mycobacterium avium subsp. paratuberculosis. This bacterium is shed primarily through feces of infected cows but can also be excreted in colostrum and milk and might survive pasteurization. Since an association of genomic sequences of M. avium subsp. paratuberculosis in patients with Crohn's disease has been described, it is of interest to rapidly detect M. avium subsp. paratuberculosis in milk for human consumption. The IS900 insertion is used as a target for PCR amplification to identify the presence of M. avium subsp. paratuberculosis in biological samples. Two target sequences were selected: IS1 (155 bp) and IS2 (94 bp). These fragments have 100% identity among all M. avium subsp. paratuberculosis strains sequenced. M. avium subsp. paratuberculosis was specifically concentrated from milk samples by immunomagnetic separation prior to performing PCR. The amplicons were characterized using DNA methylase genotyping, i.e., the amplicons were methylated with 6-methyl-adenine and digested with restriction enzymes to confirm their identity. The methylated amplicons from 100 CFU of M. avium subsp. paratuberculosis can be visualized in a Western blot format using an anti-6-methyl-adenine monoclonal antibody. The use of DNA methyltransferase genotyping coupled to a scintillation proximity assay allows for the detection of as few as 10 CFU of M. avium subsp. paratuberculosis per ml of milk. This test is rapid and sensitive and allows for automation, so multiple samples can be tested at the same time.

  11. [Establishment of a novel HLA genotyping method for preimplantation genetic diagnosis using multiple displacement amplification-polymerase chain reaction-sequencing based technique].

    Zhang, Yinfeng; Luo, Haining; Zhang, Yunshan

    2015-12-01

    To establish a novel HLA genotyping method for preimplantation genetic diagnosis (PGD) using multiple displacement amplification-polymerase chain reaction-sequencing based technique (MDA-PCR-SBT). Peripheral blood samples and 76 1PN, 2PN, 3PN discarded embryos from 9 couples were collected. The alleles of HLA-A, B, DR loci were detected from the MDA product with the PCR-SBT method. The HLA genotypes of the parental peripheral blood samples were analyzed with the same protocol. The genotypes of specific HLA region were evaluated for distinguishing the segregation of haplotypes among the family members, and primary HLA matching was performed between the embryos. The 76 embryos were subjected to MDA and 74 (97.4%) were successfully amplified. For the 34 embryos from the single blastomere group, the amplification rate was 94.1%, and for the 40 embryos in the two blastomeres group, the rate was 100%. The dropout rates for DQ allele and DR allele were 1.3% and 0, respectively. The positive rate for MDA in the single blastomere group was 100%, with the dropout rates for DQ allele and DR allele being 1.5% and 0, respectively. The positive rate of MDA for the two blastomere group was 100%, with the dropout rates for both DQ and DR alleles being 0. The recombination rate of fetal HLA was 20.2% (30/148). Owing to improper classification and abnormally fertilized embryos, the proportion of HLA-matched embryos was 20.3% (15/74), which was lower than the theoretical value of 25%. PGD with HLA matching can facilitate creation of an HLA-identical donor (saviour child) for umbilical cord blood or bone marrow stem cells for its affected sibling with a genetic disease. Therefore, preimplantation HLA matching may provide a tool for couples desiring to conceive a potential donor progeny for transplantation for its sibling with a life-threatening disorder.

  12. Application of a high-throughput genotyping method for loci exclusion in non-consanguineous Australian pedigrees with autosomal recessive retinitis pigmentosa.

    Paterson, Rachel L; De Roach, John N; McLaren, Terri L; Hewitt, Alex W; Hoffmann, Ling; Lamey, Tina M

    2012-01-01

    Retinitis pigmentosa (RP) is the most common form of inherited blindness, caused by progressive degeneration of photoreceptor cells in the retina, and affects approximately 1 in 3,000 people. Over the past decade, significant progress has been made in gene therapy for RP and related diseases, making genetic characterization increasingly important. Recently, high-throughput technologies have provided an option for reasonably fast, cost-effective genetic characterization of autosomal recessive RP (arRP). The current study used a single nucleotide polymorphism (SNP) genotyping method to exclude up to 28 possible disease-causing genes in 31 non-consanguineous Australian families affected by arRP. DNA samples were collected from 59 individuals affected with arRP and 74 unaffected family members from 31 Australian families. Five to six SNPs were genotyped for 28 genes known to cause arRP or the related disease Leber congenital amaurosis (LCA). Cosegregation analyses were used to exclude possible causative genes from each of the 31 families. Bidirectional sequencing was used to identify disease-causing mutations in prioritized genes that were not excluded with cosegregation analyses. Two families were excluded from analysis due to identification of false paternity. An average of 28.9% of genes were excluded per family when only one affected individual was available, in contrast to an average of 71.4% or 89.8% of genes when either two, or three or more affected individuals were analyzed, respectively. A statistically significant relationship between the proportion of genes excluded and the number of affected individuals analyzed was identified using a multivariate regression model (pA) and USH2A in two families (c.2276 G>T). This study has shown that SNP genotyping cosegregation analysis can be successfully used to refine and expedite the genetic characterization of arRP in a non-consanguineous population; however, this method is effective only when DNA samples are

  13. Multiple imputation of rainfall missing data in the Iberian Mediterranean context

    Miró, Juan Javier; Caselles, Vicente; Estrela, María José

    2017-11-01

    Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years, progressively more advanced than traditional approaches. The present study validated 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., filling missing data in multiple incomplete series at the same time within a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (east Iberian Peninsula), which is characterized by high spatial irregularity and difficulty of rainfall estimation. A classification of precipitation according to its genetic origin was applied as pre-processing, and a quantile-mapping adjustment as a post-processing technique. The results showed in general better performance for the non-linear and hybrid methods, with the non-linear PCA (NLPCA) method considerably outperforming the Self Organizing Maps (SOM) method among non-linear approaches. Among linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far from NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.

  14. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Kim, Ji-Sung; Gao, Xin; Rzhetsky, Andrey

    2018-01-01

    are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race

  15. Simple nuclear norm based algorithms for imputing missing data and forecasting in time series

    Butcher, Holly Louise; Gillard, Jonathan William

    2017-01-01

    There has been much recent progress on the use of the nuclear norm for the so-called matrix completion problem (the problem of imputing missing values of a matrix). In this paper we investigate the use of the nuclear norm for modelling time series, with particular attention to imputing missing data and forecasting. We introduce a simple alternating projections type algorithm based on the nuclear norm for these tasks, and consider a number of practical examples.
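
    As an illustration of nuclear-norm-based matrix completion, the sketch below uses iterative singular-value soft-thresholding, a generic technique often called soft-impute. It is not the alternating projections algorithm proposed in the paper, and the function name and the parameters lam, n_iter and tol are assumptions.

```python
import numpy as np

def soft_impute(X, lam=0.1, n_iter=500, tol=1e-8):
    """Fill NaN entries of X with a low-rank estimate.

    Each pass keeps the observed entries, fills the gaps with the current
    estimate, and shrinks the singular values by lam, which is the proximal
    step for the nuclear norm.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    Z = np.zeros_like(X)  # current low-rank estimate, started at zero
    for _ in range(n_iter):
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z_new = (U * s) @ Vt
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change <= tol * max(1.0, np.linalg.norm(Z)):
            break
    # Observed values are returned unchanged; only the gaps are imputed.
    return np.where(observed, X, Z)
```

    A larger lam gives a lower-rank but more strongly shrunk estimate. For a time series, the matrix X would typically be built from lagged copies of the series, which this sketch leaves to the caller.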

  16. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Xiaobo Yan

    2015-01-01

    This paper addresses missing value imputation for the Internet of Things (IoT). Nowadays, the IoT is widely used in a variety of domains, such as transportation and logistics and healthcare. However, missing values are very common in the IoT for a variety of reasons, so the experimental data are incomplete. As a result, some work that relies on IoT data cannot be carried out normally, and the accuracy and reliability of the data analysis results are reduced. Based on the characteristics of the data and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on Gaussian mixture model (MGI). Experimental results showed that the three models can greatly and effectively improve the accuracy, reliability, and stability of missing value imputation.

  17. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    Rodriguez, Jesse M.; Batzoglou, Serafim; Bercovici, Sivan

    2013-01-01

    , accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting

  18. Nearest neighbor imputation using spatial-temporal correlations in wireless sensor networks.

    Li, YuanYuan; Parker, Lynne E

    2014-01-01

    Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a k-d tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for k-d tree construction, and Euclidean distance for k-d tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental
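
    The idea of a nearest-neighbor search with distances weighted by missing-data percentages can be sketched as follows. This is a brute-force illustration and not the authors' k-d tree implementation. The weighting (one minus each sensor's fraction of missing readings) and the function name are assumptions made for this sketch.

```python
import numpy as np

def weighted_nn_impute(history, query):
    """Fill NaN entries of one reading from its nearest past reading.

    history: (n_samples, n_sensors) array of past readings, NaN = missing.
    query:   (n_sensors,) current reading with NaN for the values to fill.
    """
    history = np.asarray(history, dtype=float)
    query = np.asarray(query, dtype=float)
    # Assumed weighting: a sensor that is often missing counts for less.
    weights = 1.0 - np.isnan(history).mean(axis=0)
    known = ~np.isnan(query)
    # Candidate neighbors are the past readings with no missing values.
    complete = history[~np.isnan(history).any(axis=1)]
    if complete.shape[0] == 0:
        raise ValueError("no complete past reading to borrow values from")
    diff = complete[:, known] - query[known]
    distances = np.sqrt((weights[known] * diff ** 2).sum(axis=1))
    nearest = complete[np.argmin(distances)]
    # Keep the measured values and borrow the rest from the neighbor.
    return np.where(known, query, nearest)
```

    A linear scan costs time proportional to the number of stored readings for each query, which is the cost a k-d tree index is meant to reduce.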

  19. A MITE-based genotyping method to reveal hundreds of DNA polymorphisms in an animal genome after a few generations of artificial selection

    Tetreau Guillaume

    2008-10-01

    Background: For most organisms, developing hundreds of genetic markers spanning the whole genome still requires excessive if not unrealistic efforts. In this context, there is an obvious need for methodologies allowing the low-cost, fast and high-throughput genotyping of virtually any species, such as the Diversity Arrays Technology (DArT). One of the crucial steps of the DArT technique is the genome complexity reduction, which allows obtaining a genomic representation characteristic of the studied DNA sample and necessary for subsequent genotyping. In this article, using the mosquito Aedes aegypti as a study model, we describe a new genome complexity reduction method taking advantage of the abundance of miniature inverted repeat transposable elements (MITEs) in the genome of this species. Results: Ae. aegypti genomic representations were produced following a two-step procedure: (1) restriction digestion of the genomic DNA and simultaneous ligation of a specific adaptor to compatible ends, and (2) amplification of restriction fragments containing a particular MITE element called Pony using two primers, one annealing to the adaptor sequence and one annealing to a conserved sequence motif of the Pony element. Using this protocol, we constructed a library comprising more than 6,000 DArT clones, of which at least 5.70% were highly reliable polymorphic markers for two closely related mosquito strains separated by only a few generations of artificial selection. Within this dataset, linkage disequilibrium was low, and marker redundancy was evaluated at 2.86% only. Most of the detected genetic variability was observed between the two studied mosquito strains, but individuals of the same strain could still be clearly distinguished. Conclusion: The new complexity reduction method was particularly efficient at revealing genetic polymorphisms in Ae. aegypti. Overall, our results testify to the flexibility of the DArT genotyping technique and open new

  20. Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

    Boeschoten Laura

    2017-12-01

    Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables. Furthermore, the latent class model, by multiply imputing a new variable, enhances the quality of statistics based on the composite data set. The performance of this method is investigated by a simulation study, which shows that whether or not the method can be applied depends on the entropy R² of the latent class model and the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.

  1. Evaluation of phenotypic and genotypic methods for epidemiological typing of Staphylococcus aureus isolates from bovine mastitis in Denmark

    Aarestrup, Frank Møller; Wegener, H. C.; Rosdahl, V. T.

    1995-01-01

    The value of five different typing methods (antibiogram typing, biotyping, phage typing, plasmid profiling and restriction fragment length polymorphism of the gene encoding 16S and 23S ribosomal RNA (ribotyping)), in discriminating 105 Staphylococcus aureus strains from bovine milk samples obtained....... The combination of phage, bio- or ribotyping or all three methods in combination are considered to be an efficient combination of typing methods for epidemiological investigation of S. aureus mastitis....

  2. Determination of antimicrobial resistance of Enterococcus strains isolated from pigs and their genotypic characterization by method of amplification of DNA fragments surrounding rare restriction sites (ADSRRS fingerprinting).

    Nowakiewicz, Aneta; Ziółkowska, Grażyna; Trościańczyk, Aleksandra; Zięba, Przemysław; Gnat, Sebastian

    2017-03-01

    In this study, we analysed phenotypic resistance profiles and their reflection in the genomic profiles of Enterococcus spp. strains isolated from pigs raised on different farms. Samples were collected from five pig farms (n=90 animals) and tested for Enterococcus. MICs of 12 antimicrobials were determined using the broth microdilution method, and epidemiological molecular analysis of strains belonging to selected species (faecalis, faecium and hirae) was performed using the ADSRRS-fingerprinting (amplification of DNA fragments surrounding rare restriction sites) method with a few modifications. The highest percentage of strains was resistant to tetracycline (73.4 %), erythromycin and tylosin (42.5 %) and rifampin (25.2 %), and a large number of strains exhibited high-level resistance to both kanamycin (25.2 %) and streptomycin (27.6 %). The strains of E. faecalis, E. faecium and E. hirae (n=184) revealed varied phenotypic resistance profiles, among which as many as seven met the criteria for multidrug resistance (30.4 % of strains tested). ADSRRS-fingerprinting analysis produced 17 genotypic profiles of individual strains which were correlated with their phenotypic resistance profiles. Only E. hirae strains susceptible to all of the chemotherapeutics tested had two different ADSRRS profiles. Moreover, eight animals were carriers of more than one genotype belonging to the same Enterococcus spp., mainly E. faecalis. Given the possibility of transmission to humans of the high-resistance/multidrug resistance enterococci and the significant role of pigs as food animals in this process, it is necessary to introduce a multilevel control strategy by carrying out research on the resistance and molecular characteristics of indicator bacterial strains isolated from animals on individual farms.

  3. Multiple imputation for estimating the risk of developing dementia and its impact on survival.

    Yu, Binbing; Saczynski, Jane S; Launer, Lenore

    2010-10-01

    Dementia, Alzheimer's disease in particular, is one of the major causes of disability and decreased quality of life among the elderly and a leading obstacle to successful aging. Given the profound impact on public health, much research has focused on the age-specific risk of developing dementia and the impact on survival. Early work has discussed various methods of estimating age-specific incidence of dementia, among which the illness-death model is popular for modeling disease progression. In this article we use multiple imputation to fit multi-state models for survival data with interval censoring and left truncation. This approach allows semi-Markov models in which survival after dementia depends on onset age. Such models can be used to estimate the cumulative risk of developing dementia in the presence of the competing risk of dementia-free death. Simulations are carried out to examine the performance of the proposed method. Data from the Honolulu Asia Aging Study are analyzed to estimate the age-specific and cumulative risks of dementia and to examine the effect of major risk factors on dementia onset and death.

  4. Evaluation of the Abbott Real Time HCV genotype II assay for Hepatitis C virus genotyping.

    Sariguzel, Fatma Mutlu; Berk, Elife; Gokahmetoglu, Selma; Ercal, Baris Derya; Celik, Ilhami

    2015-01-01

    The determination of HCV genotypes and subtypes is very important for the selection of antiviral therapy and for epidemiological studies. The aim of this study was to evaluate the performance of the Abbott Real Time HCV Genotype II assay in HCV genotyping of HCV-infected patients in Kayseri, Turkey. One hundred patients with chronic hepatitis C admitted to our hospital between June 2012 and December 2012 were evaluated. HCV RNA levels were determined by the COBAS® AmpliPrep/COBAS® TaqMan® 48 HCV test. HCV genotyping was investigated by the Abbott Real Time HCV Genotype II assay. With the exception of genotype 1, subtypes of HCV genotypes could not be determined by the Abbott assay. Sequencing analysis was used as the reference method. Genotypes 1, 2, 3 and 4 were observed in 70, 4, 2 and 24 of the 100 patients, respectively, by both methods. The concordance between the two systems in determining HCV major genotypes was 100%. Of 70 patients with genotype 1, 66 showed infection with subtype 1b and 4 with subtype 1a by the Abbott Real Time HCV Genotype II assay. Using sequence analysis, 61 showed infection with subtype 1b and 9 with subtype 1a. In determining HCV genotype 1 subtypes, the difference between the two methods was not statistically significant (P>0.05). HCV genotype 4 and 3 samples were found to be subtype 4d and 3a, respectively, by sequence analysis. There were four patients with genotype 2. Sequence analysis revealed that two of these patients had type 2a and the other two had type 2b. The Abbott Real Time HCV Genotype II assay yielded results consistent with sequence analysis. However, further optimization of the Abbott Real Time HCV Genotype II assay for subtype identification of HCV is required.

  5. Determination of Molecular Genotyping of Ureaplasma SPP in Women with Genital Infections by 16S–23S rDNA PCR-RFLP Method

    R. Mirnejad

    2011-04-01

    Introduction & Objective: So far, despite the wide range of methods, such as analytic methods, used for differentiation of Mycoplasma, the diagnosis of Mycoplasma species is still difficult. Generally, the low discriminatory power of serological methods, because of the rapid changes in size and phase of the dominant antigens in the immune cell surface of Mycoplasmas, greatly limits their applicability to the typing of Mycoplasmas. In contrast, molecular methods do not suffer from these drawbacks and can be used for typing of Mycoplasmas. The aim of this investigation was molecular identification and genotyping of Ureaplasma spp. in women with genital infections by 16S–23S rDNA PCR-RFLP. Materials & Methods: Genital swabs were taken from 210 patients who were referred to the gynecology clinic of Rasool hospital in Tehran, Iran, from December 2007 until June 2008. The swabs, suspended in PBS, were immediately transferred to the laboratory. Following DNA extraction, a PCR assay was performed using a genus-specific primer pair. These primer sets amplified a 559 bp fragment for Ureaplasma spp. Samples containing bands of the expected size for Ureaplasma strains were subjected to digestion with different restriction endonuclease enzymes (AluI, Taq I, CacI8, BbsI, EcoRI). Results: Of the 210 samples, Ureaplasma spp. was detected in 93 patients (44.3%) by PCR and in 69 samples by culture. In the present study only Biovar 1 (Ureaplasma parvum) was isolated from clinical specimens, and the results were confirmed using the restriction enzyme TaqI (an enzyme specific for species of Ureaplasma spp.). The results of this analysis using PCR-RFLP and sequencing showed that all isolates had the same genotype and shared an identical sequence with the genome sequence of serovar 3 Ureaplasma parvum. Conclusion: Ureaplasma parvum is generally isolated from the genital samples. In this study all isolates were identical and no difference was found among the enzyme patterns of the bacteria after PCR-RFLP. So

  6. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning.

    Ji-Sung Kim

    2018-04-01

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^-9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  7. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Kim, Ji-Sung

    2018-04-26

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^-9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  8. Identification and epidemiological typing of Campylobacter hyointestinalis subspecies by phenotypic and genotypic methods and description of novel subgroups

    On, Stephan L.W.; Vandamme, P.

    1997-01-01

    of this taxon. Two novel, distinct groups of C. hyointestinalis-like bacteria, originally isolated from the cloacae of Canada geese and human diarrhoeic stools, were also identified by each of the methods used. This appears to be the first report confirming the presence of C. hyointestinalis-like strains from...

  9. New insights into the pharmacogenomics of antidepressant response from the GENDEP and STAR*D studies: rare variant analysis and high-density imputation.

    Fabbri, C; Tansey, K E; Perlis, R H; Hauser, J; Henigsberg, N; Maier, W; Mors, O; Placentino, A; Rietschel, M; Souery, D; Breen, G; Curtis, C; Sang-Hyuk, L; Newhouse, S; Patel, H; Guipponi, M; Perroud, N; Bondolfi, G; O'Donovan, M; Lewis, G; Biernacka, J M; Weinshilboum, R M; Farmer, A; Aitchison, K J; Craig, I; McGuffin, P; Uher, R; Lewis, C M

    2017-11-21

    Genome-wide association studies have generally failed to identify polymorphisms associated with antidepressant response. Possible reasons include limited coverage of genetic variants that this study tried to address by exome genotyping and dense imputation. A meta-analysis of Genome-Based Therapeutic Drugs for Depression (GENDEP) and Sequenced Treatment Alternatives to Relieve Depression (STAR*D) studies was performed at the single-nucleotide polymorphism (SNP), gene and pathway levels. Coverage of genetic variants was increased compared with previous studies by adding exome genotypes to previously available genome-wide data and using the Haplotype Reference Consortium panel for imputation. Standard quality control was applied. Phenotypes were symptom improvement and remission after 12 weeks of antidepressant treatment. Significant findings were investigated in NEWMEDS consortium samples and Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) for replication. A total of 7,062,950 SNPs were analyzed in GENDEP (n=738) and STAR*D (n=1409). rs116692768 (P=1.80e-08, ITGA9 (integrin α9)) and rs76191705 (P=2.59e-08, NRXN3 (neurexin 3)) were significantly associated with symptom improvement during citalopram/escitalopram treatment. At the gene level, no consistent effect was found. At the pathway level, the Gene Ontology (GO) terms GO:0005694 (chromosome) and GO:0044427 (chromosomal part) were associated with improvement (corrected P=0.007 and 0.045, respectively). The association between rs116692768 and symptom improvement was replicated in PGRN-AMPS (P=0.047), whereas rs76191705 was not. The two SNPs did not replicate in NEWMEDS. ITGA9 codes for a membrane receptor for neurotrophins and NRXN3 is a transmembrane neuronal adhesion receptor involved in synaptic differentiation. Despite their meaningful biological rationale for being involved in antidepressant effect, replication was partial. Further studies may help in clarifying

  10. A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy.

    Wickland, Daniel P; Battu, Gopal; Hudson, Karen A; Diers, Brian W; Hudson, Matthew E

    2017-12-28

    Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain

  11. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P<5 × 10^-8), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P=7.68 × 10^-12), oestradiol (P=1.63 × 10^-8) and FAI (P=1.50 × 10^-8). A genetic variant near the FSHB gene was identified which influenced both FSH (P=1.74 × 10^-8) and LH (P=3.94 × 10^-9) levels. A separate locus on chromosome 7 was associated with both DHEAS (P=1.82 × 10^-14) and progesterone (P=6.09 × 10^-14). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.

  12. In-cell PCR method for specific genotyping of genomic DNA from one individual in a mixture of cells from two individuals: a model study with specific relevance to prenatal diagnosis based on fetal cells in maternal blood

    Hviid, T Vauvert

    2002-01-01

    only in the male cells, leading to the correct HLA-DPB1 genotyping of the male by DNA sequencing of a nested, linked TSPY-HLA-DPB1 PCR product. CONCLUSION: This approach might be usable on mixed cell populations of fetal and maternal cells obtained after conventional cell-sorting techniques on maternal...... maternal blood samples, the use of such an approach for genotyping by molecular biology techniques in a more routine setting has been hampered by the large contamination of maternal nucleated blood cells in the cell isolates. Therefore, a new method based on in-cell PCR is described, which may overcome...... this problem. Methods and Results: Mixtures of cells from two different individuals were fixed and permeabilized in suspension. After coamplification of a DNA sequence specific for one of the individuals and the DNA sequence to be genotyped, the two PCR products were linked together in the fixed cells positive...

  13. On multivariate imputation and forecasting of decadal wind speed missing data.

    Wesonga, Ronald

    2015-01-01

    This paper demonstrates the application of multiple imputation by chained equations and time series forecasting to wind speed data. The study was motivated by the high prevalence of missing historic wind speed data. Findings based on the fully conditional specification under multiple imputation by chained equations provided reliable imputations of the missing wind speed data. Further, the forecasting model yielded a smoothing parameter, alpha (0.014), close to zero, confirming that recent past observations are more suitable for use to forecast wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimation confirms the dynamic tendencies of wind speed at the airport under study.
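
    The abstract does not say which forecasting model produced the smoothing parameter. As a minimal sketch, assuming simple (single) exponential smoothing, the role of alpha can be illustrated as follows; the function name and the wind speed values are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    Assumption: the level is initialised with the first observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = series[0]
    for y in series[1:]:
        # alpha weights the newest observation, (1 - alpha) the previous level.
        level = alpha * y + (1.0 - alpha) * level
    return level


# Hypothetical wind speeds in metres per second, not data from the study.
wind_speeds = [3.2, 4.1, 2.8, 5.0, 3.6]
forecast = simple_exponential_smoothing(wind_speeds, alpha=0.014)
print(f"one-step-ahead forecast: {forecast:.3f} m/s")
```

    With alpha = 0.014, each new observation moves the level by 1.4% of the gap between that observation and the previous level.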

  14. A Fast Method for DEFB1-44C/G SNP Genotyping in Brazilian Patients with Periodontitis

    Rafael Rafael Amorim Cavalcanti de Siqueira

    2014-01-01

    Aim: Defensins are cationic antimicrobial peptides expressed in epithelial cells. Such peptides exhibit antibacterial, antifungal and antiviral properties, and are a component of the innate immune response. It has been suggested that they have a protective role in the oral cavity. This study evaluated the DEFB1 polymorphism in diabetic patients with or without periodontitis in comparison to healthy controls. Material and Methods: We used the Hairpin-Shaped Primer (HP) assay to study the distribution of the -44 C/G SNP (rs1800972) in 119 human DNAs obtained from diabetic patients and healthy control patients. Results: The results indicate that there are no differences in distribution between groups and that in diabetic periodontitis patients the homozygous mutant could be found more frequently. Conclusion: Further studies are necessary in order to investigate the role of DEFB1 polymorphisms in diabetic periodontitis patients and the influence of the peptide in periodontal pathogens.

  15. Analyzing the changing gender wage gap based on multiply imputed right censored wages

    Gartner, Hermann; Rässler, Susanne

    2005-01-01

    "In order to analyze the gender wage gap with the German IAB-employment register we have to solve the problem of censored wages at the upper limit of the social security system. We treat this problem as a missing data problem. We regard the missingness mechanism as not missing at random (NMAR, according to Little and Rubin, 1987, 2002) as well as missing by design. The censored wages are multiply imputed by draws of a random variable from a truncated distribution. The multiple imputation is b...

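    The imputation step described in the abstract above (draws from a truncated distribution for wages censored at the contribution limit) can be sketched as follows. This is only an illustration under stated assumptions: log wages are taken to be normal with known mean and standard deviation, whereas a full multiple-imputation procedure would also draw those parameters, typically from a regression model with covariates. All names and numbers are hypothetical.

```python
import random
from statistics import NormalDist


def draw_truncated_log_wage(mu, sigma, log_limit, rng):
    """Draw one log wage from N(mu, sigma) restricted to values >= log_limit."""
    dist = NormalDist(mu, sigma)
    p_limit = dist.cdf(log_limit)
    # Inverse-CDF sampling restricted to the upper tail of the distribution.
    u = rng.uniform(p_limit, 1.0)
    # inv_cdf needs a probability strictly between 0 and 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return max(dist.inv_cdf(u), log_limit)


def multiply_impute(log_wages, censored, mu, sigma, log_limit, m=5, seed=0):
    """Return m completed copies of log_wages with censored entries redrawn."""
    rng = random.Random(seed)
    completed_sets = []
    for _ in range(m):
        completed = [
            draw_truncated_log_wage(mu, sigma, log_limit, rng) if is_censored else wage
            for wage, is_censored in zip(log_wages, censored)
        ]
        completed_sets.append(completed)
    return completed_sets


# Hypothetical data: the last two wages are recorded at the censoring limit.
log_wages = [4.1, 4.5, 4.9, 4.9]
censored = [False, False, True, True]
imputed_sets = multiply_impute(log_wages, censored, mu=4.4, sigma=0.4, log_limit=4.9)
```

    Each completed data set would then be analysed separately and the estimates combined, which is the usual multiple-imputation workflow.
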
  16. Genotyping of friesian horses to detect a hydrocephalus-associated c.1423C>T mutation in B3GALNT2 using PCR-RFLP and PCR-PIRA methods: Frequency in stallion horses in México.

    Ayala-Valdovinos, Miguel Angel; Galindo-García, Jorge; Sánchez-Chiprés, David; Duifhuis-Rivera, Theodor

    2017-04-01

    Hydrocephalus in Friesian horses is an autosomal recessive hereditary disease that can result in an abortion, a stillbirth, or euthanization of a newborn foal. Here, the hydrocephalus-associated c.1423C > T mutation in B3GALNT2 gene was detected with PCR-RFLP and PCR-PIRA methods for horse genotyping. A preliminary genotyping survey was performed on 83 randomly selected Friesian stallion horses to determine the current allele frequency in Mexico. The frequency of the mutant T allele was 9.6%. Copyright © 2016 Elsevier Ltd. All rights reserved.
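
    The reported allele frequency follows from simple allele counting in a diploid sample. A minimal sketch is given below; the genotype counts are hypothetical, chosen only to be consistent with the reported totals (83 stallions, mutant allele frequency of about 9.6%), because the abstract does not give the genotype breakdown.

```python
def allele_frequency(n_hom_mutant, n_het, n_hom_wild):
    """Frequency of the mutant allele in a sample of diploid animals."""
    n_animals = n_hom_mutant + n_het + n_hom_wild
    if n_animals == 0:
        raise ValueError("no genotyped animals")
    # Homozygous mutants carry two copies, heterozygous carriers one.
    mutant_alleles = 2 * n_hom_mutant + n_het
    return mutant_alleles / (2 * n_animals)


# Hypothetical breakdown: 16 heterozygous carriers among 83 stallions.
freq_t = allele_frequency(n_hom_mutant=0, n_het=16, n_hom_wild=67)
print(f"T allele frequency: {freq_t:.1%}")
```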

  17. Prevalence of ESBL-producing Pseudomonas aeruginosa isolates in Warsaw, Poland, detected by various phenotypic and genotypic methods.

    Agnieszka E Laudy

    Knowledge of the prevalence of ESBL enzymes among P. aeruginosa strains compared to the Enterobacteriaceae family is limited. The phenotypic tests recommended by EUCAST for the detection of ESBL-producing Enterobacteriaceae are not always suited for P. aeruginosa strains. This is mainly due to the presence of other families of ESBLs in P. aeruginosa isolates more often than in Enterobacteriaceae, production of natural AmpC cephalosporinase and its overexpression, and co-production of metallo-β-lactamases. The aim of this study was to determine the occurrence of ESBLs in P. aeruginosa isolated from patients from hospitals in Warsaw, to evaluate the ESBL production of these isolates using currently available phenotypic tests, their modifications, multiplex PCR and molecular typing of ESBL-positive isolates by PFGE. Clinical isolates of P. aeruginosa were collected in 2000-2014 from four Warsaw hospitals. Based on the data obtained in this study, we suggest using three DDST methods with inhibitors, such as clavulanic acid, sulbactam and imipenem, to detect ESBL-producing P. aeruginosa strains. Depending on the appearance of the plates, we suggest a reduction in the distance between discs with antibiotics to 15 mm and the addition of boronic acid at 0.4 mg per disc. The analysed isolates carried genes encoding ESBL from the families VEB (69 isolates with VEB-9), GES (6 with GES-1, 1 with GES-5, 5 with GES-13 and 2 with GES-15), OXA-2 (12 with OXA-15, 1 with OXA-141, 1 with OXA-210, 1 with OXA-543 and 1 with OXA-544) and OXA-10 (5 isolates with OXA-74 and one with OXA-142). The most important result of this study was the discovery of three new genes, blaGES-15, blaOXA-141 and blaOXA-142; their nucleotide sequences have been submitted to the NCBI GenBank. It is also very important to note that this is the first report on the epidemiological problem of VEB-9-producing bacterial strains, not only in Poland but also worldwide.

  18. Deep sequencing analysis of HBV genotype shift and correlation with antiviral efficiency during adefovir dipivoxil therapy.

    Yuwei Wang

    Viral genotype shift in chronic hepatitis B (CHB) patients during antiviral therapy has been reported, but the underlying mechanism remains elusive. Thirty-eight CHB patients treated with ADV for one year were selected for studying genotype shift by both deep sequencing and the Sanger sequencing method. The Sanger sequencing method found that 7.9% of patients showed mixed genotype before ADV therapy. In contrast, all 38 patients showed mixed genotype before ADV treatment by deep sequencing. A mixed genotype rate of 95.5% was also obtained from an additional 200 treatment-naïve CHB patients. Of the 13 patients with genotype shift, the fraction of the minor genotype in 5 patients (38%) increased gradually during the course of ADV treatment. Furthermore, responses to ADV and HBeAg seroconversion were associated with the high rate of genotype shift, suggesting drug and immune pressure may be key factors to induce genotype shift. Interestingly, patients with genotype C had a significantly higher rate of genotype shift than genotype B. In the genotype shift group, ADV treatment induced a marked enhancement of genotype B ratio accompanied by a reduction of genotype C ratio, suggesting genotype C may be more sensitive to ADV than genotype B. Moreover, patients with dominant genotype C may have a better therapeutic effect. Finally, genotype shift was correlated with clinical improvement in terms of ALT. Our findings provided a rational explanation for genotype shift among ADV-treated CHB patients. The genotype and genotype shift might be associated with antiviral efficiency.

  19. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.

    Seaman, Shaun R; Hughes, Rachael A

    2018-06-01

    Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust than joint model multiple imputation using the restricted general location model to misspecification of that model when there is substantial missingness in the outcome variable.

  20. Identification of strains with phenotypes similar to those of Staphylococcus aureus isolated from table chicken eggs using MALDI-TOF MS and genotyping methods

    Marek Agnieszka

    2015-06-01

    The aim of the study was to identify the affinity of 10 Staphylococcus strains isolated from table chicken eggs to specific species. Preliminary analysis performed by API ID32 Staph test identified these strains as S. aureus, but they exhibited a negative reaction in the tube coagulase test. Thus, the analysed strains were initially characterised as Staphylococcus aureus-like (SAL). Further characterisation was performed by genotypic methods, using restriction fragment length polymorphism (RFLP) of the coagulase gene (coa) and sequencing of the gene rpoB. An attempt was also made to identify the isolated Staphylococcus strains by MALDI-TOF mass spectrometry. The results indicated that none of the strains tested belonged to the species S. aureus. The rpoB sequences of five isolates showed the highest sequence similarity to S. haemolyticus, three isolates to S. chromogenes, and one isolate to S. epidermidis. One strain (SAL4) remained unidentified in this analysis. The results obtained using mass spectrometry were comparable to those based on gene sequence analysis. Strain SAL4, which could not be identified by sequencing, was identified by MALDI-TOF as Staphylococcus chromogenes.

  1. A combined RT-PCR and dot-blot hybridization method reveals the coexistence of SJNNV and RGNNV betanodavirus genotypes in wild meagre (Argyrosomus regius).

    Lopez-Jimena, B; Cherif, N; Garcia-Rosado, E; Infante, C; Cano, I; Castro, D; Hammami, S; Borrego, J J; Alonso, M C

    2010-10-01

    To detect the possible coexistence of striped jack nervous necrosis virus (SJNNV) and red-spotted grouper nervous necrosis virus (RGNNV) genotypes in a single fish, a methodology based on the combination of PCR amplification and blot hybridization has been developed and applied in this study. The degenerate primers designed for the PCR procedure target the T4 region within the capsid gene, resulting in the amplification of both genotypes. The subsequent hybridization of these amplification products with two different specific digoxigenin-labelled probes resulted in the identification of both genotypes separately. The application of the RT-PCR protocol to analyse blood samples from asymptomatic wild meagre (Argyrosomus regius) specimens has shown that 46.87% were viral nervous necrosis virus carriers. The combination of RT-PCR and blot hybridization increases the detection rate up to 90.62% and, in addition, has shown the coexistence of both genotypes in 18 out of the 32 specimens analysed (56.25%). This study reports the coexistence of betanodaviruses belonging to two different genotypes (SJNNV and RGNNV) in wild fish specimens. This is the first report demonstrating the presence of SJNNV and RGNNV genotypes in the same specimen. This study also demonstrates a carrier state in this fish species for the first time. © 2010 The Authors. Journal compilation © 2010 The Society for Applied Microbiology.

  2. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
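
    The accuracy figures above are percentages. A minimal sketch of how a relative RMSE and a relative bias could be computed is shown below, assuming both are expressed relative to the mean of the field-observed values; the abstract does not state the exact definitions, and the plot volumes are hypothetical.

```python
import math


def relative_rmse(predicted, observed):
    """RMSE as a percentage of the observed mean."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    mean_observed = sum(observed) / n
    return 100.0 * math.sqrt(mse) / mean_observed


def relative_bias(predicted, observed):
    """Mean error as a percentage of the observed mean (negative = underestimate)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_error = sum(p - o for p, o in zip(predicted, observed)) / n
    mean_observed = sum(observed) / n
    return 100.0 * mean_error / mean_observed


# Hypothetical plot-level stem volumes in cubic metres per hectare.
observed_volume = [180.0, 220.0, 150.0, 250.0]
imputed_volume = [170.0, 200.0, 160.0, 230.0]
print(f"RMSE: {relative_rmse(imputed_volume, observed_volume):.1f}%")
print(f"bias: {relative_bias(imputed_volume, observed_volume):.1f}%")
```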

  3. Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

    Jörgen Wallerman

    2013-04-01

    Full Text Available Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error or RMSE 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%) with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.

  4. Applying an efficient K-nearest neighbor search to forest attribute imputation

    Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

    2006-01-01

    This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby decreasing the time needed to discover the NN subset. Results of five trials show gains...
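For context, the baseline that an efficient search improves on is a brute-force kNN imputation, which computes one distance per reference vector for every target. The sketch below shows only that baseline, not the paper's search algorithm (which the truncated abstract does not specify); the function name, the Euclidean metric, the unweighted mean, and the sample plots are all assumptions made for illustration.

```python
import heapq
import math


def knn_impute(target_x, reference, k=3):
    """Impute an attribute for target_x as the mean over its k nearest references.

    reference: list of (feature_vector, attribute_value) pairs.
    Brute force: exactly one distance calculation per reference vector.
    """
    if k < 1 or k > len(reference):
        raise ValueError("k must be between 1 and the number of references")
    scored = ((math.dist(target_x, x), y) for x, y in reference)
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    return sum(y for _, y in nearest) / k


# Hypothetical reference plots: (spectral features, forest attribute).
reference_plots = [
    ((0.0, 0.0), 10.0),
    ((1.0, 0.0), 20.0),
    ((0.0, 1.0), 30.0),
    ((5.0, 5.0), 100.0),
]
estimate = knn_impute((0.1, 0.1), reference_plots, k=3)
print(estimate)
```

With many targets and references, the per-target cost grows linearly in the reference set, which is the cost an efficient NN search aims to reduce.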

  5. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2018-02-01

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  6. Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

    Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang Yang

    2012-01-01

    The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...

  7. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographic attributes, and determine whether imputation is an accurate...

  8. Whole-genome characterization in pedigreed non-human primates using Genotyping-By-Sequencing and imputation.

    Cervera-Juanes, Rita; Vinson, Amanda; Ferguson, Betsy; Carbone, Lucia; Spindel, Eliot; Mccouch, Susan; Spindel, Jennifer; Nevonen, Kimberly; Letaw, John; Raboin, Michael; Bimber, Ben

    2016-01-01

    Background: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still undeveloped. Whole-genome sequence (WGS) data in pedigreed macaque colonies could provide substantial experimental power, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for whole-genome sequenci...

  9. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

    Busch Michael P

    2007-12-01

    Full Text Available. Background: Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates. Methods: We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use. Results: Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before. Conclusion: In cross-sectional studies of fibrosis progression where date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.

  10. Behavior of durum wheat genotypes under normal irrigation and ...

    Behavior of durum wheat genotypes under normal irrigation and drought stress conditions in the greenhouse. ... African Journal of Biotechnology ... Genotypes were grouped in cluster analysis (using Ward's method) based on Yp, Ys and ...

  11. magnitude of genotype x environment interaction for bacterial leaf

    ACSS

    African Crop Science Journal, Vol. ... effects of treatments into genotype, environment, and genotype x environment (G x E) interactions. Results .... method is economically effective (Niño-Liu et al., ..... This phenomenon indicated differences in.

  12. Comparison of Modified Impedance Whole Blood Platelet Aggregation Method Detecting Platelet Function in ACS Patients with Different CYP2C19 Genotypes.

    Cui, Chanjuan; Qiao, Rui; Zhang, Jie

    2016-01-01

    A reliable laboratory test to monitor on-clopidogrel platelet reactivity (PR) is needed. In addition, genetic factors also play an important part in on-clopidogrel PR. This study aimed to modify the original impedance whole blood platelet aggregation assay associated with the release assay to monitor on-clopidogrel PR and assess their relationship with genotype. We adjusted the concentration of calcium in the in vitro reaction system of platelet aggregation to modify the original impedance whole blood platelet aggregation assay. Meanwhile, chronolume, which quantified the adenosine triphosphate (ATP) released from platelet dense granules, was added to this reaction system to reflect the platelet release function. In the modified assay, platelet magnified activation time (MAT) and the maximal platelet ATP release value (RV) were used to reflect platelet function parameters. In the original assay, the electrical resistance (omega) and RV were used to reflect platelet function parameters. On-clopidogrel PR was detected by the original impedance whole blood platelet aggregation assay, modified assay, and flow cytometric vasodilator stimulated phosphoprotein (VASP) assay in 168 patients with acute coronary syndromes (ACS). CYP2C19*2 and CYP2C19*3 polymorphisms were also detected in all of these patients. This modified method showed that when 12.5 µL CaCl2 (0.2 mmol/L) was added to the reaction system, MAT was appropriate (93 ± 23 seconds). The CVs for the modified impedance assay and release assay were 9.31% and 6.13%, respectively. The mean VASP-PRI in the patient group treated with clopidogrel was significantly lower than that in the control group without antiplatelet therapy (54.88 ± 16.81% vs. 79.86 ± 10.24%, p 50% group were shorter than that in the PRI 50% group were higher than that in the PRI omega) and RV of the original method showed no differences between the two groups [0 (0-2) vs. 0 (0-1.25), 0.05 (0-0.25) vs. 0.08 (0-0.24); p > 0.05, p > 0

  13. Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

    Stanley Xu

    2014-05-01

    Full Text Available In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset using an incident diabetes population that has no missing values in A1c, fasting and random plasma glucose (FPG and RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We then imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG 1) estimated a continuous A1c within ± 1.88% of the true A1c 68.3% of the time; 2) estimated a categorical A1c within ± one category from the true A1c about 50% of the time. Including RPG in imputation slightly improved the precision but did not improve the accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not categorical A1c. Moreover, imputation of up to 33% of missing A1c values did not change the accuracy and precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When using A1c values as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.
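The evaluation design described above (start from complete data, delete values under MCAR, impute, compare with the truth) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the linear relation and noise level are hypothetical, a single-predictor least-squares fit stands in for whatever imputation model the authors used, and all function names are invented here.

```python
import random


def mask_mcar(values, rate, rng):
    """Set each value to None with probability `rate`, independent of the data (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_from_predictor(xs, ys_with_gaps):
    """Fill None entries of ys_with_gaps using a line fitted on the observed pairs."""
    observed = [(x, y) for x, y in zip(xs, ys_with_gaps) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [a + b * x if y is None else y for x, y in zip(xs, ys_with_gaps)]


# Synthetic, hypothetical data: FPG in mg/dL and an A1c-like outcome in %.
rng = random.Random(0)
fpg = [rng.uniform(90.0, 250.0) for _ in range(500)]
a1c_true = [2.5 + 0.03 * x + rng.gauss(0.0, 0.8) for x in fpg]

a1c_missing = mask_mcar(a1c_true, 0.33, rng)
a1c_filled = impute_from_predictor(fpg, a1c_missing)

# Share of imputed values that land within 1 percentage point of the truth.
gaps = [i for i, v in enumerate(a1c_missing) if v is None]
within = sum(abs(a1c_filled[i] - a1c_true[i]) <= 1.0 for i in gaps) / len(gaps)
print(f"{len(gaps)} values imputed; {within:.1%} within 1 point of the true value")
```

An MAR variant would make the deletion probability depend on observed covariates (for example age or FPG) rather than a constant rate.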

  14. Molecular characterization of varicella-zoster virus clinical isolates from 2006 to 2008 in a tertiary care hospital, Dublin, Ireland, using different genotyping methods.

    Roycroft, Emma

    2012-10-01

    Varicella-zoster virus (VZV), a herpesvirus, is a ubiquitous organism that causes considerable morbidity worldwide and can cause severe complications on reactivation. Phylogenetic analysis was performed on 19 clinical VZV isolates (16 zoster and 3 varicella) found in Ireland, between December 2006 and November 2008, in order to determine whether previously reported viral heterogeneity was still present and whether viral recombination was evident. Open reading-frames (ORFs) from genes 1, 21, 50, and 54 were sequenced. Clades 1, 2, 3, and 5 were identified. Four putative recombinant isolates were detected (three clade 3/1 and one clade 5/3/1). Further sequencing and examination of ORF 22 and 21/50 did not elucidate the putative recombinant genotypes further. These two previously published genotyping schemes were examined in light of the new consensus genotyping scheme proposed in 2010. Remarkable VZV heterogeneity remains prevalent in Ireland. This is the first evidence of putative VZV recombination found in Ireland.

  15. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the

  16. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    Wilson Barry Tyler

    2013-01-01

    Full Text Available Abstract The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest

  17. Core Gene Expression and Association of Genotypes with Viral ...

    Purpose: To determine genotypic distribution, ribonucleic acid (RNA) viral load and express core gene from Hepatitis C Virus (HCV) infected patients in Punjab, Pakistan. Methods: A total of 1690 HCV RNA positive patients were included in the study. HCV genotyping was tested by type-specific genotyping assay, viral ...

  18. Alteration of the centroid method to evaluate genotypic adaptability (Alteração no método centroide de avaliação da adaptabilidade genotípica)

    Moysés Nascimento

    2009-03-01

    Full Text Available The objective of this work was to modify the centroid method for evaluating the phenotypic adaptability and stability of genotypes, to give it greater biological meaning and to improve quantitative and qualitative aspects of its analysis. The modification consisted of adding three more ideotypes, defined according to the mean values of the genotypes in the environments. The data came from an experiment on the dry matter yield of 92 alfalfa (Medicago sativa) genotypes, carried out in a randomized block design with two replicates. The genotypes were subjected to 20 cuts between November 2004 and June 2006, and each cut was considered an environment. Including the ideotypes with greater biological meaning (mean values in the environments) produced a graphical dispersion in the shape of an arrow pointing to the right, in which the most productive genotypes lay near the arrowhead. With the modification, only five genotypes were assigned to the same classes as in the original centroid method. The arrow-shaped figure allows a direct comparison of the genotypes through the formation of a productivity gradient. The modified method retains the ease of interpretation for genotype recommendation found in the original method and does not allow ambiguous interpretation of the results.

  19. Semiautomatic imputation of activity travel diaries : use of global positioning system traces, prompted recall, and context-sensitive learning algorithms

    Moiseeva, A.; Jessurun, A.J.; Timmermans, H.J.P.; Stopher, P.

    2016-01-01

    Anastasia Moiseeva, Joran Jessurun and Harry Timmermans (2010), ‘Semiautomatic Imputation of Activity Travel Diaries: Use of Global Positioning System Traces, Prompted Recall, and Context-Sensitive Learning Algorithms’, Transportation Research Record: Journal of the Transportation Research Board,

  20. Using mi impute chained to fit ANCOVA models in randomized trials with censored dependent and independent variables

    Andersen, Andreas; Rieckmann, Andreas

    2016-01-01

    In this article, we illustrate how to use mi impute chained with intreg to fit an analysis of covariance analysis of censored and nondetectable immunological concentrations measured in a randomized pretest–posttest design.

  1. Imputing historical statistics, soils information, and other land-use data to crop area

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  2. A web-based approach to data imputation

    Li, Zhixu; Sharaf, Mohamed Abdel Fattah; Sitbon, Laurianne; Sadiq, Shazia Wasim; Indulska, Marta; Zhou, Xiaofang

    2013-01-01

    principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme

  3. APOE Genotyping, Cardiovascular Disease

    ... help understand the role of genetic factors in cardiovascular disease. However, the testing is sometimes used in clinical ...

  4. Radiosensitivity of fingermillet genotypes

    Raveendran, T S; Nagarajan, C; Appadurai, R; Prasad, M N; Sundaresan, N [Tamil Nadu Agricultural Univ., Coimbatore (India)

    1984-07-01

    Varietal differences in radiosensitivity were observed in a study involving 4 genotypes of fingermillet (Eleusine coracana (Linn.) Gaertn.) subjected to gamma-irradiation. Harder seeds were found to tolerate a higher dose of the mutagen.

  5. Impact of inter-genotypic recombination and probe cross-reactivity on the performance of the Abbott RealTime HCV Genotype II assay for hepatitis C genotyping.

    Sridhar, Siddharth; Yip, Cyril C Y; Chan, Jasper F W; To, Kelvin K W; Cheng, Vincent C C; Yuen, Kwok-Yung

    2018-05-01

    The Abbott RealTime HCV Genotype II assay (Abbott-RT-HCV assay) is a real-time PCR based genotyping method for hepatitis C virus (HCV). This study measured the impact of inter-genotypic recombination and probe cross-reactivity on the performance of the Abbott-RT-HCV assay. Of 517 samples genotyped using the Abbott-RT-HCV assay over a one-year period, 34 (6.6%) were identified as HCV genotype 1 without further subtype designation, raising the possibility of inaccurate genotyping. These samples were subjected to confirmatory sequencing. 27 of these 34 (79%) samples were genotype 1b while five (15%) were genotype 6. One HCV isolate was an inter-genotypic 1a/4o recombinant. This is a novel natural HCV recombinant that has never been reported. Inter-genotypic recombination and probe cross-reactivity can affect the accuracy of the Abbott-RT-HCV assay, both of which have significant implications for antiviral regimen choice. Confirmatory sequencing of ambiguous results is crucial for accurate genotyping. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs)

    Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L.

    2016-01-01

    Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated co...

  7. Missing data in clinical trials: control-based mean imputation and sensitivity analysis.

    Mehrotra, Devan V; Liu, Fang; Permutt, Thomas

    2017-09-01

    In some randomized (drug versus placebo) clinical trials, the estimand of interest is the between-treatment difference in population means of a clinical endpoint that is free from the confounding effects of "rescue" medication (e.g., HbA1c change from baseline at 24 weeks that would be observed without rescue medication regardless of whether or when the assigned treatment was discontinued). In such settings, a missing data problem arises if some patients prematurely discontinue from the trial or initiate rescue medication while in the trial, the latter necessitating the discarding of post-rescue data. We caution that the commonly used mixed-effects model repeated measures analysis with the embedded missing at random assumption can deliver an exaggerated estimate of the aforementioned estimand of interest. This happens, in part, due to implicit imputation of an overly optimistic mean for "dropouts" (i.e., patients with missing endpoint data of interest) in the drug arm. We propose an alternative approach in which the missing mean for the drug arm dropouts is explicitly replaced with either the estimated mean of the entire endpoint distribution under placebo (primary analysis) or a sequence of increasingly more conservative means within a tipping point framework (sensitivity analysis); patient-level imputation is not required. A supplemental "dropout = failure" analysis is considered in which a common poor outcome is imputed for all dropouts followed by a between-treatment comparison using quantile regression. All analyses address the same estimand and can adjust for baseline covariates. Three examples and simulation results are used to support our recommendations. Copyright © 2017 John Wiley & Sons, Ltd.
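The core idea of control-based mean imputation, replacing the unobserved mean of drug-arm dropouts with the placebo mean without imputing any patient-level values, can be illustrated with simple arithmetic. The sketch below is a simplification under stated assumptions: it ignores baseline covariates and variance estimation, takes the placebo mean as a given input, and treats the tipping-point scan as a fixed shift `delta` added to the dropout mean. The function name and all numbers are hypothetical.

```python
def control_based_mean(completer_values, n_dropouts, placebo_mean, delta=0.0):
    """Drug-arm mean when dropouts are assigned the placebo mean plus a shift.

    completer_values: observed endpoint values for drug-arm completers.
    n_dropouts: number of drug-arm patients with a missing endpoint.
    delta: extra penalty for the tipping-point scan (0.0 = primary analysis).
    """
    if not completer_values:
        raise ValueError("need at least one completer")
    n_completers = len(completer_values)
    completer_mean = sum(completer_values) / n_completers
    dropout_mean = placebo_mean + delta
    total = n_completers + n_dropouts
    return (n_completers * completer_mean + n_dropouts * dropout_mean) / total


# Hypothetical HbA1c changes from baseline (%), lower is better.
drug_completers = [-1.0, -1.2, -0.8, -1.0]
placebo_mean = -0.2

drug_mean = control_based_mean(drug_completers, 1, placebo_mean)
treatment_difference = drug_mean - placebo_mean

# Tipping-point style scan: increasingly conservative means for dropouts.
tipping_scan = [
    (d, control_based_mean(drug_completers, 1, placebo_mean, d) - placebo_mean)
    for d in (0.0, 0.5, 1.0)
]
print(treatment_difference, tipping_scan)
```

In this toy case the completer-only difference would be -0.8, so assigning the placebo mean to the single dropout shrinks the estimated effect, which is the conservative direction the abstract argues for.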

  8. Imputation of Baseline LDL Cholesterol Concentration in Patients with Familial Hypercholesterolemia on Statins or Ezetimibe.

    Ruel, Isabelle; Aljenedil, Sumayah; Sadri, Iman; de Varennes, Émilie; Hegele, Robert A; Couture, Patrick; Bergeron, Jean; Wanneh, Eric; Baass, Alexis; Dufour, Robert; Gaudet, Daniel; Brisson, Diane; Brunham, Liam R; Francis, Gordon A; Cermakova, Lubomira; Brophy, James M; Ryomoto, Arnold; Mancini, G B John; Genest, Jacques

    2018-02-01

    Familial hypercholesterolemia (FH) is the most frequent genetic disorder seen clinically and is characterized by increased LDL cholesterol (LDL-C) (>95th percentile), family history of increased LDL-C, premature atherosclerotic cardiovascular disease (ASCVD) in the patient or in first-degree relatives, presence of tendinous xanthomas or premature corneal arcus, or presence of a pathogenic mutation in the LDLR, PCSK9, or APOB genes. A diagnosis of FH has important clinical implications with respect to lifelong risk of ASCVD and requirement for intensive pharmacological therapy. The concentration of baseline LDL-C (untreated) is essential for the diagnosis of FH but is often not available because the individual is already on statin therapy. To validate a new algorithm to impute baseline LDL-C, we examined 1297 patients. The baseline LDL-C was compared with the imputed baseline obtained within 18 months of the initiation of therapy. We compared the percent reduction in LDL-C on treatment from baseline with the published percent reductions. After eliminating individuals with missing data, nonstandard doses of statins, or medications other than statins or ezetimibe, we provide data on 951 patients. The mean ± SE baseline LDL-C was 243.0 (2.2) mg/dL [6.28 (0.06) mmol/L], and the mean ± SE imputed baseline LDL-C was 244.2 (2.6) mg/dL [6.31 (0.07) mmol/L] (P = 0.48). There was no difference in response according to the patient's sex or in percent reduction between observed and expected for individual doses or types of statin or ezetimibe. We provide a validated estimation of baseline LDL-C for patients with FH that may help clinicians in making a diagnosis. © 2017 American Association for Clinical Chemistry.

  9. Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

    Jiangxiu Zhou

    2014-09-01

    Full Text Available The purpose of this study is to demonstrate a way of dealing with missing data in clustered randomized trials by doing multiple imputation (MI) with the PAN package in R through SAS. The procedure for doing MI with PAN through SAS is demonstrated in detail in order for researchers to be able to use this procedure with their own data. An illustration of the technique with empirical data was also included. In this illustration the PAN results were compared with pairwise deletion and three types of MI: (1) Normal Model (NM)-MI ignoring the cluster structure; (2) NM-MI with dummy-coded cluster variables (fixed cluster structure); and (3) a hybrid NM-MI which imputes half the time ignoring the cluster structure, and the other half including the dummy-coded cluster variables. The empirical analysis showed that using PAN and the other strategies produced comparable parameter estimates. However, the dummy-coded MI overestimated the intraclass correlation, whereas MI ignoring the cluster structure and the hybrid MI underestimated the intraclass correlation. When compared with PAN, the p-value and standard error for the treatment effect were higher with dummy-coded MI, and lower with MI ignoring the cluster structure, the hybrid MI approach, and pairwise deletion. Previous studies have shown that NM-MI is not appropriate for handling missing data in clustered randomized trials. This approach, in addition to the pairwise deletion approach, leads to a biased intraclass correlation and faulty statistical conclusions. Imputation in clustered randomized trials should be performed with PAN. We have demonstrated an easy way for using PAN through SAS.

  10. Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling

    Petr Novák

    2012-03-01

    Full Text Available This study is aimed at variance computation techniques for estimates of population characteristics based on survey sampling and imputation. We use the superpopulation regression model, which means that the target variable values for each statistical unit are treated as random realizations of a linear regression model with weighted variance. We focus on regression models with one auxiliary variable and no intercept, which have many applications and straightforward interpretation in business statistics. Furthermore, we deal with cases where the estimates are not independent and thus the covariance must be computed. We also consider chained regression models with auxiliary variables as random variables instead of constants.
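A regression model with one auxiliary variable, no intercept, and unit-specific variance can be written as y_i = beta * x_i + e_i with Var(e_i) = sigma^2 * v_i. The weighted least squares estimate is then beta = sum(x_i * y_i / v_i) / sum(x_i^2 / v_i), and choosing v_i = x_i gives the familiar ratio estimate sum(y) / sum(x). The sketch below uses that estimate for imputation; the choice v_i = x_i, the function names, and the data are assumptions for illustration, and the variance computation that is the paper's actual subject is not shown.

```python
def fit_no_intercept(x, y, v):
    """Weighted least squares slope for y = beta * x with Var(e_i) proportional to v_i."""
    numerator = sum(xi * yi / vi for xi, yi, vi in zip(x, y, v))
    denominator = sum(xi * xi / vi for xi, vi in zip(x, v))
    return numerator / denominator


def regression_impute(x, y):
    """Fill None entries of y with beta * x, fitting on respondents with v_i = x_i."""
    respondents = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in respondents]
    ys = [yi for _, yi in respondents]
    beta = fit_no_intercept(xs, ys, xs)  # v_i = x_i yields the ratio estimator
    return beta, [beta * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical business survey: x = known employees, y = reported turnover.
employees = [10.0, 20.0, 30.0, 40.0]
turnover = [21.0, 39.0, None, 80.0]
beta_hat, completed = regression_impute(employees, turnover)
estimated_total = sum(completed)
print(beta_hat, completed, estimated_total)
```

Because the imputed value depends on the same respondents that enter the total, the estimates are not independent, which is why the abstract stresses covariance terms in the variance computation.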

  11. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Yan, Xiaobo; Xiong, Weiqing; Hu, Liang; Wang, Feng; Zhao, Kuo

    2015-01-01

    This paper addresses missing value imputation for the Internet of Things (IoT). Nowadays, the IoT has been used widely and commonly by a variety of domains, such as transportation and logistics domain and healthcare domain. However, missing values are very common in the IoT for a variety of reasons, which results in the fact that the experimental data are incomplete. As a result of this, some work, which is related to the data of the IoT, can’t be carried out normally. And it leads to the red...

  12. Non-imputability, criminal dangerousness and curative safety measures: myths and realities

    Frank Harbottle Quirós

    2017-04-01

    Full Text Available Curative safety measures are imposed in criminal proceedings on non-imputable persons, provided that a prognosis affirmatively establishes their criminal dangerousness. Although this statement seems very elementary, several myths about these legal institutes persist in judicial practice, and their versions may vary, to a greater or lesser extent, between countries. In this context, the present article formulates ten myths based on the experience of Costa Rica and provides an explanation that seeks to weaken or refute them, inviting the reader to reflect on them.

  13. Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

    Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

    2015-01-01

    It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance.

  14. Missing Value Imputation Improves Mortality Risk Prediction Following Cardiac Surgery: An Investigation of an Australian Patient Cohort.

    Karim, Md Nazmul; Reid, Christopher M; Tran, Lavinia; Cochrane, Andrew; Billah, Baki

    2017-03-01

    The aim of this study was to evaluate the impact of missing values on the prediction performance of the model predicting 30-day mortality following cardiac surgery as an example. Information from 83,309 eligible patients, who underwent cardiac surgery, recorded in the Australia and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) database registry between 2001 and 2014, was used. An existing 30-day mortality risk prediction model developed from ANZSCTS database was re-estimated using the complete cases (CC) analysis and using multiple imputation (MI) analysis. Agreement between the risks generated by the CC and MI analysis approaches was assessed by the Bland-Altman method. Performances of the two models were compared. One or more missing predictor variables were present in 15.8% of the patients in the dataset. The Bland-Altman plot demonstrated significant disagreement between the risk scores (prisk of mortality. Compared to CC analysis, MI analysis resulted in an average of 8.5% decrease in standard error, a measure of uncertainty. The MI model provided better prediction of mortality risk (observed: 2.69%; MI: 2.63% versus CC: 2.37%, Pvalues improved the 30-day mortality risk prediction following cardiac surgery. Copyright © 2016 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.
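
    The agreement check mentioned in this abstract can be sketched in a few lines. The following is a minimal, generic Bland-Altman computation (mean difference and 95% limits of agreement), not the authors' analysis; the function name and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the pairwise differences a[i] - b[i]; the limits
    of agreement are bias +/- 1.96 sample standard deviations of those
    differences.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if len(a) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical risk scores from two modelling approaches (illustration only).
risk_cc = [1.0, 2.0, 3.0, 4.0]
risk_mi = [0.5, 2.5, 2.0, 4.0]
bias, lower, upper = bland_altman(risk_cc, risk_mi)
```

    In a Bland-Altman plot, each pairwise difference is drawn against the pair's mean, with horizontal lines at the bias and at the two limits.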

  15. An R package "VariABEL" for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity

    Struchalin Maksim V

    2012-01-01

    Full Text Available Abstract Background Hundreds of new loci have been discovered by genome-wide association studies of human traits. These studies mostly focused on associations between a single locus and a trait. Interactions between genes and between genes and environmental factors are of interest as they can improve our understanding of the genetic background underlying complex traits. Genome-wide testing of complex genetic models is a computationally demanding task. Moreover, testing of such models leads to multiple comparison problems that reduce the probability of new findings. Assuming that the genetic model underlying a complex trait can include hundreds of genes and environmental factors, testing of these models in genome-wide association studies represents a substantial difficulty. We and Pare with colleagues (2010) developed a method that overcomes such difficulties. The method is based on the fact that loci which are involved in interactions can show genotypic variance heterogeneity of a trait. Genome-wide testing of such heterogeneity can be a fast scanning approach which can point to the interacting genetic variants. Results In this work we present a new method, SVLM, allowing for variance heterogeneity analysis of imputed genetic variation. Type I error and power of this test are investigated and contrasted with those of Levene's test. We also present an R package, VariABEL, implementing existing and newly developed tests. Conclusions Variance heterogeneity analysis is a promising method for detection of potentially interacting loci. The new method and software package developed in this work will facilitate such analysis in a genome-wide context.

  16. Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

    Vinay K Gupta

    2016-06-01

    Full Text Available Objective: The aim of the study is to assess the trend in mean BMI z-score among private schools’ students from their anthropometric records when there were missing values in the outcome. Methodology: The anthropometric measurements of students from class 1 to 12 were taken from the records of two private schools in Delhi, India from 2005 to 2010. These records comprise unbalanced longitudinal data; that is, not all students had measurements recorded in each year. The trend in mean BMI z-score was estimated through a growth curve model. Prior to that, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis was also performed after excluding missing values to compare the results with those obtained from analysis of multiply imputed data. Results: The mean BMI z-score among school students significantly decreased over time in imputed data (β= -0.2030, se=0.0889, p=0.0232) after adjusting for age, gender, class and school. Complete case analysis also shows a decrease in mean BMI z-score though it was not statistically significant (β= -0.2861, se=0.0987, p=0.065). Conclusions: The estimates obtained from multiple imputation analysis were better than those of complete data after excluding missing values in terms of lower standard errors. We showed that anthropometric measurements from school records can be used to monitor the weight status of children and adolescents and that multiple imputation using a growth curve model can be useful while analyzing such data.

  17. Popcorn genotypes resistance to fall armyworm

    Nádia Cristina de Oliveira

    2018-02-01

    Full Text Available ABSTRACT: The aim of this study was to evaluate popcorn genotypes for resistance to the fall armyworm, Spodoptera frugiperda. The experiment used a completely randomized design with 30 replicates. The popcorn genotypes Aelton, Arzm 05 083, Beija-Flor, Colombiana, Composto Chico, Composto Gaúcha, Márcia, Mateus, Ufvm Barão Viçosa, Vanin, and Viviane were evaluated, along with the common maize variety Zapalote Chico. Newly hatched fall armyworm larvae were individually assessed with regard to biological development and consumption of food. The data were subjected to multivariate analyses of variance, and genetic divergence among genotypes was evaluated through the clustering methods of Tocher based on generalized Mahalanobis distances and canonical variable analyses. Seven popcorn genotypes, namely, Aelton, Arzm 05 083, Composto Chico, Composto Gaúcha, Márcia, Mateus, and Viviane, were shown to form a cluster (cluster I) that had antibiosis as the mechanism of resistance to the pest. Cluster I genotypes and the Zapalote Chico genotype could be used for stacking genes for antibiosis and non-preference resistance.

  18. Development of a cost-efficient novel method for rapid, concurrent genotyping of five common single nucleotide polymorphisms of the brain derived neurotrophic factor (BDNF) gene by tetra-primer amplification refractory mutation system.

    Wang, Cathy K; Xu, Michael S; Ross, Colin J; Lo, Ryan; Procyshyn, Ric M; Vila-Rodriguez, Fidel; White, Randall F; Honer, William G; Barr, Alasdair M

    2015-09-01

    Brain derived neurotrophic factor (BDNF) is a molecular trophic factor that plays a key role in neuronal survival and plasticity. Single nucleotide polymorphisms (SNPs) of the BDNF gene have been associated with specific phenotypic traits in a large number of neuropsychiatric disorders and the response to psychotherapeutic medications in patient populations. Nevertheless, due to study differences and occasionally contrasting findings, substantial further research is required to understand in better detail the association between specific BDNF SNPs and these psychiatric disorders. While considerable progress has been made recently in developing advanced genotyping platforms of SNPs, many high-throughput probe- or array-based detection methods currently available are limited by high costs, slow processing times or access to advanced instrumentation. The polymerase chain reaction (PCR)-based, tetra-primer amplification refractory mutation system (T-ARMS) method is a potential alternative technique for detecting SNP genotypes efficiently, quickly, easily, and cheaply. As a tool in psychopathology research, T-ARMS was shown to be capable of detecting five common SNPs in the BDNF gene (rs6265, rs988748, rs11030104, 11757G/C and rs7103411), which are all SNPs with previously demonstrated clinical relevance to schizophrenia and depression. The present technique therefore represents a suitable protocol for many research laboratories to study the genetic correlates of BDNF in psychiatric disorders. Copyright © 2015 John Wiley & Sons, Ltd.

  19. Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

    Peter K Joshi

    Full Text Available The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.

  20. Relationship of some upland rice genotype after gamma irradiation

    Suliartini, N. W. S.; Wijayanto, T.; Madiki, A.; Boer, D.; Muhidin; Juniawan

    2018-02-01

    The objective of the research was to group local upland rice genotypes after being treated with gamma irradiation. The research materials were second-generation mutant upland rice genotypes and their two parents: Pae Loilo (K3D0) and Pae Pongasi (K2D0) cultivars. The research was conducted at the Indonesian Sweetener and Fiber Crops Research Institute, Malang Regency, and used the augmented design method. Research data were analyzed with the R program. Selection was applied to eight hundred and seventy-one genotypes, with the criterion being a yield above the parental average plus 1.5 standard deviations. Based on the selection, eighty genotypes were analyzed with cluster analyses. Nine observation variables were used to develop a cluster dendrogram using the average linkage method. Genetic distance was measured by Euclidean distance. The cluster dendrogram showed that the tested genotypes were divided into eight groups. Groups 1, 2, 7, and 8 each had one genotype, groups 3 and 6 each had two genotypes, group 4 had 25 genotypes, and group 5 had 51 genotypes. Check genotypes formed a separate group. Group 6 had the highest yield per plant of 126.11 gram, followed by groups 5 and 4 with 97.63 and 94.08 gram, respectively.

  1. Specificity of the Linear Array HPV Genotyping Test for detecting human papillomavirus genotype 52 (HPV-52)

    Kocjan, Boštjan; Poljak, Mario; Oštrbenk, Anja

    2015-01-01

    Introduction: HPV-52 is one of the most frequent human papillomavirus (HPV) genotypes causing significant cervical pathology. The most widely used HPV genotyping assay, the Roche Linear Array HPV Genotyping Test (Linear Array), is unable to identify HPV-52 status in samples containing HPV-33, HPV-35, and/or HPV-58. Methods: Linear Array HPV-52 analytical specificity was established by testing 100 specimens reactive with the Linear Array HPV-33/35/52/58 cross-reactive probe, but not with the...

  2. Treating pre-instrumental data as "missing" data: using a tree-ring-based paleoclimate record and imputations to reconstruct streamflow in the Missouri River Basin

    Ho, M. W.; Lall, U.; Cook, E. R.

    2015-12-01

    Advances in paleoclimatology in the past few decades have provided opportunities to expand the temporal perspective of the hydrological and climatological variability across the world. The North American region is particularly fortunate in this respect, as a relatively dense network of high-resolution paleoclimate proxy records has been assembled there. One such network is the annually-resolved Living Blended Drought Atlas (LBDA): a paleoclimate reconstruction of the Palmer Drought Severity Index (PDSI) that covers North America on a 0.5° × 0.5° grid based on tree-ring chronologies. However, the use of the LBDA to assess North American streamflow variability requires a model by which streamflow may be reconstructed. Paleoclimate reconstructions have typically used models that first seek to quantify the relationship between the paleoclimate variable and the environmental variable of interest before extrapolating the relationship back in time. In contrast, the pre-instrumental streamflow is here considered as "missing" data. A method of imputing the "missing" streamflow data, prior to the instrumental record, is applied through multiple imputation using chained equations for streamflow in the Missouri River Basin. In this method, the distribution of the instrumental streamflow and LBDA is used to estimate sets of plausible values for the "missing" streamflow data, resulting in a ~600 year-long streamflow reconstruction. Past research into external climate forcings, oceanic-atmospheric variability and its teleconnections, and assessments of rare multi-centennial instrumental records demonstrate that large temporal oscillations in hydrological conditions are unlikely to be captured in most instrumental records. The reconstruction of multi-centennial records of streamflow will enable comprehensive assessments of current and future water resource infrastructure and operations under the existing scope of natural climate variability.

  3. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Jun-He Yang; Ching-Hsue Cheng; Chia-Pan Chan

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets are concatenated into an integrated dataset based on ordering of the data as a research dataset. The proposed time-series forecasting m...

  4. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 2010-04-01 false May the Office of National Drug Control Policy impute conduct of one person to another? 1404.630 Section 1404.630 Food and Drugs OFFICE OF NATIONAL DRUG CONTROL POLICY GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1404.630...

  5. 29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

    2010-07-01

    ... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation and...

  6. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    van Leeuwen, E.M.; Karssen, L.C.; Deelen, J.; Isaacs, A.; Medina-Gomez, C.; Mbarek, H.; Kanterakis, A.; Trompet, S.; Postmus, I.; Verweij, N.; van Enckevort, D.; Huffman, J.E.; White, C.C.; Feitosa, M.F.; Bartz, T.M.; Manichaikul, A.; Joshi, P.K.; Peloso, G.M.; Deelen, P.; Dijk, F.; Willemsen, G.; de Geus, E.J.C.; Milaneschi, Y.; Penninx, B.W.J.H.; Francioli, L.C.; Menelaou, A.; Pulit, S.L.; Rivadeneira, F.; Hofman, A.; Oostra, B.A.; Franco, O.H.; Mateo Leach, I.; Beekman, M.; de Craen, A.J.; Uh, H.W.; Trochet, H.; Hocking, L.J.; Porteous, D.J.; Sattar, N.; Packard, C.J.; Buckley, B.M.; Brody, J.A.; Bis, J.C.; Rotter, J.I.; Mychaleckyj, J.C.; Campbell, H.; Duan, Q.; Lange, L.A.; Wilson, J.F.; Hayward, C.; Polasek, O.; Vitart, V.; Rudan, I.; Wright, A.F.; Rich, S.S.; Psaty, B.M.; Borecki, I.B.; Kearney, P.M.; Stott, D.J.; Cupples, L.A.; Jukema, J.W.; van der Harst, P.; Sijbrands, E.J.; Hottenga, J.J.; Uitterlinden, A.G.; Swertz, M.A.; van Ommen, G.J.B; Bakker, P.I.W.; Slagboom, P.E.; Boomsma, D.I.; Wijmenga, C.; van Duijn, C.M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (∼35,000 samples) with the population-specific reference panel created

  7. 31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

    2010-07-01

    ... 31 Money and Finance: Treasury 1 2010-07-01 2010-07-01 false May the Department of the Treasury impute conduct of one person to another? 19.630 Section 19.630 Money and Finance: Treasury Office of the Secretary of the Treasury GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles...

  8. Genotyping of Coxiella burnetii from domestic ruminants in northern Spain

    Astobiza Ianire

    2012-12-01

    Full Text Available Abstract Background Information on the genotypic diversity of Coxiella burnetii isolates from infected domestic ruminants in Spain is limited. The aim of this study was to identify the C. burnetii genotypes infecting livestock in Northern Spain and compare them to other European genotypes. A commercial real-time PCR targeting the IS1111a insertion element was used to detect the presence of C. burnetii DNA in domestic ruminants from Spain. Genotypes were determined by a 6-loci Multiple Locus Variable number tandem repeat analysis (MLVA) panel and Multispacer Sequence Typing (MST). Results A total of 45 samples from 4 goat herds (placentas, N = 4), 12 dairy cattle herds (vaginal mucus, individual milk, bulk tank milk, aerosols, N = 20) and 5 sheep flocks (placenta, vaginal swabs, faeces, air samples, dust, N = 21) were included in the study. Samples from goats and sheep were obtained from herds which had suffered abortions suspected to be caused by C. burnetii, whereas cattle samples were obtained from animals with reproductive problems compatible with C. burnetii infection, or consisted of bulk tank milk (BTM) samples from a Q fever surveillance programme. C. burnetii genotypes identified in ruminants from Spain were compared to those detected in other countries. Three MLVA genotypes were found in 4 goat farms, 7 MLVA genotypes were identified in 12 cattle herds and 4 MLVA genotypes were identified in 5 sheep flocks. Clustering of the MLVA genotypes using the minimum spanning tree method showed a high degree of genetic similarity between most MLVA genotypes. Overall 11 different MLVA genotypes were obtained corresponding to 4 different MST genotypes: MST genotype 13, identified in goat, sheep and cattle from Spain; MST genotype 18, only identified in goats; and, MST genotypes 8 and 20, identified in small ruminants and cattle, respectively. All these genotypes had been previously identified in animal and human clinical samples from several

  9. The potential of plant viruses to promote genotypic diversity via genotype x environment interactions

    van Mölken, Tamara; Stuefer, Josef F.

    2011-01-01

    † Background and Aims Genotype by environment (G × E) interactions are important for the long-term persistence of plant species in heterogeneous environments. It has often been suggested that disease is a key factor for the maintenance of genotypic diversity in plant populations. However, empirical...... and the G × E interactions were examined with respect to genotype-specific plant responses to WClMV infection. Thus, the environment is defined as the presence or absence of the virus. † Key Results WClMV had a negative effect on plant performance as shown by a decrease in biomass and number of ramets...... evidence for this contention is scarce. Here virus infection is proposed as a possible candidate for maintaining genotypic diversity in their host plants. † Methods The effects of White clover mosaic virus (WClMV) on the performance and development of different Trifolium repens genotypes were analysed...

  10. Impute DC link (IDCL) cell based power converters and control thereof

    Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

    2016-04-26

    Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). The IDCL cell may be stacked in series and parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component in order to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, the IDCL cell based AC/AC converters reduce device count, eliminate the use of electrolytic capacitors that have life and reliability issues, and improve system efficiency compared with similarly rated back-to-back inverter system.

  11. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits.

    Benjamin F Voight

    Full Text Available Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe the Metabochip and its component SNP sets, evaluate its performance in capturing variation across the allele-frequency spectrum, describe solutions to methodological challenges commonly encountered in its analysis, and evaluate its performance as a platform for genotype imputation. The metabochip achieves dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow-up large-scale genotyping and sequencing studies and advance our understanding of the genetic basis of complex human diseases and traits.

  12. Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

    Manago, K. F.; Hogue, T. S.; Hering, A. S.

    2014-12-01

    In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to ongoing drought in California, increased reliance on local water supply highlights the need for better understanding of regional groundwater dynamics and estimating sustainable groundwater supply. However, in an urban setting, such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data is irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data is interpolated to create spatial groundwater maps utilizing information from all wells. The groundwater data is evaluated at a monthly time step over the last several decades to analyze the effect of land cover and identify other influencing factors on groundwater levels spatially and temporally. Preliminary results show irrigated parks have the largest influence on groundwater fluctuations, resulting in large seasonal changes that exceed those at spreading grounds. It is assumed that these fluctuations are caused by watering practices required to sustain non-native vegetation. Conversely, high intensity urbanized areas resulted in muted groundwater fluctuations and behavior decoupled from climate patterns. The results provide an improved understanding of anthropogenic effects on groundwater levels in addition to providing high quality datasets for validation of regional groundwater models.

  13. Precise genotyping and recombination detection of Enterovirus

    2015-01-01

    Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep conception, http://symbiont.iis.sinica.edu.tw/evidence), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most possible genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution. PMID:26678286

  14. A Review of Methods for Missing Data.

    Pigott, Therese D.

    2001-01-01

    Reviews methods for handling missing data in a research study. Model-based methods, such as maximum likelihood using the EM algorithm and multiple imputation, hold more promise than ad hoc methods. Although model-based methods require more specialized computer programs and assumptions about the nature of missing data, these methods are appropriate…
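
    To make the multiple-imputation idea concrete, the pooling step can be sketched as below. This is a minimal illustration of Rubin's rules for combining one parameter estimated on m imputed datasets; it is not taken from the reviewed article, and the function name and example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates of one parameter with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m != len(variances):
        raise ValueError("estimates and variances must have the same length")
    if m < 2:
        raise ValueError("at least two imputed datasets are required")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed datasets (illustration only).
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
pooled_se = total ** 0.5
```

    The between-imputation term is what separates MI from single imputation: it carries the uncertainty about the missing values into the pooled standard error.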

  15. Fluorescence resonance energy transfer-based real-time polymerase chain reaction method without DNA extraction for the genotyping of F5, F2, F12, MTHFR, and HFE.

    Martinez-Serra, Jordi; Robles, Juan; Nicolàs, Antoni; Gutierrez, Antonio; Ros, Teresa; Amat, Juan Carlos; Alemany, Regina; Vögler, Oliver; Abelló, Aina; Noguera, Aina; Besalduch, Joan

    2014-01-01

    Blood samples are extensively used for the molecular diagnosis of many hematological diseases. The daily practice in a clinical laboratory of molecular diagnosis in hematology involves using a variety of techniques, based on the amplification of nucleic acids. Current methods for polymerase chain reaction (PCR) use purified genomic DNA, mostly isolated from total peripheral blood cells or white blood cells (WBC). In this paper we describe a real-time fluorescence resonance energy transfer-based method for genotyping directly from blood cells. Our strategy is based on an initial isolation of the WBCs, allowing the removal of PCR inhibitors, such as the heme group, present in the erythrocytes. Once the erythrocytes have been lysed, in the LightCycler® 2.0 Instrument, we perform a real-time PCR followed by a melting curve analysis for different genes (Factors 2, 5, 12, MTHFR, and HFE). After testing 34 samples comparing the real-time crossing point (CP) values between WBC (5×10^6 WBC/mL) and purified DNA (20 ng/μL), the results for F5 Leiden were as follows: CP mean value for WBC was 29.26±0.566 versus purified DNA 24.79±0.56. Thus, when PCR was performed from WBC (5×10^6 WBC/mL) instead of DNA (20 ng/μL), we observed a delay of about 4 cycles. These small differences in CP values were similar for all genes tested and did not significantly affect the subsequent analysis by melting curves. In both cases the fluorescence values were high enough, allowing a robust genotyping of all these genes without a previous DNA purification/extraction.

  16. A novel multiplex RT-qPCR method based on dual-labelled probes suitable for typing all known genotypes of viral haemorrhagic septicaemia virus

    Vázquez, D.; López-Vázquez, C.; Skall, Helle Frank

    2016-01-01

    , resulting in a correct detection and typing of all strains. The analytical sensitivity was evaluated in a comparative assay with titration in cell culture, observing that both methods provided similar limits of detection. The proposed method can be a powerful tool for epidemiological analysis of VHSV...

  17. Hepatitis C virus genotypes in Bahawalpur

    Qazi, M.A.; Fayyaz, M.; Chaudhry, G.M.D.; Jamil, A.

    2006-01-01

    This study was conducted at Medical Unit-II, Bahawal Victoria Hospital / Quaid-e-Azam Medical College, Bahawalpur, from May 1, 2005 to December 31, 2005. The objective was to determine hepatitis C virus (HCV) genotypes in Bahawalpur, Pakistan. In 105 consecutive anti-HCV (ELISA-3) positive patients, a complete history and physical examination were performed. Liver function tests, complete blood and platelet counts, blood sugar (fasting and 2 hours after breakfast), prothrombin time, serum albumin, serum globulin and abdominal ultrasound were carried out in all patients. Tru-cut biopsy was performed on 17 patients. HCV RNA was studied in all patients by nested PCR, and genotyping was done by genotype-specific PCR. Among the 105 anti-HCV positive patients, HCV RNA was detected in 98. Of these 98 patients, 57 (58.2%) were male and 41 (41.8%) were female. Their ages ranged from 18 to 75 years: 26 (26.5%) were aged 18-29 years, 35 (35.7%) 30-39 years and 37 (37.8%) 40-75 years. Ten (10.2%) patients were diabetic and 34 (34.7%) were obese. Liver cirrhosis was present in 10 (10.2%) patients. Forty-two (42.9%) patients were symptomatic while 56 (57.1%) were asymptomatic. Of the 98 patients, 11 (11.2%) were untypeable and 87 (88.8%) were typeable: 70/98 (71.4%) were genotype 3; 10/98 (10.2%) were genotype 1; 3/98 (3.1%) were genotype 2; 3/98 (3.1%) were mixed genotype 2 and 3; and 1/98 (1%) was mixed genotype 3a and 3b. Genotype 3 is the most common HCV genotype in our area, which suggests that both virological and biochemical responses will be better. Because HCV genotype 3 is more frequent among drug users, this finding points towards unsafe injection practices in our area. (author)

  18. Development of a monoclonal antibody against viral haemorrhagic septicaemia virus (VHSV) genotype IVa

    Ito, T.; Olesen, Niels Jørgen; Skall, Helle Frank

    2010-01-01

    of the spread of genotypes to new geographical areas. A monoclonal antibody (MAb) against VHSV genotype IVa was produced, with the aim of providing a simple method of discriminating this genotype from the other VHSV genotypes (I, II, III and IVb). Balb/c mice were injected with purified VHSV-JF00Ehil (genotype...... IVa) from diseased farmed Japanese flounder. Ten hybridoma clones secreting monoclonal antibodies (MAbs) against VHSV were established. One of these, MAb VHS-10, reacted only with genotype IVa in indirect fluorescent antibody technique (IFAT) and ELISA. Using cell cultures that were transfected...

  19. Assessment of the microbiota of a mixed infection of the tongue using phenotypic and genotypic methods simultaneously and a review of the literature

    Veloo, A. C. M.; Schepers, R. H.; Welling, G. W.; Degener, J. E.

    We assessed the microbiota of a tongue abscess in which twelve different aerobic and anaerobic bacteria were identified using fluorescent in situ hybridisation (FISH), sequencing of the 16S rRNA gene and phenotypic methods. By applying the 16S rRNA based probes directly on the clinical material, a

  20. Double sampling with multiple imputation to answer large sample meta-research questions: Introduction and illustration by evaluating adherence to two simple CONSORT guidelines

    Patrice L. Capers

    2015-03-01

    BACKGROUND: Meta-research can involve manual retrieval and evaluation of research, which is resource intensive. Creation of high throughput methods (e.g., search heuristics, crowdsourcing) has improved feasibility of large meta-research questions, but possibly at the cost of accuracy. OBJECTIVE: To evaluate the use of double sampling combined with multiple imputation (DS+MI) to address meta-research questions, using as an example adherence of PubMed entries to two simple Consolidated Standards of Reporting Trials (CONSORT) guidelines for titles and abstracts. METHODS: For the DS large sample, we retrieved all PubMed entries satisfying the filters: RCT; human; abstract available; and English language (n=322,107). For the DS subsample, we randomly sampled 500 entries from the large sample. The large sample was evaluated with a lower rigor, higher throughput (RLOTHI) method using search heuristics, while the subsample was evaluated using a higher rigor, lower throughput (RHITLO) human rating method. Multiple imputation of the missing-completely-at-random RHITLO data for the large sample was informed by: RHITLO data from the subsample; RLOTHI data from the large sample; whether a study was an RCT; and country and year of publication. RESULTS: The RHITLO and RLOTHI methods in the subsample largely agreed (phi coefficients: title=1.00, abstract=0.92). Compliance with abstract and title criteria has increased over time, with non-US countries improving more rapidly. DS+MI logistic regression estimates were more precise than subsample estimates (e.g., 95% CI for change in title and abstract compliance by Year: subsample RHITLO 1.050-1.174 vs. DS+MI 1.082-1.151). As evidence of improved accuracy, DS+MI coefficient estimates were closer to RHITLO than the large sample RLOTHI. CONCLUSIONS: Our results support our hypothesis that DS+MI would result in improved precision and accuracy. This method is flexible and may provide a practical way to examine large corpora of
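
    The agreement statistic quoted in this abstract, the phi coefficient between two binary ratings, can be sketched as follows. The 2×2 counts are hypothetical and only illustrate the formula; they are not the study's data, and the function name is an assumption.

```python
import math


def phi_coefficient(both_yes, only_first, only_second, both_no):
    """Phi coefficient for a 2x2 table of two binary ratings.

    both_yes    : items rated compliant by both methods
    only_first  : compliant by the first method only
    only_second : compliant by the second method only
    both_no     : rated non-compliant by both methods
    """
    numerator = both_yes * both_no - only_first * only_second
    denominator = math.sqrt(
        (both_yes + only_first)
        * (only_second + both_no)
        * (both_yes + only_second)
        * (only_first + both_no)
    )
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator


# Hypothetical subsample of 100 entries rated by a heuristic and by a human.
print(phi_coefficient(both_yes=45, only_first=5, only_second=5, both_no=45))
```

    A value of 1 means the two methods never disagree, so a high phi in the subsample is what justifies using the cheaper rating to inform imputation of the more rigorous one.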

  1. Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. I: Multivariate Gaussian priors for marker effects and derivation of the joint probability mass function of genotypes.

    Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A

    2017-03-21

    It is important to consider heterogeneity of marker effects and allelic frequencies in across-population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models was developed to perform across-population genome-wide prediction, modeling genotypes as random variables and allowing population-specific effects for each marker. Models shared a common structure and differed in the priors used and the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that made it possible not only to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for previous imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes, was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues along with possible applications, extensions and refinements were discussed. Some features of the models developed in this study make them promising for genome-wide prediction; the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies to assess the performance of the models proposed here and also to compare them with conventional models used in genome-wide prediction are needed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Avoid Filling Swiss Cheese with Whipped Cream; Imputation Techniques and Evaluation Procedures for Cross-Country Time Series

    Michael Weber; Michaela Denk

    2011-01-01

    International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries with not yet well-established statistical systems may be incomplete, the bridging of data gaps is a crucial challenge. This paper investigates data structures and missing data patterns in the cross-sectional time series framework, reviews missing value imputation techniques used for micro data in official statistics, and discusses the...

  3. Assessment of Consequences of Replacement of System of the Uniform Tax on Imputed Income Patent System of the Taxation

    Galina A. Manokhina

    2012-11-01

    The article highlights the main questions concerning the possible consequences of replacing the currently operating single tax on imputed income with the patent system of taxation. The main advantages and drawbacks of the new system are shown, including the opinion that introducing the patent taxation system as an auxiliary system would be more effective than replacing one special tax regime with another.

  4. Choosing tree genotypes for phytoremediation of landfill leachate using phyto-recurrent selection

    Jill A. Zalesny; Ronald S., Jr. Zalesny; Adam H. Wiese; Richard B. Hall

    2007-01-01

    Information about the response of poplar (Populus spp.) genotypes to landfill leachate irrigation is needed, along with efficient methods for choosing genotypes based on leachate composition. Poplar clones were irrigated during three cycles of phyto-recurrent selection to test whether genotypes responded differently to leachate and water, and to test...

  5. Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias

    Didion John P

    2012-01-01

    Abstract Background High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs. Results We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. Conclusion The problems of ascertainment bias and missing

  6. HBV genotypic variability in Cuba.

    Carmen L Loureiro

    The genetic diversity of HBV in human population is often a reflection of its genetic admixture. The aim of this study was to explore the genotypic diversity of HBV in Cuba. The S genomic region of Cuban HBV isolates was sequenced and for selected isolates the complete genome or precore-core sequence was analyzed. The most frequent genotype was A (167/250, 67%), mainly A2 (149, 60%) but also A1 and one A4. A total of 77 isolates were classified as genotype D (31%), with co-circulation of several subgenotypes (56 D4, 2 D1, 5 D2, 7 D3/6 and 7 D7). Three isolates belonged to genotype E, two to H and one to B3. Complete genome sequence analysis of selected isolates confirmed the phylogenetic analysis performed with the S region. Mutations or polymorphisms in precore region were more common among genotype D compared to genotype A isolates. The HBV genotypic distribution in this Caribbean island correlates with the Y lineage genetic background of the population, where a European and African origin prevails. HBV genotypes E, B3 and H isolates might represent more recent introductions.

  7. HBV Genotypic Variability in Cuba

    Loureiro, Carmen L.; Aguilar, Julio C.; Aguiar, Jorge; Muzio, Verena; Pentón, Eduardo; Garcia, Daymir; Guillen, Gerardo; Pujol, Flor H.

    2015-01-01

    The genetic diversity of HBV in human population is often a reflection of its genetic admixture. The aim of this study was to explore the genotypic diversity of HBV in Cuba. The S genomic region of Cuban HBV isolates was sequenced and for selected isolates the complete genome or precore-core sequence was analyzed. The most frequent genotype was A (167/250, 67%), mainly A2 (149, 60%) but also A1 and one A4. A total of 77 isolates were classified as genotype D (31%), with co-circulation of several subgenotypes (56 D4, 2 D1, 5 D2, 7 D3/6 and 7 D7). Three isolates belonged to genotype E, two to H and one to B3. Complete genome sequence analysis of selected isolates confirmed the phylogenetic analysis performed with the S region. Mutations or polymorphisms in precore region were more common among genotype D compared to genotype A isolates. The HBV genotypic distribution in this Caribbean island correlates with the Y lineage genetic background of the population, where a European and African origin prevails. HBV genotypes E, B3 and H isolates might represent more recent introductions. PMID:25742179

  8. Evaluation of 14 winter bread wheat genotypes in normal irrigation ...

    Evaluation of 14 winter bread wheat genotypes in normal irrigation and stress conditions after anthesis stage. ... African Journal of Biotechnology ... Using biplot graphic method, comparison of indices amounts and mean rating of indices for ...

  9. Fluorescence resonance energy transfer-based real-time polymerase chain reaction method without DNA extraction for the genotyping of F5, F2, F12, MTHFR, and HFE

    Martinez-Serra J

    2014-06-01

    Jordi Martinez-Serra (1), Juan Robles (2), Antoni Nicolàs (3), Antonio Gutierrez (1), Teresa Ros (1), Juan Carlos Amat (1), Regina Alemany (4), Oliver Vögler (4), Aina Abelló (2), Aina Noguera (2), Joan Besalduch (1). (1) Department of Hematology, (2) Department of Clinical Analysis, Hospital Universitary Son Espases, Palma de Mallorca, Spain; (3) ECOGEN, Barcelona; (4) Department of Cell Biology, University of the Balearic Islands, Palma de Mallorca, Spain. Abstract: Blood samples are extensively used for the molecular diagnosis of many hematological diseases. The daily practice in a clinical laboratory of molecular diagnosis in hematology involves using a variety of techniques, based on the amplification of nucleic acids. Current methods for polymerase chain reaction (PCR) use purified genomic DNA, mostly isolated from total peripheral blood cells or white blood cells (WBC). In this paper we describe a real-time fluorescence resonance energy transfer-based method for genotyping directly from blood cells. Our strategy is based on an initial isolation of the WBCs, allowing the removal of PCR inhibitors, such as the heme group, present in the erythrocytes. Once the erythrocytes have been lysed, in the LightCycler® 2.0 Instrument, we perform a real-time PCR followed by a melting curve analysis for different genes (Factors 2, 5, 12, MTHFR, and HFE). After testing 34 samples comparing the real-time crossing point (CP) values between WBC (5×10^6 WBC/mL) and purified DNA (20 ng/µL), the results for F5 Leiden were as follows: CP mean value for WBC was 29.26±0.566 versus purified DNA 24.79±0.56. Thus, when PCR was performed from WBC (5×10^6 WBC/mL) instead of DNA (20 ng/µL), we observed a delay of about 4 cycles. These small differences in CP values were similar for all genes tested and did not significantly affect the subsequent analysis by melting curves. In both cases the fluorescence values were high enough, allowing a robust genotyping of all these genes without a previous DNA purification

  10. Effect of different home-cooking methods on textural and nutritional properties of sweet potato genotypes grown in temperate climate conditions.

    Nicoletto, Carlo; Vianello, Fabio; Sambo, Paolo

    2018-01-01

    The European Union (EU) market for sweet potato is small but is growing considerably and has increased by 100% over the last 5 years. The cultivation of sweet potato in temperate climate conditions has not been considered extensively and could be a new opportunity for the EU market. Health and quality traits of different sweet potato cultivars grown in temperate climate conditions were evaluated under four cooking methods. Traditional cultivars showed high hardness and adhesiveness values. The highest concentrations of sugars (especially maltose) and phenolic acids (caffeic and chlorogenic) were found in samples treated by boiling and steaming. High antioxidant activity was found in fried potatoes. Qualitative traits of sweet potatoes treated by microwaves did not show any significant variation compared to the control. Traditional and new sweet potato cultivars can be cultivated in temperate climate conditions and show interesting qualitative properties, especially as a result of the presence of antioxidant compounds. Concerning global quality, colored varieties expressed a better profile than traditional Italian ones and they are suitable for the European market, giving new opportunities for consumers and producers. © 2017 Society of Chemical Industry.

  11. A fast and cost-effective method for apolipoprotein E isotyping as an alternative to APOE genotyping for patient screening and stratification.

    Calero, Olga; García-Albert, Luis; Rodríguez-Martín, Andrés; Veiga, Sergio; Calero, Miguel

    2018-04-13

    Apolipoprotein E (apoE) is a 34 kDa glycoprotein involved in lipid metabolism. The human APOE gene encodes three different apoE protein isoforms: E2, E3 and E4. The interest in apoE isoforms is high for epidemiological research, patient stratification and identification of those at increased risk for clinical trials and prevention. The isoform apoE4 is associated with increased risk for coronary heart and Alzheimer's diseases. This paper describes a method for specifically detecting the apoE4 isoform from biological fluids by taking advantage of the capacity of apoE to bind "specifically" to polystyrene surfaces as capture and a specific anti-apoE4 monoclonal antibody as reporter. Our results indicate that the apoE-polystyrene binding interaction is highly stable and resistant to detergents and to acid and basic washes. The methodology described here is accurate, easily implementable, fast and cost-effective. Although at present our technique is unable to discriminate APOE ε4/ε4 homozygotes from APOE ε3/ε4 and ε2/ε4 heterozygotes, it opens new avenues for the development of inexpensive, yet effective, tests for the detection of apoE4 for patient stratification. Preliminary results indicated that this methodology is also adaptable to turbidimetric platforms, which makes it a good candidate for clinical implementation through its translation to the clinical analysis routine.

  12. Identification of polymorphic inversions from genotypes

    Cáceres Alejandro

    2012-02-01

    Abstract Background Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences between linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability to current genome-wide association studies (GWAS). Conclusions We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomic inversions, and for 52 regions

  13. The discrimination of d-tartrate positive and d-tartrate negative S. enterica subsp. enterica serovar Paratyphi B isolated in Malaysia by phenotypic and genotypic methods.

    Ahmad, Norazah; Hoon, Shirley Tang Gee; Ghani, Mohamed Kamel Abd; Tee, Koh Yin

    2012-06-01

    Serotyping is not sufficient to differentiate the Salmonella strains that cause paratyphoid fever from those that cause milder gastroenteritis, as these organisms share the same serotype, Salmonella Paratyphi B (S. Paratyphi B). Strains causing paratyphoid fever do not ferment d-tartrate, and this key feature was used in this study to determine the prevalence of these strains among the collection of S. Paratyphi B strains isolated from patients in Malaysia. A total of 105 isolates of S. Paratyphi B were discriminated into d-tartrate positive (dT+) and d-tartrate negative (dT-) variants by two lead acetate test protocols and multiplex PCR. Lead acetate test protocol 1 differed from protocol 2 by a lower inoculum size and different incubation conditions, while the multiplex PCR utilized 2 sets of primers targeting the ATG start codon of the gene STM3356. Lead acetate protocol 1 discriminated 97.1% of the isolates as S. Paratyphi B dT+ and 2.9% as dT-, while test protocol 2 discriminated all the isolates as S. Paratyphi B dT+. The multiplex PCR test identified all 105 isolates as S. Paratyphi B dT+ strains. The concordance of the lead acetate test relative to that of multiplex PCR was 97.7% and 100% for protocols 1 and 2, respectively. This study showed that S. Paratyphi B dT+ is a common causative agent of gastroenteritis in Malaysia while paratyphoid fever appears to be relatively uncommon. Multiplex PCR was shown to be a simpler, more rapid and reliable method to discriminate S. Paratyphi B than the phenotypic lead acetate test.

  14. Establishment of a novel two-probe real-time PCR for simultaneously quantification of hepatitis B virus DNA and distinguishing genotype B from non-B genotypes.

    Wang, Wei; Liang, Hongpin; Zeng, Yongbin; Lin, Jinpiao; Liu, Can; Jiang, Ling; Yang, Bin; Ou, Qishui

    2014-11-01

    Establishment of a simple, rapid and economical method for quantification and genotyping of hepatitis B virus (HBV) is of great importance for clinical diagnosis and treatment of chronic hepatitis B patients. We hereby aim to develop a novel two-probe real-time PCR for simultaneous quantification of HBV viral concentration and distinguishing genotype B from non-B genotypes. Conserved primers and TaqMan probes for genotype B and non-B genotypes were designed. The linear range, detection sensitivity, specificity and repeatability of the method were assessed. 539 serum samples from HBV-infected patients were assayed, and the results were compared with commercial HBV quantification and HBV genotyping kits. The detection sensitivity of the two-probe real-time PCR was 500 IU/ml; the linear range was 10^3-10^9 IU/ml, and the intra-assay CVs and inter-assay CVs were between 0.84% and 2.80%. No cross-reaction was observed between genotypes B and non-B. Of the 539 detected samples, 509 samples were HBV DNA positive. The results showed that 54.0% (275/509) of the samples were genotype B, 39.5% (201/509) were genotype non-B and 6.5% (33/509) were mixed genotype. The coincidence rate between the method and a commercial HBV DNA genotyping kit was 95.9% (488/509, kappa=0.923, PDNA qPCR kit were achieved. A novel two-probe real-time PCR method for simultaneous quantification of HBV viral concentration and distinguishing genotype B from non-B genotypes was established. The assay was sensitive, specific and reproducible, and can be applied in areas where HBV genotypes B and C are prevalent, especially in China. Copyright © 2014 Elsevier B.V. All rights reserved.
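
    The abstract reports both a raw coincidence rate and a kappa value for agreement between the new assay and a commercial genotyping kit. As a minimal sketch of how the two relate, the function below computes Cohen's kappa from a square agreement table. The table entries are hypothetical (the abstract gives only the totals, not the cross-tabulation), and the function name is an assumption.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table.

    table[i][j] counts samples assigned category i by one method and
    category j by the other.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance if the two methods were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical cross-tabulation: rows = new assay, columns = reference kit,
# categories = genotype B, non-B, mixed.
example = [
    [90, 3, 2],
    [4, 70, 1],
    [1, 2, 10],
]
print(round(cohens_kappa(example), 3))
```

    Kappa discounts the agreement that would occur by chance given how often each method assigns each category, so it is always at or below the raw coincidence rate when agreement is better than chance.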

  15. Toward fully automated genotyping: Genotyping microsatellite markers by deconvolution

    Perlin, M.W.; Lancia, G.; See-Kiong, Ng [Carnegie Mellon Univ., Pittsburgh, PA (United States)]

    1995-11-01

    Dense genetic linkage maps have been constructed for the human and mouse genomes, with average densities of 2.9 cM and 0.35 cM, respectively. These genetic maps are crucial for mapping both Mendelian and complex traits and are useful in clinical genetic diagnosis. Current maps are largely comprised of abundant, easily assayed, and highly polymorphic PCR-based microsatellite markers, primarily dinucleotide (CA)n repeats. One key limitation of these length polymorphisms is the PCR stutter (or slippage) artifact that introduces additional stutter bands. With two (or more) closely spaced alleles, the stutter bands overlap, and it is difficult to accurately determine the correct alleles; this stutter phenomenon has all but precluded full automation, since a human must visually inspect the allele data. We describe here novel deconvolution methods for accurate genotyping that mathematically remove PCR stutter artifact from microsatellite markers. These methods overcome the manual interpretation bottleneck and thereby enable full automation of genetic map construction and use. New functionalities, including the pooling of DNAs and the pooling of markers, are described that may greatly reduce the associated experimentation requirements. 32 refs., 5 figs., 3 tabs.
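
    The idea of removing stutter by deconvolution can be illustrated with a toy model. This is a sketch under stated assumptions, not the authors' algorithm: it assumes the stutter pattern is already known, is the same for every allele, and only produces bands at shorter repeat lengths. All names and numbers are hypothetical.

```python
def add_stutter(true_signal, stutter):
    """Forward model: each allele leaks a fraction stutter[k] of its signal
    into the band k repeats shorter (stutter[0] must be 1.0)."""
    n = len(true_signal)
    return [
        sum(stutter[k] * true_signal[i + k]
            for k in range(len(stutter)) if i + k < n)
        for i in range(n)
    ]


def remove_stutter(observed, stutter):
    """Invert the forward model by back-substitution from the longest band.

    The longest band receives no stutter, so it is taken as is; each shorter
    band then has the leakage from the already-resolved longer bands removed.
    """
    n = len(observed)
    resolved = [0.0] * n
    for i in range(n - 1, -1, -1):
        leaked = sum(stutter[k] * resolved[i + k]
                     for k in range(1, len(stutter)) if i + k < n)
        resolved[i] = max(0.0, observed[i] - leaked)  # clamp small negatives
    return resolved


# Two adjacent alleles (bands 2 and 3) whose stutter bands overlap.
true_alleles = [0.0, 0.0, 1.0, 1.0, 0.0]
pattern = [1.0, 0.4, 0.1]  # hypothetical stutter fractions
observed_bands = add_stutter(true_alleles, pattern)
recovered = remove_stutter(observed_bands, pattern)
print(observed_bands)
print(recovered)
```

    In the observed bands the shorter allele looks taller than the longer one because it carries the longer allele's stutter; after deconvolution the two alleles have equal height again and the spurious shorter bands vanish.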

  16. Factor Analysis, AMMI Stability Value (ASV) Parameter and GGE Bi-Plot Graphical Method of Quantitative and Qualitative Traits in Potato Genotypes

    Davood Hassanpanah

    2016-10-01

    Quantitative and qualitative traits and stability of marketable tuber yield of 14 promising potato clones, along with three commercial cultivars (Agria, Marfona and Savalan) as checks, were evaluated at the Ardabil Agricultural and Natural Resources Research Station during 2013 and 2014. The experiment was based on a randomized complete block design with four replications. During the growing period and after harvest, traits such as main stem number per plant, plant height, tuber number and weight per plant, total and marketable tuber yield, dry matter percentage, baking type, hollow heart, tuber inner ring and discoloration of raw tuber flesh after 24 hours were measured. Combined ANOVA for quantitative traits showed that there were significant differences among promising clones in total and marketable tuber yield, tuber number and weight per plant, plant height, tuber mean weight, main stem number per plant and dry matter percentage, and in their interactions with year for total and marketable tuber yield. Clone 9 (397078-3), with the lowest marketable tuber yield, differed significantly from clones 4 (397045-13), 1 (397031-16), 3 (397031-11), 6 (397009-8) and 12 (397067-6) in 2013 and from clone 4 (397045-13) and the Agria cultivar in 2014. Clones 4 (397045-13), 1 (397031-16) and 12 (397067-6) had uniform tubers, yellow to dark-yellow skin, light-yellow to yellow flesh, oval-round to round tuber shape, shallow to mid-shallow eyes, no tuber inner ring, hollow heart or tuber inner crack, and mid-late maturity. They were selected for home consumption of chips, french fries and frying. Based on the results of factor analysis, "tuber yield", "number of tuber" and "plant structural and quality" were named as the first, second and third quality-determining factors, respectively. In this experiment, the GGE Bi-plot model and the AMMI Stability Value (ASV) parameter were acceptable methods for the selection of marketable tuber yield stability which found to
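
    The abstract does not give the ASV formula. The sketch below uses the commonly cited definition, in which a genotype's first interaction principal component (IPCA1) score is weighted by the ratio of the IPCA1 to IPCA2 sums of squares before being combined with its IPCA2 score. The clone names, scores and sums of squares are hypothetical and only show the calculation; a lower ASV indicates a more stable genotype.

```python
import math


def ammi_stability_value(ipca1_score, ipca2_score, ss_ipca1, ss_ipca2):
    """AMMI Stability Value from a genotype's first two IPCA scores.

    ss_ipca1 and ss_ipca2 are the interaction sums of squares explained by
    IPCA1 and IPCA2 in the AMMI analysis of variance.
    """
    if ss_ipca2 <= 0:
        raise ValueError("ss_ipca2 must be positive")
    weight = ss_ipca1 / ss_ipca2
    return math.sqrt((weight * ipca1_score) ** 2 + ipca2_score ** 2)


# Hypothetical IPCA scores for two clones and hypothetical sums of squares.
scores = {"clone_A": (0.5, 0.2), "clone_B": (-0.1, 0.3)}
asv = {
    name: ammi_stability_value(s1, s2, ss_ipca1=8.0, ss_ipca2=2.0)
    for name, (s1, s2) in scores.items()
}
ranked = sorted(asv, key=asv.get)  # most stable genotype first
print(asv, ranked)
```

    Because IPCA1 usually explains more of the genotype-by-environment interaction than IPCA2, the weighting makes a large IPCA1 score count more heavily against a genotype's stability.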

  17. Genotyping of Hepatitis C virus isolated from hepatitis patients in Southeast of Iran by taqman realtime PCR

    Farivar, T.N.; Johari, P.

    2011-01-01

    Objectives: To evaluate TaqMan real-time PCR for detecting genotypes of hepatitis C virus in Iran. Methods: From July 2007 to April 2009, HCV genotyping was done on 52 patients who were referred to the Research Centre for Infectious Disease and Tropical Medicine, Bou-Ali Hospital, Zahedan University of Medical Sciences. All these patients had proven hepatitis C infection. Results: Out of 52 anti-HCV positive samples, 28 (53.84%) had genotype 1, 2 (3.88%) had genotype 2, 12 (23.08%) had genotype 3 and 7 (13.4%) had genotype 4. Mixed infection with genotypes 1 and 3 was seen in 3 cases (5.77%). Conclusion: TaqMan probes successfully identified HCV genotypes, especially in patients with mixed genotypes. (author)

  18. Distribution of Hepatitis C Virus Genotypes in the South Marmara Region

    Harun Agca

    2014-03-01

    Aim: Hepatitis C virus (HCV) is an important causative agent of hepatitis, cirrhosis and hepatocellular carcinoma both in our country and worldwide. Prognosis and response to treatment are related to the genotype of HCV, which has six genotypes and over a hundred quasispecies. Knowing the HCV genotype is also important for epidemiological data. In this study we aimed to investigate the HCV genotypes of samples sent to the Uludag University Hospital Microbiology Laboratory, which is the reference centre in the South Marmara Region. Material and Method: This retrospective study analysed the sera of HCV patients sent to our laboratory between July 2010 and December 2012 for HCV genotyping. The Artus HCV QS-RGQ PCR kit (Qiagen, Hilden, Germany) was used on the Rotor-Gene Q (Qiagen, Hilden, Germany) for detection of HCV RNA. HCV RNA positive sera were genotyped with the Linear Array HCV genotyping test (Roche, NJ, USA). Results: Of the 231 patients included in the study, 214 (92.6%) were genotype 1, one (0.4%) was genotype 2, nine (3.9%) were genotype 3 and seven (3.0%) were genotype 4. Three of the genotype 3 patients were of foreign nationality and two were born abroad, and one of the genotype 4 patients was born abroad. Discussion: Consistent with our country's data, the most frequent genotype was 1; genotype 2 was seen in patients especially related with foreign countries, and genotype 4 was seen rarely. The importance of genotype 1, which is the more frequent genotype in our country and region, lies in its resistance to antiviral treatment and the prolonged treatment duration in chronic hepatitis C patients.

  19. Novel approach for CES1 genotyping

    Bjerre, Ditte; Berg Rasmussen, Henrik

    2018-01-01

    AIM: Development of a specific procedure for genotyping of CES1A1 (CES1) and CES1A2, a hybrid of CES1A1 and the pseudogene CES1P1. MATERIALS & METHODS: The number of CES1A1 and CES1A2 copies and that of CES1P1 were determined using real-time PCR. Long range PCRs followed by secondary PCRs allowed...

  20. Echinococcus granulosus genotypes in Iran

    Sharafi, Seyedeh Maryam; Rostami-Nejad, Mohammad; Moazeni, Mohammad; Yousefi, Morteza; Saneie, Behnam; Hosseini-Safa, Ahmad

    2014-01-01

    Hydatidosis, caused by Echinococcus granulosus is one of the most important zoonotic diseases, throughout most parts of the world. Hydatidosis is endemic in Iran and responsible for approximately 1% of admission to surgical wards. There are extensive genetic variations within E. granulosus and 10 different genotypes (G1–G10) within this parasite have been reported. Identification of strains is important for improvement of control and prevention of the disease. No new review article presented the situation of Echinococcus granulosus genotypes in Iran in the recent years; therefore in this paper we reviewed the different studies regarding Echinococcus granulosus genotypes in Iran. PMID:24834298

  1. Rotavirus genotype shifts among Swedish children and adults-Application of a real-time PCR genotyping.

    Andersson, Maria; Lindh, Magnus

    2017-11-01

    It is well known that human rotavirus group A is the most important cause of severe diarrhoea in infants and young children. Less is known about rotavirus infections in other age groups, and about how rotavirus genotypes change over time in different age groups. The aim was to develop a real-time PCR to easily genotype rotavirus strains in order to monitor the pattern of circulating genotypes. In this study, rotavirus strains in clinical samples from children and adults in Western Sweden during 2010-2014 were retrospectively genotyped by specific amplification of the VP4 and VP7 genes with a newly developed real-time PCR. A genotype was identified in 97% of 775 rotavirus strains. G1P[8] was the most common genotype, representing 34.9%, followed by G2P[4] (28.3%), G9P[8] (11.5%), G3P[8] (8.1%), and G4P[8] (7.9%). The genotype distribution changed over time, from predominance of G1P[8] in 2010-2012 to predominance of G2P[4] in 2013-2014. There were also age-related differences, with G1P[8] being the most common genotype in children under 2 years (47.6%), and G2P[4] the most common in those over 70 years of age (46.1%). The shift to G2P[4] in 2013-2014 was associated with a change in the age distribution, with a greater number of rotavirus positive cases in the elderly than in children. By using a new real-time PCR method for genotyping we found that genotype distribution was age related and changed over time with a decreasing proportion of G1P[8]. Copyright © 2017. Published by Elsevier B.V.

  2. Direct maximum parsimony phylogeny reconstruction from genotype data.

    Sridhar, Srinath; Lam, Fumei; Blelloch, Guy E; Ravi, R; Schwartz, Russell

    2007-12-05

    Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Hence phylogenetic applications for autosomal data must rely on other methods for first computationally inferring haplotypes from genotypes. In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
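
    To make concrete what "conflated combinations of pairs of haplotypes" means, the following minimal sketch enumerates the haplotype pairs that are consistent with one unphased genotype. It is only an illustration and not the authors' algorithm. The 0/1 haplotype encoding, the 0/1/2 genotype encoding (count of the "1" allele per site) and the function names are assumptions made for this example.

```python
from itertools import product

def genotype_of(h1, h2):
    """Conflate two 0/1 haplotypes into a 0/1/2 genotype (per-site allele count)."""
    return tuple(a + b for a, b in zip(h1, h2))

def consistent_pairs(genotype):
    """List every unordered haplotype pair that could have produced `genotype`."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 -> (0, 0), 2 -> (1, 1).
        h1 = [g // 2 for g in genotype]
        h2 = [g // 2 for g in genotype]
        # Heterozygous sites can be phased either way.
        for i, b in zip(het_sites, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

if __name__ == "__main__":
    for pair in consistent_pairs((1, 2, 1, 0)):
        print(pair)
    # ((0, 1, 0, 0), (1, 1, 1, 0))
    # ((0, 1, 1, 0), (1, 1, 0, 0))
```

    With k heterozygous sites there are 2^(k-1) such pairs (for k of at least 1), which is why phasing first and building the tree second can commit to a haplotype pair that inflates the tree.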

  3. Porphyromonas gingivalis Fim-A genotype distribution among Colombians

    Jaramillo, Adriana; Parra, Beatriz; Botero, Javier Enrique; Contreras, Adolfo

    2015-01-01

    Introduction: Porphyromonas gingivalis is associated with periodontitis and exhibits a wide array of virulence factors, including fimbriae, which are encoded by the FimA gene, of which six genotypes are known. Objective: To identify FimA genotypes of P. gingivalis in subjects from Cali, Colombia, including co-infection with Aggregatibacter actinomycetemcomitans, Treponema denticola, and Tannerella forsythia. Methods: Subgingival samples were collected from 151 people exhibiting diverse periodontal conditions. The occurrence of P. gingivalis, FimA genotypes and other bacteria was determined by PCR. Results: P. gingivalis was detected in 85 patients. Genotype FimA II was more prevalent (54.3%), without reaching significant differences among study groups; FimA IV was also prevalent in gingivitis (13.0%). A high correlation (p = 0.000) was found among P. gingivalis, T. denticola, and T. forsythia co-infection. The FimA II genotype correlated with concomitant detection of T. denticola and T. forsythia. Conclusions: The frequency of Porphyromonas gingivalis was high even in the healthy group of the study population. A trend toward a greater frequency of the FimA II genotype in patients with moderate and severe periodontitis was determined. The FimA II genotype was also associated with increased pocket depth, greater loss of attachment level, and patients co-infected with T. denticola and T. forsythia. PMID:26600627

  4. Genetic similarity of soybean genotypes revealed by seed protein

    Nikolić Ana

    2005-01-01

    Full Text Available More accurate and complete descriptions of genotypes could help determine future breeding strategies and facilitate introgression of new genotypes into the current soybean gene pool. The objective of this study was to characterize, using seed proteins as biochemical markers, 20 soybean genotypes from the Maize Research Institute "Zemun Polje" collection that have good agronomic performance, high yield, lodging and drought resistance, and low shattering. Seed proteins were isolated and separated by PAA electrophoresis. On the basis of the presence/absence of protein fractions, coefficients of similarity between pairs of genotypes were calculated as the Dice and the Rogers and Tanimoto coefficients. The similarity matrix was submitted to hierarchical cluster analysis by the unweighted pair group method using arithmetic averages (UPGMA), and the necessary computations were performed using the NTSYS-pc program. Seed protein analysis confirmed a low level of genetic diversity in soybean. The highest genetic similarity was between genotypes P9272 and Kador. According to the results obtained, the soybean genotypes were assigned to two larger groups, and both coefficients of similarity gave similar results. Because of the lack of pedigree data for the analyzed genotypes, correspondence with marker data could not be determined. In plants with a narrow genetic base in their gene pool, such as soybean, protein markers may not be sufficient for characterization and study of genetic diversity.
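
    The two similarity coefficients named above can be sketched as follows, using their standard textbook definitions on presence/absence (1/0) band profiles. The example profiles and function names are hypothetical and are not data from the study. The clustering step (UPGMA in NTSYS-pc) is not reproduced here.

```python
def band_counts(x, y):
    """Count band matches for two 0/1 profiles of equal length.

    a: present in both, b: only in x, c: only in y, d: absent in both.
    """
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d

def dice(x, y):
    """Dice coefficient: 2a / (2a + b + c). Ignores shared absences.

    Assumes at least one band is present in x or y.
    """
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c)

def rogers_tanimoto(x, y):
    """Rogers and Tanimoto coefficient: (a + d) / (a + d + 2(b + c))."""
    a, b, c, d = band_counts(x, y)
    return (a + d) / (a + d + 2 * (b + c))

if __name__ == "__main__":
    # Hypothetical band profiles for two genotypes (1 = fraction present).
    g1 = [1, 1, 0, 1, 0, 1]
    g2 = [1, 0, 0, 1, 1, 1]
    print(dice(g1, g2))             # 0.75
    print(rogers_tanimoto(g1, g2))  # 0.5
```

    Dice weights shared bands double and ignores shared absences, while Rogers and Tanimoto counts shared absences as agreement and weights mismatches double, so the two can rank pairs differently even on the same gel.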

  5. Principle Component Analysis with Incomplete Data: A simulation of R pcaMethods package in Constructing an Environmental Quality Index with Missing Data

    Missing data is a common problem in the application of statistical techniques. In principal component analysis (PCA), a technique for dimensionality reduction, incomplete data points are either discarded or imputed using interpolation methods. Such approaches are less valid when ...

  6. Analysis of the progression of systolic blood pressure using imputation of missing phenotype values

    Vaitsiakhovich, Tatsiana; Drichel, Dmitriy; Angisch, Marina; Becker, Tim; Herold, Christine; Lacour, André

    2014-01-01

    We present a genome-wide association study of a quantitative trait, "progression of systolic blood pressure in time," in which 142 unrelated individuals of the Genetic Analysis Workshop 18 real genotype data were analyzed. Information on systolic blood pressure and other phenotypic covariates was missing at certain time points for a considerable part of the sample. We observed that the dropout process causing missingness is not independent of the initial systolic blood pressure; that is, the ...

  7. A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014

    He Yulei

    2016-03-01

    Full Text Available Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior still remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase of the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
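
    For orientation, the within- and between-imputation variance components mentioned above are the ingredients of the standard combining rules for multiple imputation (Rubin's rules). The sketch below implements only those generic rules; it does not reproduce the paper's analytic expressions for clustered designs. The function name, the example numbers and the simple large-sample approximation to the fraction of missing information are assumptions of this illustration.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: their estimated sampling variances
    Returns (pooled estimate, within W, between B, total T, approx. FMI).
    """
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    fmi = (1 + 1 / m) * b / t    # large-sample approximation to the
                                 # fraction of missing information
    return q_bar, w, b, t, fmi

if __name__ == "__main__":
    # Hypothetical results from m = 3 imputed data sets.
    q_bar, w, b, t, fmi = rubin_combine([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
    print(q_bar, w, b)       # 11.0 4.0 1.0
    print(round(t, 4))       # 5.3333
    print(round(fmi, 4))     # 0.25
```

    The extra term (1 + 1/m)B is what multiple imputation adds over a single-imputation variance estimate; when W is inflated by a large design effect, that term becomes a small share of T.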

  8. Comparative evaluation of GenoType MTBDRplus line probe assay with solid culture method in early diagnosis of multidrug resistant tuberculosis (MDR-TB) at a tertiary care centre in India.

    Raj N Yadav

    Full Text Available The objectives of the study were to compare the performance of the line probe assay (GenoType MTBDRplus) with the solid culture method for early diagnosis of multidrug resistant tuberculosis (MDR-TB), and to study the mutation patterns associated with the rpoB, katG and inhA genes at a tertiary care centre in north India. In this cross-sectional study, 269 previously treated sputum-smear acid-fast bacilli (AFB) positive MDR-TB suspects were enrolled from January to September 2012 at the All India Institute of Medical Sciences hospital, New Delhi. Line probe assay (LPA) was performed directly on the sputum specimens and the results were compared with those of conventional drug susceptibility testing (DST) on solid media [Lowenstein Jensen (LJ) method]. DST results by LPA and LJ methods were compared in 242 MDR-TB suspects. The LPA detected rifampicin (RIF) resistance in 70 of 71 cases, isoniazid (INH) resistance in 86 of 93 cases, and MDR-TB in 66 of 68 cases as compared to the conventional method. Overall (rifampicin, isoniazid and MDR-TB) concordance of the LPA with the conventional DST was 96%. Sensitivity and specificity were 98% and 99% respectively for detection of RIF resistance; 92% and 99% respectively for detection of INH resistance; 97% and 100% respectively for detection of MDR-TB. Frequencies of katG gene, inhA gene and combined katG and inhA gene mutations conferring all INH resistance were 72/87 (83%), 10/87 (11%) and 5/87 (6%) respectively. The turnaround time of the LPA test was 48 hours. The LPA test provides an early diagnosis of monoresistance to isoniazid and rifampicin and is highly sensitive and specific for an early diagnosis of MDR-TB. Based on these findings, it is concluded that the LPA test can be useful in early diagnosis of drug resistant TB in high TB burden countries.
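
    The detection counts quoted above can be turned into sensitivities directly, as in the small worked sketch below. It assumes that the "x of y" counts are true positives out of cases resistant by the conventional (LJ) method; the function name is hypothetical. Specificity is not computed because the abstract does not give the corresponding counts of susceptible cases.

```python
def sensitivity(detected, reference_positive):
    """Share of reference-positive cases that the index test also detected."""
    return detected / reference_positive

# (detected by LPA, resistant by conventional DST), as quoted in the abstract.
reported = {
    "RIF": (70, 71),
    "INH": (86, 93),
    "MDR-TB": (66, 68),
}

if __name__ == "__main__":
    for label, (detected, positives) in reported.items():
        print(f"{label}: {sensitivity(detected, positives):.1%}")
    # RIF: 98.6%
    # INH: 92.5%
    # MDR-TB: 97.1%
```

    These ratios come out close to the sensitivities of 98%, 92% and 97% stated in the abstract; small differences may reflect rounding or truncation in the source.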

  9. SPATIAL ANALYSIS TO SUPPORT GEOGRAPHIC TARGETING OF GENOTYPES TO ENVIRONMENTS

    Glenn Hyman

    2013-03-01

    Full Text Available Crop improvement efforts have benefited greatly from advances in available data, computing technology and methods for targeting genotypes to environments. These advances support the analysis of genotype by environment interactions to understand how well a genotype adapts to environmental conditions. This paper reviews the use of spatial analysis to support crop improvement research aimed at matching genotypes to their most appropriate environmental niches. Better data sets are now available on soils, weather and climate, elevation, vegetation, crop distribution and local conditions where genotypes are tested in experimental trial sites. The improved data are now combined with spatial analysis methods to compare environmental conditions across sites, create agro-ecological region maps and assess environment change. Climate, elevation and vegetation data sets are now widely available, supporting analyses that were much more difficult even five or ten years ago. While detailed soil data for many parts of the world remains difficult to acquire for crop improvement studies, new advances in digital soil mapping are likely to improve our capacity. Site analysis and matching and regional targeting methods have advanced in parallel to data and technology improvements. All these developments have increased our capacity to link genotype to phenotype and point to a vast potential to improve crop adaptation efforts.

  10. A novel rapid genotyping technique for Collie eye anomaly: SYBR Green-based real-time polymerase chain reaction method applicable to blood and saliva specimens on Flinders Technology Associates filter paper.

    Chang, Hye-Sook; Mizukami, Keijiro; Yabuki, Akira; Hossain, Mohammad A; Rahman, Mohammad M; Uddin, Mohammad M; Arai, Toshiro; Yamato, Osamu

    2010-09-01

    Collie eye anomaly (CEA) is a canine inherited ocular disease that shows a wide variety of manifestations and severity of clinical lesions. Recently, a CEA-associated mutation was reported, and a DNA test that uses conventional polymerase chain reaction (PCR) has now become available. The objective of the current study was to develop a novel rapid genotyping technique by using SYBR Green-based real-time PCR for future large-scale surveys as a key part in the strategy to eradicate CEA by selective breeding. First, a SYBR Green-based real-time PCR assay for genotyping of CEA was developed and evaluated by using purified DNA samples from normal, carrier, and affected Border Collies in which genotypes had previously been determined by conventional PCR. This real-time PCR assay demonstrated appropriate amplifications in all genotypes, and the results were consistent with those of conventional PCR. Second, the availability of Flinders Technology Associates filter paper (FTA card) as DNA templates for the real-time PCR assay was evaluated by using blood and saliva specimens to determine suitability for CEA screening. DNA-containing solution prepared from a disc of blood- or saliva-spotted FTA cards was available directly as templates for the real-time PCR assay when the volume of solution was 2.5% of the PCR mixture. In conclusion, SYBR Green-based real-time PCR combined with FTA cards is a rapid genotyping technique for CEA that can markedly shorten the overall time required for genotyping as well as simplify the sample preparation. Therefore, this newly developed technique suits large-scale screening in breeding populations of Collie-related breeds.

  11. Transforming microbial genotyping: a robotic pipeline for genotyping bacterial strains.

    Brian O'Farrell

    Full Text Available Microbial genotyping increasingly deals with large numbers of samples, and data are commonly evaluated by unstructured approaches, such as spreadsheets. The efficiency, reliability and throughput of genotyping would benefit from the automation of manual manipulations within the context of sophisticated data storage. We developed a medium-throughput genotyping pipeline for MultiLocus Sequence Typing (MLST) of bacterial pathogens. This pipeline was implemented through a combination of four automated liquid handling systems, a Laboratory Information Management System (LIMS) consisting of a variety of dedicated commercial operating systems and programs, including a Sample Management System, plus numerous Python scripts. All tubes and microwell racks were bar-coded and their locations and status were recorded in the LIMS. We also created a hierarchical set of items that could be used to represent bacterial species, their products and experiments. The LIMS allowed reliable, semi-automated, traceable bacterial genotyping from initial single colony isolation and sub-cultivation through DNA extraction and normalization to PCRs, sequencing and MLST sequence trace evaluation. We also describe robotic sequencing to facilitate cherrypicking of sequence dropouts. This pipeline is user-friendly, with a throughput of 96 strains within 10 working days at a total cost of [...] 200,000 items were processed by two to three people. Our sophisticated automated pipeline can be implemented by a small microbiology group without extensive external support, and provides a general framework for semi-automated bacterial genotyping of large numbers of samples at low cost.

  12. An imputation/copula-based stochastic individual tree growth model for mixed species Acadian forests: a case study using the Nova Scotia permanent sample plot network

    John A. Kershaw Jr

    2017-09-01

    Full Text Available Background: A novel approach to modelling individual tree growth dynamics is proposed. The approach combines multiple imputation and copula sampling to produce a stochastic individual tree growth and yield projection system. Methods: The Nova Scotia, Canada permanent sample plot network is used as a case study to develop and test the modelling approach. Predictions from this model are compared to predictions from the Acadian variant of the Forest Vegetation Simulator, a widely used statistical individual tree growth and yield model. Results: Diameter and height growth rates were predicted with error rates consistent with those produced using statistical models. Mortality and ingrowth error rates were higher than those observed for diameter and height, but also were within the bounds produced by traditional approaches for predicting these rates. Ingrowth species composition was very poorly predicted. The model was capable of reproducing a wide range of stand dynamic trajectories and in some cases reproduced trajectories that the statistical model was incapable of reproducing. Conclusions: The model has potential to be used as a benchmarking tool for evaluating statistical and process models and may provide a mechanism to separate signal from noise and improve our ability to analyze and learn from large regional datasets that often have underlying flaws in sample design.

  13. [Modern methods application of genotyping of infectious diseases pathogens in the context of operational work of specialized anti-epidemic team during the XXII Olympic Winter Games and XI Paralympic Winter Games].

    Kuzkin, B P; Kulichenko, A N; Volynkina, A S; Efremenko, D V; Kuznetsova, I V; Kotenev, E S; Lyamkin, G I; Kartsev, N N; Klindukhov, V P

    2015-01-01

    This paper considers the experience of genotyping and sequencing technologies in laboratories of specialized anti-epidemic team (SAET) during the XXII Olympic Winter Games and XI Paralympic Winter Games of 2014 in Sochi. The work carried out during the pre-Olympic period on performance of readiness by SAET for these studies is analyzed. The results of genotyping strains of pathogens during the Olympic Games are presented. A conclusion about the effectiveness of the use of molecular genetic techniques in terms of SAET is made.

  14. Screening cotton genotypes for seedling drought tolerance

    Penna Julio C. Viglioni

    1998-01-01

    Full Text Available The objectives of this study were to adapt a screening method previously used to assess seedling drought tolerance in cereals for use in cotton (Gossypium hirsutum L.) and to identify tolerant accessions among a wide range of genotypes. Ninety genotypes were screened in seven growth chamber experiments. Fifteen-day-old seedlings were subjected to four 4-day drought cycles, and plant survival was evaluated after each cycle. Three cycles are probably the minimum required in cotton work. Significant differences (at the 0.05 level or lower) among entries were obtained in four of the seven experiments. A "confirmation test" with entries previously evaluated as "tolerant" (high survival) and "susceptible" (low survival) was run. A number of entries duplicated their earlier performance, but others did not, which indicates the need to reevaluate selections. Germplasms considered tolerant included: 'IAC-13-1', 'IAC-RM4-SM5', 'Minas Sertaneja', 'Acala 1517E-1' and '4521'. In general, the technique is simple, though time-consuming, with practical value for screening a large number of genotypes. Results from the screening tests generally agreed with field information. The screening procedure is suitable to select tolerant accessions from among a large number of entries in germplasm collections as a preliminary step in breeding for drought tolerance. This research also demonstrated the need to characterize the internal lack of uniformity in growth chambers to allow for adequate designs of experiments.

  15. Sofosbuvir based treatment of chronic hepatitis C genotype 3 infections

    Dalgard, Olav; Weiland, Ola; Noraberg, Geir

    2017-01-01

    BACKGROUND AND AIMS: Chronic hepatitis C virus (HCV) genotype 3 infection with advanced liver disease has emerged as the most challenging to treat. We retrospectively assessed the treatment outcome of sofosbuvir (SOF) based regimes for treatment of HCV genotype 3 infections in a real life setting...... in Scandinavia. METHODS: Consecutive patients with chronic HCV genotype 3 infection were enrolled at 16 treatment centers in Denmark, Sweden, Norway and Finland. Patients who had received a SOF containing regimen were included. The fibrosis stage was evaluated by liver biopsy or transient liver elastography...... was similar for all treatment regimens, but lower in men (p = 0.042), and in patients with decompensated liver disease (p = 0.004). CONCLUSION: We found that sofosbuvir based treatment in a real-life setting could offer SVR rates exceeding 90% in patients with HCV genotype 3 infection and advanced liver...

  16. Giardia and Cryptosporidium species and genotypes in coyotes (Canis latrans).

    Trout, James M; Santín, Mónica; Fayer, Ronald

    2006-06-01

    Feces and duodenal scrapings were collected from 22 coyotes (Canis latrans) killed in managed hunts in northeastern Pennsylvania. Polymerase chain reaction (PCR) methods were used to detect Giardia and Cryptosporidium spp. PCR-amplified fragments of Giardia and Cryptosporidium spp. SSU-rRNA genes were subjected to DNA sequence analysis for species/genotype determination. Seven coyotes (32%) were positive for G. duodenalis: three assemblage C, three assemblage D, and one assemblage B. Six coyotes (27%) were positive for Cryptosporidium spp. One isolate shared 99.7% homology with C. muris, whereas five others (23%) shared 100% homology with C. canis, coyote genotype. This is the first report on multiple genotypes of Giardia spp. in coyotes and on the prevalence of Cryptosporidium spp. genotypes in coyotes.

  17. Methods of Mitigating Double Taxation

    Lindhe, Tobias

    2002-01-01

    This paper presents a comprehensive overview of existing methods of mitigating double taxation of corporate income within a standard cost of capital model. Two of the most well-known and most utilized methods, the imputation and the split rate systems, do not mitigate double taxation in corporations where the marginal investment is financed with retained earnings. However, all methods are effective when the marginal investment is financed with new share issues. The corporate tax rate, fiscal ...

  18. Genotyping panel for assessing response to cancer chemotherapy

    Hampel Heather

    2008-06-01

    Full Text Available Abstract Background: Variants in numerous genes are thought to affect the success or failure of cancer chemotherapy. Interindividual variability can result from genes involved in drug metabolism and transport, drug targets (receptors, enzymes, etc.), and proteins relevant to cell survival (e.g., cell cycle, DNA repair, and apoptosis). The purpose of the current study is to establish a flexible, cost-effective, high-throughput genotyping platform for candidate genes involved in chemoresistance and -sensitivity, and treatment outcomes. Methods: We have adopted SNPlex for genotyping 432 single nucleotide polymorphisms (SNPs) in 160 candidate genes implicated in response to anticancer chemotherapy. Results: The genotyping panels were applied to 39 patients with chronic lymphocytic leukemia undergoing flavopiridol chemotherapy, and 90 patients with colorectal cancer. 408 SNPs (94%) produced successful genotyping results. Additional genotyping methods were established for polymorphisms undetectable by SNPlex, including multiplexed SNaPshot for CYP2D6 SNPs, and PCR amplification with fluorescently labeled primers for the UGT1A1 promoter (TA)nTAA repeat polymorphism. Conclusion: This genotyping panel is useful for supporting clinical anticancer drug trials to identify polymorphisms that contribute to interindividual variability in drug response. Availability of population genetic data across multiple studies has the potential to yield genetic biomarkers for optimizing anticancer therapy.

  19. An epidemiologic survey of methicillin-resistant Staphylococcus aureus by combined use of mec-HVR genotyping and toxin genotyping in a university hospital in Japan.

    Nishi, Junichiro; Yoshinaga, Masao; Miyanohara, Hiroaki; Kawahara, Motoshi; Kawabata, Masaharu; Motoya, Toshiro; Owaki, Tetsuhiro; Oiso, Shigeru; Kawakami, Masayuki; Kamewari, Shigeko; Koyama, Yumiko; Wakimoto, Naoko; Tokuda, Koichi; Manago, Kunihiro; Maruyama, Ikuro

    2002-09-01

    To evaluate the usefulness of an assay using two polymerase chain reaction-based genotyping methods in the practical surveillance of methicillin-resistant Staphylococcus aureus (MRSA). Nosocomial infection and colonization were surveyed monthly in a university hospital in Japan for 20 months. Genotyping with mec-HVR is based on the size of the mec-associated hypervariable region amplified by polymerase chain reaction. Toxin genotyping uses a multiplex polymerase chain reaction method to amplify eight staphylococcal toxin genes. Eight hundred nine MRSA isolates were classified into 49 genotypes. We observed differing prevalences of genotypes for different hospital wards, and could rapidly demonstrate the similarity of genotype for outbreak isolates. The incidence of genotype D: SEC/TSST1 was significantly higher in isolates causing nosocomial infections (49.5%; 48 of 97) than in nasal isolates (31.4%; 54 of 172) (P = .004), suggesting that this genotype may represent the nosocomial strains. The combined use of these two genotyping methods resulted in improved discriminatory ability and should be further investigated.
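
    The comparison of genotype D: SEC/TSST1 between infection isolates (48 of 97) and nasal isolates (54 of 172) can be checked with a 2x2 test, as sketched below. The abstract does not say which test produced P = .004, so this example uses an uncorrected Pearson chi-square as an assumption, and its p-value need not match the reported one exactly. The function name is hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

if __name__ == "__main__":
    # Rows: infection isolates, nasal isolates.
    # Columns: genotype D: SEC/TSST1, any other genotype.
    stat, p = chi_square_2x2(48, 97 - 48, 54, 172 - 54)
    print(f"{48 / 97:.1%} vs {54 / 172:.1%}")   # 49.5% vs 31.4%
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

    Under this assumption the p-value is of the same order as the reported P = .004; a continuity-corrected or exact test would give a slightly different figure.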

  20. Association of HLA Genotype and Fulminant Type 1 Diabetes in Koreans

    Soo Heon Kwak

    2015-12-01

    Full Text Available Fulminant type 1 diabetes (T1DM) is a distinct subtype of T1DM that is characterized by rapid onset hyperglycemia, ketoacidosis, absolute insulin deficiency, and near normal levels of glycated hemoglobin at initial presentation. Although it has been reported that class II human leukocyte antigen (HLA) genotype is associated with fulminant T1DM, the genetic predisposition is not fully understood. In this study we investigated the HLA genotype and haplotype in 11 Korean cases of fulminant T1DM using imputation of whole exome sequencing data and compared their frequencies with those of 413 participants of the Korean Reference Panel. The HLA-DRB1*04:05–HLA-DQB1*04:01 haplotype was significantly associated with increased risk of fulminant T1DM in Fisher's exact test (odds ratio [OR], 4.11; 95% confidence interval [CI], 1.56 to 10.86; p = 0.009). A histidine residue at HLA-DRβ1 position 13 was marginally associated with increased risk of fulminant T1DM (OR, 2.45; 95% CI, 1.01 to 5.94; p = 0.054). Although we had limited statistical power, we provide evidence that HLA haplotype and amino acid change can be a genetic risk factor of fulminant T1DM in Koreans. Further large-scale research is required to confirm these findings.

  1. Discrepancy between Hepatitis C Virus Genotypes and NS4-Based Serotypes: Association with Their Subgenomic Sequences

    Nan Nwe Win

    2017-01-01

    Full Text Available Determination of hepatitis C virus (HCV) genotypes plays an important role in the direct-acting agent era. Discrepancies between HCV genotyping and serotyping assays are occasionally observed. Eighteen samples with discrepant results between genotyping and serotyping methods were analyzed. HCV serotyping and genotyping were based on the HCV nonstructural 4 (NS4) region and 5′-untranslated region (5′-UTR), respectively. HCV core and NS4 regions were chosen to be sequenced and were compared with the genotyping and serotyping results. Deep sequencing was also performed for the corresponding HCV NS4 regions. Seventeen out of 18 discrepant samples could be sequenced by the Sanger method. Both HCV core and NS4 sequences were concordant with that of genotyping in the 5′-UTR in all 17 samples. In cloning analysis of the HCV NS4 region, there were several amino acid variations, but each sequence was much closer to the peptide with the same genotype. Deep sequencing revealed that minor clones with different subgenotypes existed in two of the 17 samples. Genotyping by genome amplification showed high consistency, while several false reactions were detected by serotyping. The deep sequencing method also provides accurate genotyping results and may be useful for analyzing discrepant cases. HCV genotyping should be correctly determined before antiviral treatment.

  2. Haplotype-Based Genotyping in Polyploids

    Josh P. Clevenger

    2018-04-01

    Full Text Available Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to accurately identify in polyploid crops due to the duplicative nature of polyploid genomes leading to low confidence in the true alignment of short reads. Implementing a haplotype-based method in contrasting subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results of the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes as input bam files and a vcf file and identifies haplotype-based markers. Haplotype discovery can be made within single reads or span paired reads, and can leverage long read technology by targeting any length of haplotype. Haplotype-based genotyping is applicable in all allopolyploid genomes and provides confidence in marker identification and in silico-based genotyping for polyploid genomics.

  3. Genotyping Sleep Disorders Patients

    Kripke, Daniel F.; Shadan, Farhad F.; Dawson, Arthur; Cronin, John W.; Jamil, Shazia M.; Grizas, Alexandra P.; Koziol, James A.; Kline, Lawrence E.

    2010-01-01

    Objective The genetic susceptibility factors underlying sleep disorders might help us predict prognoses and responses to treatment. Several candidate polymorphisms for sleep disorders have been proposed, but there has as yet been inadequate replication or validation that the candidates may be useful in the clinical setting. Methods To assess the validity of several candidate associations, we obtained saliva deoxyribonucleic acid (DNA) samples and clinical information from 360 consenting research p...

  4. Evaluation of allelopathic potential of safflower genotypes (Carthamus tinctorius L.)

    Motamedi Marzieh

    2016-12-01

    Full Text Available Forty safflower genotypes were grown under normal irrigation and drought stress. In the first experiment, the allelopathic potential of shoot residues was evaluated using the sandwich method. Each genotype residue (0.4 g) was placed in a sterile Petri dish and two layers of agar were poured over it. Radish seeds were placed on agar medium. The radish seeds were cultivated without safflower residues as the controls. The length of the radicle, hypocotyl, and fresh biomass weight and seed germination percentages were measured. A pot experiment was also done on two genotypes with the highest and two with the lowest allelopathic activity selected after screening genotypes in the first experiment. Before entering the reproductive phase, irrigation treatments (normal irrigation and drought stress) were applied. Shoots were harvested, dried, milled and mixed with the topsoil of new pots and then radish seeds were sown. The pots with safflower genotypes were used to evaluate the effect of root residue allelopathy. The shoot length, fresh biomass weight, and germination percentage were measured. Different safflower genotypes showed varied allelopathic potential. The results of the first experiment showed that Egypt and Iran-Khorasan genotypes caused maximum inhibitory responses and Australia and Iran-Kerman genotypes resulted in minimum inhibitory responses on radish seedling growth. Fresh biomass weight had the most sensitivity to safflower residues. The results of the pot experiment were consistent with the results of in vitro experiments. Residues produced under drought stress had more inhibitory effects on the measured traits. Safflower root residue may have a higher level of allelochemicals or different allelochemicals than shoot residue.

  5. FTO genotype and weight loss

    Livingstone, Katherine M; Celis-Morales, Carlos; Papandonatos, George D

    2016-01-01

    : Ovid Medline, Scopus, Embase, and Cochrane from inception to November 2015. ELIGIBILITY CRITERIA FOR STUDY SELECTION: Randomised controlled trials in overweight or obese adults reporting reduction in body mass index, body weight, or waist circumference by FTO genotype (rs9939609 or a proxy) after...

  6. FTO genotype and weight loss

    Livingstone, Katherine M; Celis-Morales, Carlos; Papandonatos, George D

    2016-01-01

    OBJECTIVE: To assess the effect of the FTO genotype on weight loss after dietary, physical activity, or drug based interventions in randomised controlled trials. DESIGN: Systematic review and random effects meta-analysis of individual participant data from randomised controlled trials. DATA SOURC...

  7. Histomorphological changes in hepatitis C non-responders with respect to viral genotypes

    Adnan, U.; Mirza, T.; Naz, E.; Aziz, S.

    2013-01-01

    Objective: To evaluate the distinct histopathological changes of chronic hepatitis C (CHC) non-responders in association with viral genotypes. Methods: This cross-sectional study was conducted at the histopathology section of the Dow Diagnostic Research and Reference Laboratory, Dow University of Health Sciences in collaboration with Sarwar Zuberi Liver Centre, Civil Hospital, Karachi from September 2009 to August 2011. Seventy-five non-responders (end-treatment-response [ETR] positive patients) from a consecutive series of viral-RNA positive CHC patients with known genotypes were selected. Their genotypes and pertinent clinical history were recorded. They were subjected to liver biopsies which were assessed for grade, stage, steatosis, stainable iron and characteristic histological lesions. Results: The majority of the patients (63, 84%) had genotype 3 while 12 (16%) cases had genotype 1. The genotype 1 patients had significantly higher scores of inflammation (p<0.03) and fibrosis (p<0.04) as compared to genotype 3. Steatosis was significantly present in all genotype 3 patients in higher scores (p<0.001) compared to genotype 1. Stainable iron scores were generally low in the patients in this study; however, stainable iron was more commonly seen in genotype 3. The distribution of characteristic histological lesions was noteworthy in both the groups, irrespective of genotype. Conclusion: In this series, the predominant genotype was 3. However, genotype 1 patients were more prone to the aggressive nature of the disease with significantly higher scores of inflammation and fibrosis. Steatosis was characteristically observed in genotype 3 group. Stainable iron could not be attributed as a cause of non-response. (author)

  8. Antioxidant capacity of anthocyanins from acerola genotypes

    Vera Lúcia Arroxelas Galvão De Lima

    2011-03-01

    Full Text Available Anthocyanins from 12 acerola genotypes cultivated at the Active Germplasm Bank at Federal Rural University of Pernambuco were isolated for antioxidant potential evaluation. The antioxidant activity and radical scavenging capacity of the anthocyanin isolates were measured according to the β-carotene bleaching method and 1,1-diphenyl-2-picrylhydrazyl (DPPH) free radical scavenging assay, respectively. The antioxidant activity, measured using the β-carotene bleaching method, varied from 25.58 to 47.04% at 0.2 mg.mL-1. In the DPPH assay, the free radical scavenging capacity increased with concentration and reaction time. At 16.7 μg.mL-1 concentration and after 5 minutes and 2 hours reaction time, the percentage of scavenged radicals varied from 36.97 to 63.92% and 73.27 to 94.54%, respectively. Therefore, the antioxidant capacity of acerola anthocyanins varied amongst acerola genotypes and methods used. The anthocyanins present in this fruit may supply a substantial dietary source of antioxidants, which may promote health and help prevent disease.

  9. Education and health and well-being: direct and indirect effects with multiple mediators and interactions with multiple imputed data in Stata.

    Sheikh, Mashhood Ahmed; Abelsen, Birgit; Olsen, Jan Abel

    2017-11-01

    Previous methods for assessing mediation assume no multiplicative interactions. The inverse odds weighting (IOW) approach has been presented as a method that can be used even when interactions exist. The substantive aim of this study was to assess the indirect effect of education on health and well-being via four indicators of adult socioeconomic status (SES): income, management position, occupational hierarchy position and subjective social status. 8516 men and women from the Tromsø Study (Norway) were followed for 17 years. Education was measured at age 25-74 years, while SES and health and well-being were measured at age 42-91 years. Natural direct and indirect effects (NIE) were estimated using weighted Poisson regression models with IOW. Stata code is provided that makes it easy to assess mediation in any multiple imputed dataset with multiple mediators and interactions. Low education was associated with lower SES. Consequently, low SES was associated with being unhealthy and having a low level of well-being. The effect (NIE) of education on health and well-being is mediated by income, management position, occupational hierarchy position and subjective social status. This study contributes to the literature on mediation analysis, as well as the literature on the importance of education for health-related quality of life and subjective well-being. The influence of education on health and well-being had different pathways in this Norwegian sample. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  10. In vitro screening of potato genotypes for osmotic stress tolerance

    Gelmesa Dandena

    2017-02-01

    Full Text Available Potato (Solanum tuberosum L.) is a cool season crop which is susceptible to both drought and heat stresses. Lack of suitable varieties of the crop adapted to drought-prone areas of the lowland tropics deprives farmers living in such areas the opportunity to produce and use the crop as a source of food and income. As a step towards developing such varieties, the present research was conducted to evaluate different potato genotypes for osmotic stress tolerance under in vitro conditions and identify drought tolerant genotypes for future field evaluation. The experiment was carried out at the Leibniz University of Hannover, Germany, by inducing osmotic stress using sorbitol at two concentrations (0.1 and 0.2 M) in the culture medium. A total of 43 genotypes collected from different sources (27 advanced clones from CIP, nine improved varieties, and seven farmers’ cultivars) were used in a completely randomized design with four replications in two rounds. Data were collected on root and shoot growth. The results revealed that the main effects of genotype, sorbitol treatment, and their interactions significantly (P < 0.01) influenced root and shoot growth-related traits. Under osmotic stress, all the measured root and shoot growth traits were significantly correlated. The dendrogram obtained from the unweighted pair group method with arithmetic mean allowed grouping of the genotypes into tolerant, moderately tolerant, and susceptible ones to a sorbitol concentration of 0.2 M in the culture medium. Five advanced clones (CIP304350.100, CIP304405.47, CIP392745.7, CIP388676.1, and CIP388615.22) produced shoots and rooted earlier than all other genotypes, with higher root numbers, root length, shoot and root mass under osmotic stress conditions induced by sorbitol. Some of these genotypes had been previously identified as drought-tolerant under field conditions, suggesting the capacity of the in vitro evaluation method to predict drought stress tolerant

  11. On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

    Kaplan, David; Su, Dan

    2016-01-01

    This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

  12. GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

    K. Estrada Gil (Karol); A. Abuseiris (Anis); F.G. Grosveld (Frank); A.G. Uitterlinden (André); T.A. Knoch (Tobias); F. Rivadeneira Ramirez (Fernando)

    2009-01-01

    The current fast growth of genome-wide association studies (GWAS) combined with now common computationally expensive imputation requires the online access of large user groups to high-performance computing resources capable of analyzing rapidly and efficiently millions of genetic

  13. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation

    CARLOS ALBERTO SILVA

    Full Text Available ABSTRACT Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.

  14. Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

    Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

    2018-01-01

    Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR derived metrics were calculated at 53 regular sample plots and we used imputation models to retrieve the forest attributes at plot and landscape-levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The model coefficients of determination (adj.R2) and a root mean squared difference (RMSD) for HD, HM and TD were 0.90, 0.94, 0.38m and 6.99, 5.70, 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy of TD and to evaluate and compare the cost of acquisition and processing of LiDAR data against the conventional inventory procedures.
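
    The k-NN imputation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the unweighted mean of the k neighbours, the RMSD expressed as a percentage of the observed mean, and every plot value below are assumptions made for illustration only. In practice the LiDAR metrics would normally be scaled before distances are computed.

```python
import math

def knn_impute(reference, targets, k=3):
    """Impute a response for each target as the mean response of its
    k nearest reference plots (Euclidean distance in metric space).

    reference: list of (metrics_tuple, measured_response)
    targets:   list of metrics_tuple with no field measurement
    """
    imputed = []
    for x in targets:
        # Rank reference plots by distance to the target and keep the k closest
        nearest = sorted(reference, key=lambda plot: math.dist(plot[0], x))[:k]
        imputed.append(sum(resp for _, resp in nearest) / k)
    return imputed

def rmsd_percent(observed, predicted):
    """Root mean squared difference as a percentage of the observed mean
    (one common convention; assumed here)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * math.sqrt(mse) / (sum(observed) / len(observed))

# Hypothetical reference plots: ((height metric, dispersion metric), field height in m)
plots = [((20.1, 2.0), 19.5), ((22.4, 2.3), 21.8), ((18.0, 1.7), 17.2),
         ((25.3, 2.9), 24.6), ((21.0, 2.1), 20.4)]

# Hypothetical landscape cells that have LiDAR metrics but no field measurement
new_cells = [(21.5, 2.2), (18.5, 1.8)]
heights = knn_impute(plots, new_cells, k=2)
```

    With k=2, each cell receives the average height of its two closest reference plots; `rmsd_percent` can then be applied to held-out plots to obtain an accuracy figure comparable in form to the percentages quoted above.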

  15. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); Y.J. Kim (Young Jin); B.-J. Kim (Bong-Jo); J. Lee (Juyoung); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. 
Palmer (Nicholette); T. Park (Taesung); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)

    2015-01-01

    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the

  16. Epidemiological manifestations of hepatitis C virus genotypes and its association with potential risk factors among Libyan patients

    Daw Mohamed A

    2010-11-01

    Full Text Available Abstract Background Information on hepatitis C virus genotypes and subtypes among the Libyan population, and their association with various risk factors, is not known. The objectives of this study were to determine the epidemiological manifestations of HCV genotypes among Libyan patients and their association with certain potential risk factors. Methods A total of 1240 HCV-infected patients registered at Tripoli Medical Centre were studied over a five-year period from January 2005 to October 2009. The information was reviewed and the data were collected. A sample from each patient (785 male; 455 female) was analysed for genotyping and subtyping using a specific genotyping assay. The information was correlated with the risk factors studied and the statistical data were analyzed using SPSS version 11.5. Results Among the patients studied, four different genotypes were reported: genotypes 1, 2, 3, and 4. Genotype 4 was the commonest (35.7%), followed by genotype 1 (32.6%). By subtype, 28% were unclassified genotype 4, 14.6% were genotype 1b, and some patients were infected with more than one subtype (2.3% genotype 4c/d, 1% genotype 2a/c). Genotype 1 was the commonest among males, and genotype 4 among females. Genotype 1 and genotype 4 were found with most of the risk factors studied. They were particularly evident with surgical intervention, dental procedures and blood transfusion, while genotype 1, followed by genotype 3, was mainly associated with certain risk groups such as intravenous drug abusers. Conclusion Herein we report a detailed description of HCV genotypes among Libyans. The most common genotype was type 4, followed by genotype 1; other genotypes were also reported at a low rate. The distribution of these genotypes also varied according to gender and age. The commonly prevalent genotypes were found to be attributable to the medical-related transmission of HCV, such as blood

  17. Hepatitis C Virus: Virology and Genotypes

    Abdelaziz, Ahmed

    2017-01-01

    Hepatitis C virus (HCV) is a major causative agent of chronic liver disease worldwide. HCV is characterized by genetic heterogeneity, with at least six genotypes identified. The geographic distribution of genotypes has shown variations in different

  18. Study of Various HCV Genotypes in Patients Managing by Referral Clinic in Yazd Province

    M Pedarzadeh

    2012-02-01

    Full Text Available Introduction: Determining virus genotype is a major factor for initiation of treatment because various kinds of genotypes need different antiviral drugs. Distribution of hepatitis C genotypes in the world varies between countries and even between provinces, so we need to determine the distribution pattern of hepatitis C genotypes in our region. This study was performed in the referral clinic of Yazd province. Methods: This was a descriptive study conducted between 2007 and 2010 on patients who were observed by Yazd referral clinic (the clinic for evaluation and management of patients with high-risk behaviors). Ninety-two patients who had a positive RIBA test for hepatitis C infection were randomly selected and entered the study. Genotyping was performed using the RT-PCR method. The primer was "universal primer HCV". Prevalence of various genotypes was analyzed according to gender, addiction and co-existence of HCV-HIV infection. Personal information and laboratory results were analyzed using SPSS. Results: The most common genotype in our study was genotype 3a (65% of cases), followed by 1a (35%). Overall, 83% of patients were IV drug addicts. Genotype distribution in these patients was similar to others. Fifteen patients had co-infection of HCV-HIV; 47% of them were infected with genotype 1a and 53% with 3a. We could not find any patient infected with genotypes 2 or 4. No genotypes other than 1 and 3, and no mixed-genotype infections, could be determined in our patients. Twenty-three percent of patients had negative PCR despite a positive RIBA test. This indicates that spontaneous recovery from acute hepatitis C infection in IV drug addict patients is similar to other people. Conclusion: According to the results of our study, about 2/3 of patients were infected by genotype 3a. This kind of chronic hepatitis C shows a better response to treatment compared with genotype 1a (or 1b), with shorter duration and lower cost drugs. But despite higher incidence of genotype 3a, we

  19. Genomic prediction when some animals are not genotyped

    Lund Mogens S

    2010-01-01

    Full Text Available Abstract Background The use of genomic selection in breeding programs may increase the rate of genetic improvement, reduce the generation time, and provide higher accuracy of estimated breeding values (EBVs). A number of different methods have been developed for genomic prediction of breeding values, but many of them assume that all animals have been genotyped. In practice, not all animals are genotyped, and the methods have to be adapted to this situation. Results In this paper we provide an extension of a linear mixed model method for genomic prediction to the situation with non-genotyped animals. The model specifies that a breeding value is the sum of a genomic and a polygenic genetic random effect, where genomic genetic random effects are correlated with a genomic relationship matrix constructed from markers and the polygenic genetic random effects are correlated with the usual relationship matrix. The extension of the model to non-genotyped animals is made by using the pedigree to derive an extension of the genomic relationship matrix to non-genotyped animals. As a result, in the extended model the estimated breeding values are obtained by blending the information used to compute traditional EBVs and the information used to compute purely genomic EBVs. Parameters in the model are estimated using average information REML and estimated breeding values are best linear unbiased predictions (BLUPs). The method is illustrated using a simulated data set. Conclusions The extension of the method to non-genotyped animals presented in this paper makes it possible to integrate all the genomic, pedigree and phenotype information into a one-step procedure for genomic prediction. Such a one-step procedure results in more accurate estimated breeding values and has the potential to become the standard tool for genomic prediction of breeding values in future practical evaluations in pig and cattle breeding.
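
    The pedigree-based extension of the genomic relationship matrix mentioned in this abstract is commonly written in the block form sketched below. The notation is assumed here rather than taken from the abstract: subscript 1 indexes non-genotyped animals, subscript 2 indexes genotyped animals, A is the usual pedigree relationship matrix partitioned accordingly, and G is the marker-based genomic relationship matrix for the genotyped animals.

```latex
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},
\qquad
H = \begin{pmatrix}
  A_{11} + A_{12} A_{22}^{-1} \left( G - A_{22} \right) A_{22}^{-1} A_{21}
    & A_{12} A_{22}^{-1} G \\
  G A_{22}^{-1} A_{21}
    & G
\end{pmatrix}
```

    Under these assumptions, setting G = A_{22} recovers H = A, so the construction reduces to the traditional pedigree-based evaluation when markers add no information. To accommodate the separate polygenic effect, G may be replaced by a blended matrix G_w = (1 - w) G + w A_{22}, where w (an assumed symbol) is the share of genetic variance assigned to the polygenic effect.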

  20. Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

    van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. 
Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

    2015-01-01

    Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C=0.135, βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400

  1. Genotype x environment interaction and optimum resource ...

    ... x E) interaction and to determine the optimum resource allocation for cassava yield trials. The effects of environment, genotype and G x E interaction were highly significant for all yield traits. Variations due to G x E interaction were greater than those due to genotypic differences for all yield traits. Genotype x location x year ...

  2. Representativeness of Tuberculosis Genotyping Surveillance in the United States, 2009-2010.

    Shak, Emma B; France, Anne Marie; Cowan, Lauren; Starks, Angela M; Grant, Juliana

    2015-01-01

    Genotyping of Mycobacterium tuberculosis isolates contributes to tuberculosis (TB) control through detection of possible outbreaks. However, 20% of U.S. cases do not have an isolate for testing, and 10% of cases with isolates do not have a genotype reported. TB outbreaks in populations with incomplete genotyping data might be missed by genotyping-based outbreak detection. Therefore, we assessed the representativeness of TB genotyping data by comparing characteristics of cases reported during January 1, 2009-December 31, 2010, that had a genotype result with those cases that did not. Of 22,476 cases, 14,922 (66%) had a genotype result. Cases without genotype results were more likely to be patients <19 years of age, with unknown HIV status, of female sex, U.S.-born, and with no recent history of homelessness or substance abuse. Although cases with a genotype result are largely representative of all reported U.S. TB cases, outbreak detection methods that rely solely on genotyping data may underestimate TB transmission among certain groups.
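
    The coverage figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only reuses the counts and rates given above; treating the two national rates (20% of cases without an isolate, 10% of isolates without a reported genotype) as applying to the same case set is an assumption made for illustration.

```python
# Counts quoted in the abstract
total_cases = 22476
genotyped_cases = 14922

# Share of reported cases that had a genotype result
observed_share = genotyped_cases / total_cases

# Coverage implied by the two quoted rates, assuming they apply to the same
# set of cases: 80% of cases have an isolate, and 90% of those isolates
# have a genotype reported
implied_share = (1 - 0.20) * (1 - 0.10)

print(f"observed: {observed_share:.1%}, implied: {implied_share:.1%}")
```

    The observed share rounds to the 66% reported in the abstract, somewhat below the coverage implied by the two quoted rates.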

  3. Representativeness of Tuberculosis Genotyping Surveillance in the United States, 2009–2010

    Shak, Emma B.; Cowan, Lauren; Starks, Angela M.; Grant, Juliana

    2015-01-01

    Genotyping of Mycobacterium tuberculosis isolates contributes to tuberculosis (TB) control through detection of possible outbreaks. However, 20% of U.S. cases do not have an isolate for testing, and 10% of cases with isolates do not have a genotype reported. TB outbreaks in populations with incomplete genotyping data might be missed by genotyping-based outbreak detection. Therefore, we assessed the representativeness of TB genotyping data by comparing characteristics of cases reported during January 1, 2009–December 31, 2010, that had a genotype result with those cases that did not. Of 22,476 cases, 14,922 (66%) had a genotype result. Cases without genotype results were more likely to be patients <19 years of age, with unknown HIV status, of female sex, U.S.-born, and with no recent history of homelessness or substance abuse. Although cases with a genotype result are largely representative of all reported U.S. TB cases, outbreak detection methods that rely solely on genotyping data may underestimate TB transmission among certain groups. PMID:26556930

  4. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Momoko Horikoshi

    2015-07-01

    Full Text Available Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  5. Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

    Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

    2015-07-01

    Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

  6. Servizi finanziari imputati e interdipendenze settoriali: un'analisi settoriale del ruolo del credito nel sistema economico. (Imputed bank services and sectoral interdependences: a structural analysis of the role of credit in the economy)

    C. BIANCHI

    2013-12-01

    National accounts systems based on the SEC methodology are usually thought to imply the practical impossibility of carrying out any meaningful allocation of imputed bank services among the single branches of economic activity. As a consequence, the total value of the net interest earned by the credit system as a whole is considered as a cost entry and a negative component of added value in an ad-hoc additional industry, to be aggregated to the main credit one in the typical input-output analysis. The present work shows how this prevents the structural analysis of the role of credit in the system of interdependencies. A method is proposed in which imputed services of credit are distributed by branches, on the basis of existing statistics, proving valuable in assessing the significance of certain quantities of national accounts, such as operating results. The analysis highlights the dual nature of credit as having both highly intermediate and high value-added content. It is able to strongly influence the production costs of the other branches, without being influenced by them. These properties give the banking industry a very high potential for inflation. JEL: E51, G21

  7. Hepatitis C virus genotypes: A plausible association with viral loads

    Salma Ghulam Nabi

    2013-01-01

    Background and Aim: The aim of this study was to determine the association of genotypes with host age, gender and viral load. Material and Methods: The present study was conducted at Social Security Hospital, Pakistan. It included 320 patients with chronic hepatitis C virus (HCV) infection who were referred to the hospital between November 2011 and July 2012. HCV viral detection and genotyping were performed, and the association between genotypes and host age, gender and viral load was examined. Results: The analysis revealed the presence of genotypes 1 and 3 with further subtypes 1a, 1b, 3a, 3b and mixed genotypes 1b + 3a, 1b + 3b and 3a + 3b. Viral load quantification was carried out in all 151 HCV ribonucleic acid (RNA) positive patients. Genotype 3a was observed in 124 (82.12%) patients, 3b in 21 (13.91%), 1a in 2 (1.32%), 1b in 1 (0.66%), mixed infection with 1b + 3a in 1 (0.66%), 1b + 3b in 1 (0.66%) and 3a + 3b in 1 (0.66%) patient. Viral load was compared between the various genotypes. The mean viral load in patients infected with genotype 1a was 2.75 × 10^6, 1b 3.9 × 10^6, 3a 2.65 × 10^6, 3b 2.51 × 10^6, 1b + 3a 3.4 × 10^6, 1b + 3b 2.7 × 10^6 and 3a + 3b 3.5 × 10^6. An association between genotype and viral load was observed. Conclusion: Further studies should be carried out to determine the association of viral load with different genotypes so that sufficient data are available to determine the type and duration of therapy needed and to predict disease outcome.

  8. Direct maximum parsimony phylogeny reconstruction from genotype data

    Ravi R

    2007-12-01

    Background: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Phylogenetic applications for autosomal data must therefore rely on other methods for first computationally inferring haplotypes from genotypes. Results: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. Conclusion: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.

  9. Evaluation of the Abbott realtime HCV genotype II RUO (GT II) assay with reference to 5'UTR, core and NS5B sequencing.

    Mallory, Melanie A; Lucic, Danijela X; Sears, Mitchell T; Cloherty, Gavin A; Hillyard, David R

    2014-05-01

    HCV genotyping is a critical tool for guiding initiation of therapy and selecting the most appropriate treatment regimen. To evaluate the concordance between the Abbott GT II assay and genotyping by sequencing subregions of the HCV 5'UTR, core and NS5B. The Abbott assay was used to genotype 127 routine patient specimens and 35 patient specimens with unusual subtypes and mixed infection. Abbott results were compared to genotyping by 5'UTR, core and NS5B sequencing. Sequences were genotyped using the NCBI non-redundant database and the online genotyping tool COMET. Among routine specimens, core/NS5B sequencing identified 93 genotype 1s, 13 genotype 2s, 15 genotype 3s, three genotype 4s, two genotype 6s and one recombinant specimen. Genotype calls by 5'UTR, core, NS5B sequencing and the Abbott assay were 97.6% concordant. Core/NS5B sequencing identified two discrepant samples as genotype 6 (subtypes 6l and 6u) while Abbott and 5'UTR sequencing identified these samples as genotype 1 with no subtype. The Abbott assay subtyped 91.4% of genotype 1 specimens. Among the 35 rare specimens, the Abbott assay inaccurately genotyped 3k, 6e, 6o, 6q and one genotype 4 variant; gave indeterminate results for 3g, 3h, 4r, 6m, 6n, and 6q specimens; and agreed with core/NS5B sequencing for mixed specimens. The Abbott assay is an automated HCV genotyping method with improved accuracy over 5'UTR sequencing. Samples identified by the Abbott assay as genotype 1 with no subtype may be rare subtypes of other genotypes and thus require confirmation by another method. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. Genetic Divergence in Sugarcane Genotypes

    Tahir, Mohammad; Rahman, Hidayatur; Gul, Rahmani; Ali, Amjad; Khalid, Muhammad

    2012-01-01

    To assess genetic divergence of sugarcane germplasm, an experiment comprising 25 sugarcane genotypes was conducted at Sugar Crops Research Institute (SCRI), Mardan, Khyber Pakhtunkhwa, Pakistan, in quadruple lattice design during 2008-09. Among the 14 parameters evaluated, majority exhibited significant differences while some showed nonsignificant mean squares. The initial correlation matrix revealed medium to high correlations. Principal Component Analysis (PCA) showed that there were two pr...

  11. [A study on genotype of 271 mycobacterium tuberculosis isolates in 6 prefectures in Yunnan Province].

    Chen, L Y; Yang, X; Ru, H H; Yang, H J; Yan, S Q; Ma, L; Chen, J O; Yang, R; Xu, L

    2018-01-06

    Objective: To understand the genotype characteristics of Mycobacterium tuberculosis isolates in Yunnan Province, and to provide molecular epidemiological evidence for the prevention and control of tuberculosis in Yunnan Province. Methods: Mycobacterium tuberculosis (MTB) isolates were collected from 6 prefectures of Yunnan Province in 2014, and their genotypes were obtained using spoligotyping and multiple locus variable number of tandem repeats analysis (MLVA). The results of spoligotyping were entered into the SITVITWEB database to obtain the Spoligotyping International Type (SIT) patterns and the sublineages of MTB isolates. The genotyping patterns were clustered with BioNumerics (version 5.0). Results: A total of 271 MTB isolates from patients were collected from six prefectures in Yunnan Province. Of these patients, 196 (72.3%) were male. The mean age of the patients was (41.9±15.1) years. The largest number of MTB isolates came from Puer (94 isolates, 34.69%). Spoligotyping analysis revealed that 151 (55.72%) MTB isolates belonged to the Beijing genotype, while the other 120 (44.28%) were of non-Beijing genotype; the 40 genotypes identified consisted of 24 unique genotypes and 16 clusters. The 271 isolates were differentiated into 30 clusters (2 to 17 isolates per cluster) and 177 unique genotypes, showing a clustering rate of 23.62%. Beijing genotype strains showed a higher clustering rate than non-Beijing genotype strains (29.14% vs 16.67%). The HGI of 12-locus VNTR in total MTB strains, Beijing genotype strains and non-Beijing genotype strains was 0.993, 0.982 and 0.995, respectively. Conclusion: The Beijing genotype was the predominant genotype in Yunnan Province, and Mycobacterium tuberculosis isolates showed high genetic diversity. The genotyping data reflect potential recent ongoing transmission in some areas, which highlights the urgent need for early diagnosis and treatment of infectious TB cases, to cut off the
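
    The clustering rate quoted in this record can be checked with a few lines of arithmetic. The sketch below assumes the commonly used definition (clustered isolates minus number of clusters, divided by all isolates); the abstract does not state which formula the authors applied, so the function name and the definition are illustrative assumptions that happen to reproduce the reported 23.62%.

```python
def clustering_rate(n_isolates: int, n_clusters: int, n_unique: int) -> float:
    """Clustering rate under the assumed (clustered - clusters) / total definition."""
    # Isolates that share a genotype with at least one other isolate.
    n_clustered = n_isolates - n_unique
    return (n_clustered - n_clusters) / n_isolates


# Counts taken from the abstract: 271 isolates, 30 clusters, 177 unique genotypes.
rate = clustering_rate(n_isolates=271, n_clusters=30, n_unique=177)
print(f"{rate:.2%}")  # 23.62%
```

    Subtracting the number of clusters treats one isolate per cluster as the presumed source case, so the rate estimates the share of cases attributable to recent transmission.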

  12. The Phenotype/Genotype Correlation of Lactase Persistence among Omani Adults

    Abdulrahim Al-Abri

    2013-09-01

    Objective: To examine the correlation of lactase persistence phenotype with genotype in Omani adults. Methods: Lactase persistence phenotype was tested by hydrogen breath test in 52 Omani adults using the Micro H2 analyzer. Results were checked against genotyping using direct DNA sequencing. Results: Forty-one individuals with C/C-13910 and T/T-13915 genotypes had positive breath tests (≥20 ppm), while eight of nine individuals with T/C-13910 or T/G-13915 genotypes had negative breath tests (<20 ppm) and two subjects were non-hydrogen producers. The agreement between phenotype and genotype using the Kappa value was very good (0.93). Conclusion: Genotyping both T/C-13910 and T/G-13915 alleles can be used to assist diagnosis and predict lactose intolerance in the Omani population.
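
    The reported Kappa of 0.93 can be reproduced from the counts in this abstract. The sketch below is a hand-rolled Cohen's kappa on a 2×2 table reconstructed under two assumptions that the abstract does not confirm: the two non-hydrogen producers are excluded, and all 41 C/C-13910 and T/T-13915 individuals tested positive. The function and variable names are illustrative.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Rows: genotype predicts intolerance (C/C, T/T) vs. persistence (T/C, T/G).
# Columns: breath test positive vs. negative. Reconstructed, not published, table.
table = [[41, 0],
         [1, 8]]
kappa = cohen_kappa(table)
print(round(kappa, 2))  # 0.93
```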

  13. Viral fitness does not correlate with three genotype displacement events involving infectious hematopoietic necrosis virus

    Kell, Alison M.; Wargo, Andrew R.; Kurath, Gael

    2014-01-01

    Viral genotype displacement events are characterized by the replacement of a previously dominant virus genotype by a novel genotype of the same virus species in a given geographic region. We examine here the fitness of three pairs of infectious hematopoietic necrosis virus (IHNV) genotypes involved in three major genotype displacement events in Washington state over the last 30 years to determine whether increased virus fitness correlates with displacement. Fitness was assessed using in vivo assays to measure viral replication in single infection, simultaneous co-infection, and sequential superinfection in the natural host, steelhead trout. In addition, virion stability of each genotype was measured in freshwater and seawater environments at various temperatures. By these methods, we found no correlation between increased viral fitness and displacement in the field. These results suggest that other pressures likely exist in the field with important consequences for IHNV evolution.

  14. Is incidence of multiple HPV genotypes rising in genital infections?

    Amir Sohrabi

    2017-11-01

    Frequency of cervical cancer related to Human Papilloma Virus (HPV) has increased remarkably in less-developed countries. Hence, applying capable diagnostic methods is urgently needed, as is having a therapeutic strategy as an effective step for cervical cancer prevention. The aim of this study was to investigate the prevalence of various multi-type HPV infection patterns and their possible rising incidence in women with genital infections. This descriptive study was conducted on women who attended referral clinical laboratories in Tehran for genital infections from January 2012 until December 2013. A total of 1387 archival cervical scraping and lesion specimens were collected from referred women. HPV genotyping was performed using approved HPV commercial diagnostic technologies with either INNO-LiPA HPV or Geno Array Test kits. HPV was positive in 563 cases (40.59%) with mean age of 32.35 ± 9.96. Single, multiple HPV genotypes and untypable cases were detected in 398 (70.69%), 160 (28.42%) and 5 (0.89%) cases, respectively. Multiple HPV infections were detected in 92 (57.5%), 42 (26.2%), 17 (10.6%) and 9 (5.7%) cases as two, three, four and five or more genotypes, respectively. The prevalence of 32 HPV genotypes was determined one by one. Seventeen HPV genotypes were identified in 95.78% of all positive infections. Five dominant genotypes, HPV6, 16, 53, 11 and 31, were identified in a total of 52.35% of the HPV positive cases. In the present study, we were able to evaluate the rate of multiple HPV types in genital infections. Nevertheless, it is necessary to evaluate the role of the dominant HPV low-risk types and the new probably high-risk genotypes, such as HPV53, in the increasing incidences of genital infections. Keywords: Multiple HPV Types, Incidence, Genital infection, Cervical cancer, Iran

  15. The JP2 genotype of Aggregatibacter actinomycetemcomitans and marginal periodontitis in the mixed dentition

    Jensen, Anne Birkeholm; Ennibi, Oum Keltoum; Ismaili, Zouheir

    2016-01-01

    AIM: To perform a cross-sectional study on the carrier frequency of JP2 and non-JP2 genotypes of A. actinomycetemcomitans in Moroccan schoolchildren and relate the presence of these genotypes to the periodontal status in the mixed dentition. MATERIAL AND METHODS: A plaque sample from 513 children...... the JP2 genotype and 186 (36.3%) were positive for non-JP2 genotypes, whereas A. actinomycetemcomitans could not be detected in the remaining 281 subjects. Among 75 subjects with mixed dentition and selected for clinical examination, clinical attachment loss (CAL) ≥3 mm at two or more periodontal sites...

  16. Occupational Tuberculosis in Denmark through 21 Years Analysed by Nationwide Genotyping

    Pedersen, Mathias Klok; Andersen, Aase Bengaard; Andersen, Peter Henrik

    2016-01-01

    Tuberculosis (TB) is a well-known occupational hazard. Based on more than two decades (1992-2012) of centralized nationwide genotyping of all Mycobacterium tuberculosis culture-positive TB patients in Denmark, we compared M. tuberculosis genotypes from all cases notified as presumed occupational (N...... = 130) with M. tuberculosis genotypes from all TB cases present in the country (N = 7,127). From 1992 through 2006, the IS6110 Restriction Fragment Length Polymorphism (RFLP) method was used for genotyping, whereas from 2005 to present, the 24-locus-based Mycobacterial Interspersed Repetitive Unit...

  17. Nephele: genotyping via complete composition vectors and MapReduce

    Mardis Scott

    2011-08-01

    Background: Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism, mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for the rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences. Results: Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on
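
    The alignment-free idea described in this record (sequence to k-mer vector, then a distance over vectors) can be illustrated in miniature. The sketch below is a deliberately simplified stand-in, not Nephele's complete composition vector algorithm: it uses plain relative k-mer frequencies for a single k and a cosine distance, and all function names and toy sequences are hypothetical. The resulting distance matrix is the kind of input a clustering step such as affinity propagation would consume.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Relative frequencies of every k-mer over `alphabet`, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers) or 1  # guard against empty input
    return [counts[m] / total for m in kmers]


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; 0 for identical directions, 1 for no shared k-mers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Toy sequences: s1 and s2 differ by one base, s3 is unrelated.
seqs = {"s1": "ACGTACGTACGTACGT", "s2": "ACGTACGTACGAACGT", "s3": "TTTTGGGGTTTTGGGG"}
vectors = {name: kmer_vector(s) for name, s in seqs.items()}
d12 = cosine_distance(vectors["s1"], vectors["s2"])
d13 = cosine_distance(vectors["s1"], vectors["s3"])
print(d12 < d13)  # the near-identical pair is closer than the unrelated pair
```

    Because no alignment is computed, each sequence is vectorised independently, which is what makes this style of method straightforward to distribute across workers.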

  18. Genotypic and phenotypic characterization of Chikungunya virus of different genotypes from Malaysia.

    I-Ching Sam

    BACKGROUND: Mosquito-borne Chikungunya virus (CHIKV) has recently re-emerged globally. The epidemic East/Central/South African (ECSA) strains have spread for the first time to Asia, which previously only had endemic Asian strains. In Malaysia, the ECSA strain caused an extensive nationwide outbreak in 2008, while the Asian strains only caused limited outbreaks prior to this. To gain insight into these observed epidemiological differences, we compared genotypic and phenotypic characteristics of CHIKV of Asian and ECSA genotypes isolated in Malaysia. METHODS AND FINDINGS: CHIKV of Asian and ECSA genotypes were isolated from patients during outbreaks in Bagan Panchor in 2006, and Johor in 2008. Sequencing of the CHIKV strains revealed 96.8% amino acid similarity, including an unusual 7 residue deletion in the nsP3 protein of the Asian strain. CHIKV replication in cells and Aedes mosquitoes was measured by virus titration. There were no differences in mammalian cell lines. The ECSA strain reached significantly higher titres in Ae. albopictus cells (C6/36). Both CHIKV strains infected Ae. albopictus mosquitoes at a higher rate than Ae. aegypti, but when compared to each other, the ECSA strain had much higher midgut infection and replication, and salivary gland dissemination, while the Asian strain infected Ae. aegypti at higher rates. CONCLUSIONS: The greater ability of the ECSA strain to replicate in Ae. albopictus may explain why it spread far more quickly and extensively in humans in Malaysia than the Asian strain ever did, particularly in rural areas where Ae. albopictus predominates. Intergenotypic genetic differences were found at E1, E2, and nsP3 sites previously reported to be determinants of host adaptability in alphaviruses. Transmission of CHIKV in humans is influenced by virus strain and vector species, which has implications for regions with more than one circulating CHIKV genotype and Aedes species.

  19. Possible Synergistic Interactions Among Multiple HPV Genotypes in Women Suffering from Genital Neoplasia

    Hajia, Massoud; Sohrabi, Amir

    2018-03-27

    Objective: Persistence of HPV infection is the true cause of cervical disorders. It is reported that competition may exist among HPV genotypes for colonization. This survey was designed to establish the multiple HPV genotype status in our community and the probability of multiple HPV infections involvement. Methods: All multiple HPV infections were selected for investigation in women suffering from genital infections referred to private laboratories in Tehran, Iran. A total of 160 multi HPV positive specimens from cervical scraping were identified by the HPV genotyping methods, "INNO-LiPA and Geno Array". Results: In the present study, HPV 6 (LR), 16 (HR), 53 (pHR), 31 (HR) and 11 (LR) were included in 48.8% of detected infections as the five most dominant genotypes. HPV 16 was detected at the highest rate with genotypes 53, 31 and 52, while HPV 53 appeared linked with HPV 16, 51 and 56 in concurrent infections. It appears that HPV 16 and 53 may have significant tendencies to associate with each other rather than with other genotypes. Analysis of the data revealed there may be some synergistic interactions with a few particular genotypes such as "HPV 53". Conclusion: Multiple HPV genotypes appear more likely to be linked with development of cervical abnormalities, especially in patients with genital infections. Since there are various patterns of dominant HPV genotypes in different regions of the world, more investigations of this type should be performed for careHPV programs in individual countries. Creative Commons Attribution License

  20. Haemoglobin genotype of children with severe malaria seen at the ...

    Prof Ezechukwu

    2011-10-23

    Oct 23, 2011 ... malaria seen in University of Benin. Teaching Hospital (UBTH), Benin. City. Patients and methods: ... gested to play crucial role in the defense of host against malaria infection and reduce susceptibility to severe .... Binary logistic regression model using Hb genotype status (abnormal Hb versus HbAA) as the ...

  1. Interaction between genotype and climates for Holstein milk ...

    This study was designed to investigate the interaction between genotype and climate for milk and fat production traits of Iranian Holstein dairy herds. Milk and fat production data were grouped in 5 climates, on the basis of Extended De Martonne method. (Co)Variance components and genetic parameters of first lactation ...

  2. Early-onset stargardt disease: phenotypic and genotypic characteristics

    Lambertus, S.; Huet, R.A.C. van; Bax, N.M.; Hoefsloot, L.H.; Cremers, F.P.M.; Boon, C.J.F.; Klevering, B.J.; Hoyng, C.B.

    2015-01-01

    OBJECTIVE: To describe the phenotype and genotype of patients with early-onset Stargardt disease. DESIGN: Retrospective cohort study. PARTICIPANTS: Fifty-one Stargardt patients with age at onset METHODS: We reviewed patient medical records for age at onset, medical history, initial

  3. Comparison and suitability of genotype by environment analysis ...

    Pearl millet (Pennisetum glaucum (L.) R. Br.) is an important food security and income crop for households living in semi-arid zones in Uganda. However, the genotype by environment interaction, in addition to the several methods used for its assessment, complicates selection of varieties adapted to such semi-arid areas.

  4. detection of the predominant microcystin-producing genotype of ...

    use

    2011-12-21

    Dec 21, 2011 ... genotypes of MC-producing cyanobacteria in Mozambique. Polymerase chain ...... 1(7): 359-366. Pan H, Song L, Liu Y, Börner T (2002). ... Miles CO (2009). A convenient and cost-effective method for monitoring marine algal ...

  5. Protein landmarks for diversity assessment in wheat genotypes ...

    Grain proteins from 20 Indian wheat genotypes were evaluated for diversity assessment based on seed storage protein profiling by sodium dodecylsulphate polyacrylamide gel electrophoresis (SDS-PAGE). Genetic diversity was evaluated using Nei's index, Shannon index and Unweighted pair group method with arithmetic ...

  6. Serotonin transporter genotype, salivary cortisol, neuroticism and life events

    Vinberg, Maj; Miskowiak, Kamilla; Kessing, Lars Vedel

    2014-01-01

    OBJECTIVE: To investigate if cortisol alone or in interaction with other risk factors (familial risk, the serotonin transporter genotype, neuroticism and life events (LEs)) predicts onset of psychiatric disorder in healthy individuals at heritable risk. MATERIAL AND METHODS: In a high-risk study...

  7. Reliable Single Chip Genotyping with Semi-Parametric Log-Concave Mixtures

    R.C.A. Rippe (Ralph); J.J. Meulman (Jacqueline); P.H.C. Eilers (Paul)

    2012-01-01

    The common approach to SNP genotyping is to use (model-based) clustering per individual SNP, on a set of arrays. Genotyping all SNPs on a single array is much more attractive, in terms of flexibility, stability and applicability, when developing new chips. A new semi-parametric method,

  8. Towards a database for genotype-phenotype association research: mining data from encyclopaedia

    Pajić, V.S.; Pavlović-Lažetić, G.M.; Beljanski, M.V.; Brandt, B.W.; Pajić, M.B.

    2013-01-01

    To associate phenotypic characteristics of an organism to molecules encoded by its genome, there is a need for well-structured genotype and phenotype data. We use a novel method for extracting data on phenotype and genotype characteristics of microorganisms from text. As a resource, we use an

  9. Hepatitis C viral load, genotype 3 and interleukin-28B CC genotype predict mortality in HIV and hepatitis C-coinfected individuals

    Clausen, Louise Nygaard; Astvad, Karen; Ladelund, Steen

    2012-01-01

    OBJECTIVE: We hypothesized that hepatitis C virus (HCV) load and genotype may influence all-cause mortality in HIV-HCV-coinfected individuals. DESIGN AND METHODS: Observational prospective cohort study. Mortality rates were compared in a time-updated multivariate Poisson regression analysis....... RESULTS: We included 264 consecutive HIV-HCV-coinfected individuals. During 1143 person years at risk (PYR) 118 individuals died [overall mortality rate 10 (95% confidence interval; 8, 12)/100 PYR]. In multivariate analysis, a 1 log increase in HCV viral load was associated with a 30% higher mortality......) CC genotype was associated with 54% higher mortality risk [aMRR: 1.54 (0.89, 3.82)] compared to TT genotype. CONCLUSION: High-HCV viral load, HCV genotype 3 and IL28B genotype CC had a significant influence on the risk of all-cause mortality among individuals coinfected with HIV-1. This may have...
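
    The overall mortality rate in this record follows directly from the two counts given (118 deaths over 1143 PYR). The sketch below recomputes it with a simple normal-approximation (Wald) interval for a Poisson count; this interval method is an assumption made for illustration, since the abstract does not say how its confidence interval was derived, and the function name is hypothetical.

```python
import math


def rate_per_100(events: int, person_years: float, z: float = 1.96):
    """Crude event rate per 100 person-years with a Wald interval (assumed method)."""
    rate = events / person_years
    # For a Poisson count, the standard error of the rate is sqrt(events) / person-years.
    se = math.sqrt(events) / person_years
    return tuple(100 * x for x in (rate, rate - z * se, rate + z * se))


rate, lower, upper = rate_per_100(events=118, person_years=1143)
print(f"{rate:.1f} ({lower:.1f}, {upper:.1f}) per 100 PYR")  # 10.3 (8.5, 12.2) per 100 PYR
```

    Rounded to whole numbers these values agree with the 10 (8, 12)/100 PYR quoted above.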

  10. The study of correlation between HBV genotype and the response to transcatheter arterial chemoembolization therapy in hepatocellular carcinoma patients

    Huang Keyao; Yang Weizhu; Jiang Na; Zheng Qubing

    2004-01-01

    Objective: To evaluate the influence of hepatitis B virus (HBV) genotype on response to transcatheter arterial embolization therapy in patients with HBV-related HCC. Methods: Transcatheter arterial chemoembolization therapy was conducted in patients with HBV-related HCC, and response to embolization therapy was assessed by the tumor necrosis rate, the HCC recurrence rate, the cumulative survival rate and the change in AFP. The HBV genotype was determined by directly sequencing the polymerase chain reaction products of the HBV S gene. The response of HCC to embolization therapy was compared between patients infected with different HBV genotypes. Results: The tumor necrosis rate of genotype C patients was similar to that of genotype B patients (P=0.099). The HCC recurrence rate of genotype B patients was lower than that of genotype C patients (P=0.036). The cumulative survival rates at 2 and 3 years were significantly higher in the genotype B patients (P=0.036 and P=0.013). There was no difference between patients with the two genotypes in the change in AFP (P>0.05). Conclusions: HBV genotype B patients seem to have a better response to embolization therapy than genotype C patients. Determination of HBV genotype may be useful in predicting the outcomes of TACE therapy in HBV-related HCC. (authors)

  11. Simultaneous Cocirculation of Both European Varicella-Zoster Virus Genotypes (E1 and E2) in Mexico City

    Rodríguez-Castillo, Araceli; Vaughan, Gilberto; Ramírez-González, José Ernesto; Escobar-Gutiérrez, Alejandro

    2010-01-01

    Full-length genome analysis of varicella-zoster virus (VZV) has shown that viral strains can be classified into seven different genotypes: European (E), Mosaic (M), and Japanese (J), and the E and M genotypes can be further subclassified into E1, E2, and M1 through 4, respectively. The distribution of the main VZV genotypes in Mexico was described earlier, demonstrating the predominance of E genotype, although other genotypes (M1 and M4) were also identified. However, no information regarding the circulation of either E genotype in the country is available. In the present study, we confirm the presence of both E1 and E2 genotypes in the country and explore the possibility of coinfection as the triggering factor for increased virulence among severe cases. A total of 61 different European VZV isolates collected in the Mexico City metropolitan area from 2005 to 2006 were typed by using a PCR method based on genotype-specific primer amplification. Fifty isolates belonged to the E1 genotype, and the eleven remaining samples were classified as E2 genotypes. No coinfection with both E genotypes was identified among these specimens. We provide here new information on the distribution of VZV genotypes circulating in Mexico City. PMID:20220168

  12. Temperature Switch PCR (TSP): Robust assay design for reliable amplification and genotyping of SNPs

    Mather Diane E

    2009-12-01

    Background: Many research and diagnostic applications rely upon the assay of individual single nucleotide polymorphisms (SNPs). Thus, methods to improve the speed and efficiency of single-marker SNP genotyping are highly desirable. Here, we describe the method of temperature-switch PCR (TSP), a biphasic four-primer PCR system with a universal primer design that permits amplification of the target locus in the first phase of thermal cycling before switching to the detection of the alleles. TSP can simplify assay design for a range of commonly used single-marker SNP genotyping methods, and reduce the requirement for individual assay optimization and operator expertise in the deployment of SNP assays. Results: We demonstrate the utility of TSP for the rapid construction of robust and convenient endpoint SNP genotyping assays based on allele-specific PCR and high resolution melt analysis by generating a total of 11,232 data points. The TSP assays were performed under standardised reaction conditions, requiring minimal optimization of individual assays. High genotyping accuracy was verified by 100% concordance of TSP genotypes in a blinded study with an independent genotyping method. Conclusion: Theoretically, TSP can be directly incorporated into the design of assays for most current single-marker SNP genotyping methods. TSP provides several technological advances for single-marker SNP genotyping including simplified assay design and development, increased assay specificity and genotyping accuracy, and opportunities for assay automation. By reducing the requirement for operator expertise, TSP provides opportunities to deploy a wider range of single-marker SNP genotyping methods in the laboratory. TSP has broad applications and can be deployed in any animal and plant species.

  13. Two-temperature LATE-PCR endpoint genotyping

    Reis Arthur H

    2006-12-01

    Background: In conventional PCR, total amplicon yield becomes independent of starting template number as amplification reaches plateau and varies significantly among replicate reactions. This paper describes a strategy for reconfiguring PCR so that the signal intensity of a single fluorescent detection probe after PCR thermal cycling reflects genomic composition. The resulting method corrects for product yield variations among replicate amplification reactions, permits resolution of homozygous and heterozygous genotypes based on endpoint fluorescence signal intensities, and readily identifies imbalanced allele ratios equivalent to those arising from gene/chromosomal duplications. Furthermore, the use of only a single colored probe for genotyping enhances the multiplex detection capacity of the assay. Results: Two-Temperature LATE-PCR endpoint genotyping combines Linear-After-The-Exponential (LATE)-PCR (an advanced form of asymmetric PCR that efficiently generates single-stranded DNA) and mismatch-tolerant probes capable of detecting allele-specific targets at high temperature and total single-stranded amplicons at a lower temperature in the same reaction. The method is demonstrated here for genotyping single-nucleotide alleles of the human HEXA gene responsible for Tay-Sachs disease and for genotyping SNP alleles near the human p53 tumor suppressor gene. In each case, the final probe signals were normalized against total single-stranded DNA generated in the same reaction. Normalization reduces the coefficient of variation among replicates from 17.22% to as little as 2.78% and permits endpoint genotyping with >99.7% accuracy. These assays are robust because they are consistent over a wide range of input DNA concentrations and give the same results regardless of how many cycles of linear amplification have elapsed. The method is also sufficiently powerful to distinguish between samples with a 1:1 ratio of two alleles from samples comprised of

  14. Echinococcus granulosus sensu lato genotypes infecting humans--review of current knowledge.

    Alvarez Rojas, Cristian A; Romig, Thomas; Lightowlers, Marshall W

    2014-01-01

    Genetic variability in the species group Echinococcus granulosus sensu lato is well recognised as affecting intermediate host susceptibility and other biological features of the parasites. Molecular methods have allowed discrimination of different genotypes (G1-10 and the 'lion strain'), some of which are now considered separate species. An accumulation of genotypic analyses undertaken on parasite isolates from human cases of cystic echinococcosis provides the basis upon which an assessment is made here of the relative contribution of the different genotypes to human disease. The allocation of samples to G-numbers becomes increasingly difficult, because much more variability than previously recognised exists in the genotypic clusters G1-3 (=E. granulosus sensu stricto) and G6-10 (Echinococcus canadensis). To accommodate the heterogeneous criteria used for genotyping in the literature, we restrict ourselves to differentiating between E. granulosus sensu stricto (G1-3), Echinococcus equinus (G4), Echinococcus ortleppi (G5) and E. canadensis (G6-7, G8, G10). The genotype G1 is responsible for the great majority of human cystic echinococcosis worldwide (88.44%), has the most cosmopolitan distribution and is often associated with transmission via sheep as intermediate hosts. The closely related genotypes G6 and G7 cause a significant number of human infections (11.07%). The genotype G6 was found to be responsible for 7.34% of infections worldwide. This strain is known from Africa and Asia, where it is transmitted mainly by camels (and goats), and South America, where it appears to be mainly transmitted by goats. The G7 genotype has been responsible for 3.73% of human cases of cystic echinococcosis in eastern European countries, where the parasite is transmitted by pigs. Some of the samples (11) could not be identified with a single specific genotype belonging to E. canadensis (G6/10). Rare cases of human cystic echinococcosis have been identified as having been caused by

  15. Automated genotyping of dinucleotide repeat markers

    Perlin, M.W.; Hoffman, E.P. [Carnegie Mellon Univ., Pittsburgh, PA (United States); Univ. of Pittsburgh, PA (United States)]

    1994-09-01

    The dinucleotide repeats (i.e., microsatellites) such as CA-repeats are a highly polymorphic, highly abundant class of PCR-amplifiable markers that have greatly streamlined genetic mapping experimentation. It is expected that over 30,000 such markers (including tri- and tetranucleotide repeats) will be characterized for routine use in the next few years. Since only size determination, and not sequencing, is required to determine alleles, in principle, dinucleotide repeat genotyping is easily performed on electrophoretic gels, and can be automated using DNA sequencers. Unfortunately, PCR stuttering with these markers generates not one band for each allele, but a pattern of bands. Since closely spaced alleles must be disambiguated by human scoring, this poses a key obstacle to full automation. We have developed methods that overcome this obstacle. Our model is that the observed data are generated by arithmetic superposition (i.e., convolution) of multiple allele patterns. By quantitatively measuring the size of each component band, and exploiting the unique stutter pattern associated with each marker, closely spaced alleles can be deconvolved; this unambiguously reconstructs the "true" allele bands, with stutter artifact removed. We used this approach in a system for automated diagnosis of (X-linked) Duchenne muscular dystrophy; four multiplexed CA-repeats within the dystrophin gene were assayed on a DNA sequencer. Our method accurately detected small variations in gel migration that shifted the allele size estimate. In 167 nonmutated alleles, 89% (149/167) showed no size variation, 9% (15/167) showed 1 bp variation, and 2% (3/167) showed 2 bp variation. We are currently developing a library of dinucleotide repeat patterns; together with our deconvolution methods, this library will enable fully automated genotyping of dinucleotide repeats from sizing data.
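
    The convolution model in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a known, fixed stutter pattern per marker, band intensities binned in repeat units, and an ordinary least-squares solve; the function names and the example pattern are hypothetical.

```python
import numpy as np

def convolution_matrix(stutter, n_alleles):
    """Build C so that C @ a equals np.convolve(a, stutter) for len(a) == n_alleles."""
    k = len(stutter)
    C = np.zeros((n_alleles + k - 1, n_alleles))
    for j in range(n_alleles):
        # Each allele contributes one shifted copy of the stutter pattern.
        C[j:j + k, j] = stutter
    return C

def deconvolve(observed, stutter):
    """Least-squares estimate of allele amounts from stutter-blurred band intensities.

    `stutter` lists relative band heights from the shortest stutter product up to
    the main band (last element), so allele j has its main band at observed[j + k - 1].
    """
    observed = np.asarray(observed, dtype=float)
    stutter = np.asarray(stutter, dtype=float)
    n_alleles = len(observed) - len(stutter) + 1
    C = convolution_matrix(stutter, n_alleles)
    a, *_ = np.linalg.lstsq(C, observed, rcond=None)
    # Negative amounts are not physical, so clip them to zero.
    return np.clip(a, 0.0, None)

# Hypothetical marker: main band 1.0, with stutter bands of 0.4 and 0.1.
pattern = [0.1, 0.4, 1.0]
true_alleles = np.array([0.0, 1.0, 1.0, 0.0])   # two alleles one repeat apart
bands = np.convolve(true_alleles, pattern)       # what the gel would show
estimate = deconvolve(bands, pattern)
```

    With noise-free input the estimate reproduces the two adjacent alleles; real sizing data would need noise handling and a measured stutter pattern for each marker.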

  16. Decoding noises in HIV computational genotyping.

    Jia, MingRui; Shaw, Timothy; Zhang, Xing; Liu, Dong; Shen, Ye; Ezeamama, Amara E; Yang, Chunfu; Zhang, Ming

    2017-11-01

    Lack of a consistent and reliable genotyping system can critically impede HIV genomic research on pathogenesis, fitness, virulence, drug resistance, and genomic-based healthcare and treatment. At present, mis-genotyping, i.e., background noises in molecular genotyping, and its impact on epidemic surveillance is unknown. For the first time, we present a comprehensive assessment of HIV genotyping quality. HIV sequence data were retrieved from worldwide published records, and subjected to a systematic genotyping assessment pipeline. Results showed that mis-genotyped cases occurred at 4.6% globally, with some regional and high-risk population heterogeneities. Results also revealed a consistent mis-genotyping pattern in gp120 in all studied populations except the group of men who have sex with men. Our study also suggests novel virus diversities in the mis-genotyped cases. Finally, this study reemphasizes the importance of implementing a standardized genotyping pipeline to avoid genotyping disparity and to advance our understanding of virus evolution in various epidemiological settings. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. What are the appropriate methods for analyzing patient-reported outcomes in randomized trials when data are missing?

    Hamel, J F; Sebille, V; Le Neel, T; Kubis, G; Boyer, F C; Hardouin, J B

    2017-12-01

    Subjective health measurements using Patient Reported Outcomes (PRO) are increasingly used in randomized trials, particularly for patient group comparisons. Two main types of analytical strategies can be used for such data: Classical Test Theory (CTT) and Item Response Theory (IRT) models. These two strategies display very similar characteristics when data are complete, but in the common case where data are missing, whether IRT or CTT is the more appropriate remains unknown and was investigated using simulations. We simulated PRO data such as quality of life data. Missing responses to items were simulated as being completely random, depending on an observable covariate or on an unobserved latent trait. The considered CTT-based methods allowed comparing scores using complete-case analysis, personal mean imputation or multiple imputation based on a two-way procedure. The IRT-based method was the Wald test on a Rasch model including a group covariate. The IRT-based method and the multiple-imputation-based method for CTT displayed the highest observed power and were the only unbiased methods whatever the kind of missing data. Online software and Stata® modules compatible with the built-in mi impute suite are provided for performing such analyses. Traditional procedures (listwise deletion and personal mean imputation) should be avoided, due to inevitable problems of bias and lack of power.
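
    For orientation, the two traditional procedures that this abstract advises against can be sketched as follows. This is a minimal illustration that assumes item responses are stored as a respondents-by-items array with NaN marking a missing answer; the function names are hypothetical, and the multiple-imputation and Rasch-model analyses from the study are not reproduced.

```python
import numpy as np

def complete_cases(responses):
    """Listwise deletion: keep only respondents who answered every item."""
    X = np.asarray(responses, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def personal_mean_impute(responses):
    """Replace each missing item with the mean of that respondent's observed items."""
    X = np.asarray(responses, dtype=float)
    out = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if missing.all():
            continue  # nothing observed, so the row stays missing
        out[i, missing] = row[~missing].mean()
    return out

answers = [[1, 2, np.nan],
           [3, 3, 3],
           [np.nan, np.nan, np.nan]]
kept = complete_cases(answers)
imputed = personal_mean_impute(answers)
```

    Listwise deletion discards the first and third respondents entirely, while personal mean imputation fills the first respondent's missing item from that respondent's own answers.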

  18. Genetic variation of European maize genotypes (Zea mays L.) detected using SSR markers

    Martin Vivodík

    2017-01-01

    The SSR molecular markers were used to assess genetic diversity in 40 old European maize genotypes. Ten SSR primers revealed a total of 65 alleles, ranging from 4 (UMC1060) to 8 (UMC2002 and UMC1155) alleles per locus, with a mean value of 6.50 alleles per locus. The PIC values ranged from 0.713 (UMC1060) to 0.842 (UMC2002) with an average value of 0.810, and the DI values ranged from 0.734 (UMC1060) to 0.848 (UMC2002) with an average value of 0.819. All of the SSR markers used had PIC and DI values higher than 0.7, which indicates high polymorphism of the markers chosen for analysis. Probability of identity (PI) was low, ranging from 0.004 (UMC1072) to 0.022 (UMC1060) with an average of 0.008. A dendrogram was constructed from a genetic distance matrix based on profiles of the 10 maize SSR loci using the unweighted pair-group method with arithmetic average (UPGMA). According to the analysis, the collection of 40 diverse accessions of maize was grouped into four clusters. The first cluster contained nine maize genotypes, the second cluster contained four, and the third cluster contained five. Cluster 4 contained five genotypes from Hungary (22.73%), two genotypes from Poland (9.10%), seven genotypes from the Union of Soviet Socialist Republics (31.81%), six genotypes from Czechoslovakia (27.27%), one genotype from the Slovak Republic (4.55%) and one genotype from Yugoslavia (4.55%). We could not distinguish four maize genotypes grouped in cluster 4 (Voroneskaja and Kocovska Skora, and two Hungarian maize genotypes, Feheres Sarga Filleres and Mindszentpusztai Feher), which are genetically the closest.
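
    The abstract does not state which formulas were used for PIC and DI. The sketch below assumes the commonly used definitions: DI as gene diversity, 1 − Σp², and PIC as 1 − Σp² − Σ over allele pairs of 2·p_i²·p_j², computed from the allele frequencies at one locus. The function names and example frequencies are illustrative only.

```python
def diversity_index(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content from allele frequencies at one locus."""
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross

# Hypothetical locus with four equally frequent alleles.
equal_four = [0.25, 0.25, 0.25, 0.25]
di_value = diversity_index(equal_four)
pic_value = pic(equal_four)
```

    Under these definitions PIC never exceeds DI for the same locus, which agrees with the per-marker ranges reported in the abstract.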

  19. Polymorphism of proteins in selected Slovak winter wheat genotypes using SDS-PAGE

    Dana Miháliková

    2016-12-01

    Winter wheat is used especially for bread-making. The specific composition of the grain storage proteins and the representation of individual subunits determine the baking quality of wheat. The aim of this study was to analyze 15 Slovak varieties of winter wheat (Triticum aestivum L.) based on protein polymorphism and to predict their technological quality. The SDS-PAGE method by ISTA was used to separate glutenin protein subunits. Glutenins were separated into HMW-GS (15.13%) and LMW-GS (65.89%) on the basis of molecular weight in SDS-PAGE. At the locus Glu-A1, allele Null (53% of genotypes) and allele 1 (47% of genotypes) were found. The locus Glu-B1 was represented by the HMW-GS subunits 6+8 (33% of genotypes), 7+8 (27% of genotypes) and 7+9 (40% of genotypes). At the locus Glu-D1, two subunits were detected: 2+12 (33% of genotypes) and 5+10 (67% of genotypes), which is correlated with good bread-making properties. The Glu-score ranged from 4 (genotype Viglanka) to 10 (genotypes Viola and Vladarka). A dendrogram of genetic similarity was constructed from the representation of individual glutenin subunits in the samples. The quality prediction showed that the varieties Viola and Vladarka had the best technological quality and are suitable for use in food processing.

  20. GMFilter and SXTestPlate: software tools for improving the SNPlex™ genotyping system

    Schreiber Stefan

    2009-03-01

    Background: Genotyping of single-nucleotide polymorphisms (SNPs) is a fundamental technology in modern genetics. The SNPlex™ mid-throughput genotyping system (Applied Biosystems, Foster City, CA, USA) enables the multiplexed genotyping of up to 48 SNPs simultaneously in a single DNA sample. The high level of automation and the large amount of data produced in a high-throughput laboratory require advanced software tools for quality control and workflow management. Results: We have developed two programs, which address two main aspects of quality control in a SNPlex™ genotyping environment. GMFilter improves the analysis of SNPlex™ plates by removing wells with a low overall signal intensity. It enables scientists to automatically process the raw data in a standardized way before analyzing a plate with the proprietary GeneMapper software from Applied Biosystems. SXTestPlate examines the genotype concordance of a SNPlex™ test plate, which was typed with a control SNP set. This program allows for regular quality control checks of a SNPlex™ genotyping platform. It is compatible with other genotyping methods as well. Conclusion: GMFilter and SXTestPlate provide a valuable tool set for laboratories engaged in genotyping based on the SNPlex™ system. The programs enhance the analysis of SNPlex™ plates with the GeneMapper software and enable scientists to evaluate the performance of their genotyping platform.
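
    A genotype concordance check of the kind described for a test plate can be sketched in a few lines. This is not SXTestPlate's actual algorithm or file format: the dictionary layout keyed by (sample, SNP), the use of None for a failed call, and the function name are assumptions made for illustration.

```python
def concordance(calls, reference, no_call=None):
    """Fraction of genotype calls that agree with reference genotypes.

    Both arguments map (sample, snp) -> genotype string such as "AG".
    Pairs where either side is a no-call are excluded from the comparison.
    """
    compared = 0
    matched = 0
    for key, ref in reference.items():
        call = calls.get(key, no_call)
        if call == no_call or ref == no_call:
            continue
        compared += 1
        # Allele order within a genotype is irrelevant: "AG" equals "GA".
        if sorted(call) == sorted(ref):
            matched += 1
    return matched / compared if compared else float("nan")

reference = {("s1", "rs1"): "AG", ("s1", "rs2"): "CC", ("s2", "rs1"): "AA"}
test_plate = {("s1", "rs1"): "GA", ("s1", "rs2"): "CT", ("s2", "rs1"): None}
rate = concordance(test_plate, reference)
```

    Here two calls are comparable and one of them matches, so the rate is 0.5; a routine quality check would compare such a rate against a laboratory-defined threshold.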

  1. Identification of Coxiella burnetii genotypes in Croatia using multi-locus VNTR analysis.

    Račić, Ivana; Spičić, Silvio; Galov, Ana; Duvnjak, Sanja; Zdelar-Tuk, Maja; Vujnović, Anja; Habrun, Boris; Cvetnić, Zeljko

    2014-10-10

    Although Q fever affects humans and animals in Croatia, we are unaware of genotyping studies of Croatian strains of the causative pathogen Coxiella burnetii, which would greatly assist monitoring and control efforts. Here 3261 human and animal samples were screened for C. burnetii DNA by conventional PCR, and 335 (10.3%) were positive. Of these positive samples, 82 were genotyped at 17 loci using the relatively new method of multi-locus variable number tandem repeat analysis (MLVA). We identified 13 C. burnetii genotypes not previously reported anywhere in the world. Two of these 13 genotypes are typical of the continental part of Croatia and share more similarity with genotypes outside Croatia than with genotypes within the country. The remaining 11 novel genotypes are typical of the coastal part of Croatia and show more similarity to one another than to genotypes outside the country. Our findings shed new light on the phylogeny of C. burnetii strains and may help establish MLVA as a standard technique for Coxiella genotyping. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Hepatitis C virus genotypes among multiply transfused hemoglobinopathy patients from Northern Iraq

    Adil A Othman

    2014-01-01

    Background and Aim: Owing to the scarcity of data on hepatitis C virus (HCV) genotypes in Iraq and to their epidemiological as well as therapeutic implications, this study was initiated to determine these genotypes in Northern Iraq. Materials and Methods: A total of 70 HCV antibody-positive, multiply transfused patients with hemoglobinopathies, who had detectable HCV ribonucleic acid, were recruited for genotyping using genotype-specific nested polymerase chain reaction. Results: The most frequent genotype detected was genotype 4 (52.9%), followed by 3a (17.1%), 1b (12.9%) and 1a (1.4%), while mixed genotypes (4 with either 3a or 1b) were detected in 7.1%. Conclusion: The predominance of genotype 4 is similar to other studies from surrounding Eastern Mediterranean Arab countries and to the only earlier study from central Iraq; however, the significantly high proportion of 3a and the scarcity of 1a are in contrast to the latter study and may be explained by the differing population interactions in this part of Iraq. This study complements previous studies from the Eastern Mediterranean region, demonstrates relative heterogeneity of HCV genotype distribution within Iraq, and should trigger further studies in other parts of the country.

  3. Allelopathic interference of alfalfa (Medicago sativa L.) genotypes to annual ryegrass (Lolium rigidum).

    Zubair, Hasan Muhammad; Pratley, James E; Sandral, G A; Humphries, A

    2017-07-01

    Alfalfa (Medicago sativa L.) genotypes at varying densities were investigated for allelopathic impact using annual ryegrass (Lolium rigidum) as the target species in a laboratory bioassay. Three densities (15, 30, and 50 seedlings/beaker) and 40 alfalfa genotypes were evaluated by the equal compartment agar method (ECAM). Alfalfa genotypes displayed a range of allelopathic interference in ryegrass seedlings, reducing root length by 5 to 65%. The growth of ryegrass decreased in response to increasing density of alfalfa seedlings. At the lowest density, Q75 and Titan9 were the least allelopathic genotypes. An overall inhibition index was calculated to rank each alfalfa genotype. Reduction in seed germination of annual ryegrass occurred in the presence of several alfalfa genotypes including Force 10, Haymaster7 and SARDI Five. A comprehensive metabolomic analysis using Quadrupole Time-of-Flight (Q-TOF) mass spectrometry was conducted to compare six alfalfa genotypes. Variation in chemical compounds was found between alfalfa root extracts and exudates and also between genotypes. Further individual compound assessments and quantitative study at greater chemical concentrations are needed to clarify the allelopathic activity. Considerable genetic variation exists among alfalfa genotypes for allelopathic activity, creating the opportunity for its use in weed suppression through selection.

  4. [The HVR genotypes and their relationship with the resistance of methicillin-resistant staphylococci].

    Liao, F; Fan, X; Lü, X; Feng, P

    2001-06-01

    To investigate the HVR-PCR genotypes of methicillin-resistant staphylococci in local hospitals and compare them with the antibiograms, with a view to selecting effective antibacterial agents and, moreover, to discuss preliminarily their role in molecular epidemiology. The minimal inhibitory concentrations (MICs) of 86 MRSA, 10 MRSE (methicillin-resistant S. epidermidis), 5 MSSE (methicillin-sensitive S. epidermidis), 8 MRSH (methicillin-resistant S. haemolyticus) and 5 MSSH (methicillin-sensitive S. haemolyticus) clinical isolates collected from 4 local hospitals were tested by the serial two-fold agar dilution method; their DNA was extracted by a modified alkaline lysis method, and the polymerase chain reaction (PCR) products of the mec-associated hypervariable region (HVR) were analyzed by size using vertical polyacrylamide gel and agarose gel electrophoresis. MRSA, MRSE and MRSH were grouped into 4, 3 and 2 HVR genotypes respectively according to the size of the PCR products. The PCR products amplified from 9 of 10 MRSE isolates were the same as the products amplified from MRSA isolates. MRSA strains in this study were mainly HVR genotypes A and D, which accounted for 52.32% and 39.53%, respectively. Genotypes B and C were the most multi-drug resistant, whereas genotype D was multi-drug sensitive. Genotype I of MRSE was multi-drug resistant, whereas genotype III was multi-drug sensitive. Genotype a of MRSH was more resistant than genotype b. These results suggest that HVR-PCR genotyping is an easy and fast method for epidemiological investigation of nosocomial infections caused by MRSA, and that it is helpful for clinical selection of antibacterial agents. The method can also compare the mec determinants of MRSA and methicillin-resistant coagulase-negative staphylococcal isolates and hence help to trace the origin of the mec determinant.

  5. Pervasive Genotypic Mosaicism in Founder Mice Derived from Genome Editing through Pronuclear Injection.

    Daniel Oliver

    Genome editing technologies, especially the CRISPR/Cas9 system, have revolutionized biomedical research over the past several years. Generation of novel alleles has been simplified to unprecedented levels, allowing for rapid expansion of available genetic tool kits for researchers. However, the issue of genotypic mosaicism has become evident, making stringent analyses of the penetrance of genome-edited alleles essential. Here, we report that founder mice, derived from pronuclear injection of ZFNs or a mix of guide RNAs and Cas9 mRNAs, display consistent genotypic mosaicism for both deletion and insertion alleles. To identify founders with a greater possibility of transmitting the mutant allele through the germline, we developed an effective germline genotyping method. Awareness of the inherent genotypic mosaicism issue with genome editing will allow for a more efficient implementation of the technologies, and the germline genotyping method will save valuable time and resources.

  6. AMMI analysis to evaluate the adaptability and phenotypic stability of sugarcane genotypes

    Luís Cláudio Inácio da Silveira

    2013-02-01

    Sugarcane (Saccharum sp.) is one of the most important crops in Brazil. The high demand for sugarcane-derived products has stimulated the expansion of sugarcane cultivation in recent years, exploring different environments. The adaptability and the phenotypic stability of sugarcane genotypes in the Minas Gerais state, Brazil, were evaluated based on the additive main effects and multiplicative interaction (AMMI) method. We evaluated 15 genotypes (13 clones and two checks: RB867515 and RB72454) in nine environments. The average of two cuttings for the variable tons of pol per hectare (TPH) was used to discriminate genotypes. Besides the check RB867515 (20.44 t ha-1), the genotype RB987935 showed a high average TPH (20.71 t ha-1), general adaptability and phenotypic stability, and should be suitable for cultivation in the target region. The AMMI method allowed for easy visual identification of superior genotypes for each set of environments.
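
    The AMMI decomposition named in this abstract can be sketched on a genotype-by-environment table of means: additive genotype and environment effects are removed first, and the remaining interaction is summarized by a singular value decomposition. This is a minimal illustration, not the study's analysis; it ignores replication and significance tests for the interaction axes, and the function name and example values are hypothetical.

```python
import numpy as np

def ammi(Y, n_axes=2):
    """AMMI decomposition of a genotype-by-environment table of means.

    Returns genotype main effects, environment main effects, genotype and
    environment scores on the first n_axes interaction axes, and the share
    of interaction sum of squares captured by each of those axes.
    """
    Y = np.asarray(Y, dtype=float)
    grand = Y.mean()
    geno_effect = Y.mean(axis=1) - grand
    env_effect = Y.mean(axis=0) - grand
    # Residual after removing the additive part is the G x E interaction.
    interaction = Y - grand - geno_effect[:, None] - env_effect[None, :]
    U, s, Vt = np.linalg.svd(interaction, full_matrices=False)
    root = np.sqrt(s[:n_axes])
    geno_scores = U[:, :n_axes] * root      # symmetric scaling for biplots
    env_scores = Vt[:n_axes].T * root
    explained = s[:n_axes] ** 2 / np.sum(s ** 2)
    return geno_effect, env_effect, geno_scores, env_scores, explained

# Hypothetical table: 3 genotypes x 4 environments with rank-one interaction.
g = np.array([1.0, -1.0, 0.0])
e = np.array([2.0, 0.0, -2.0, 0.0])
u = np.array([1.0, 0.0, -1.0])
v = np.array([1.0, -1.0, 1.0, -1.0])
table = 10.0 + g[:, None] + e[None, :] + np.outer(u, v)
geno_eff, env_eff, gs, es, share = ammi(table)
```

    Genotypes whose scores lie near the origin of the first interaction axes interact little with environments, which is how stability is usually read off an AMMI biplot.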

  7. Applications of blood group genotyping

    Mariza A. Mota

    2006-03-01

    Introduction: The determination of blood group polymorphism at the genomic level facilitates the resolution of clinical problems that cannot be addressed by hemagglutination. Such determinations are useful to (a) determine antigen types for which currently available antibodies are weakly reactive; (b) type patients who have been recently transfused; (c) identify fetuses at risk for hemolytic disease of the newborn; and (d) increase the reliability of repositories of antigen-negative RBCs for transfusion. Objectives: This review assessed the current applications of blood group genotyping in transfusion medicine and hemolytic disease of the newborn. Search strategy: Blood group genotyping studies and reviews were searched in a general database (MEDLINE) and references were reviewed. Selection criteria: All published data and reviews were eligible for inclusion provided they reported results for the molecular basis of blood group antigens, DNA analysis for blood group polymorphisms, determination of fetal group status and applications of blood group genotyping in blood transfusion. Data collection: All data were collected based on studies and reviews of blood group polymorphisms and their clinical applications.

  8. An Efficient, Simple, and Noninvasive Procedure for Genotyping Aquatic and Nonaquatic Laboratory Animals.

    Okada, Morihiro; Miller, Thomas C; Roediger, Julia; Shi, Yun-Bo; Schech, Joseph Mat

    2017-09-01

    Various animal models are indispensable in biomedical research. Increasing awareness and regulations have prompted the adoption of more humane approaches in the use of laboratory animals. With the development of easier and faster methodologies to generate genetically altered animals, convenient and humane methods to genotype these animals are important for research involving such animals. Here, we report skin swabbing as a simple and noninvasive method for extracting genomic DNA from mice and frogs for genotyping. We show that this method is highly reliable and suitable for both immature and adult animals. Our approach allows simpler and more humane genotyping of vertebrate animals.

  9. Grain yield stability of early maize genotypes

    Chitra Bahadur Kunwar

    2016-12-01

    The objective of this study was to estimate grain yield stability of early maize genotypes. Five early maize genotypes, namely Pool-17, Arun-1 EV, Arun-4, Arun-2 and a farmer's variety, were evaluated using a Randomized Complete Block Design with three replications at four different locations, namely Rampur, Rajahar, Pakhribas and Kabre, in Nepal during the summer seasons of three consecutive years from 2010 to 2012 in farmers' fields. A genotype and genotype × environment (GGE) biplot was used to identify superior genotypes for grain yield and stability pattern. The genotypes Arun-1 EV and Arun-4 were better adapted to Kabre and Pakhribas, whereas Pool-17 was better adapted to the Rajahar environment. The overall findings showed that Arun-1 EV was the most stable, followed by Arun-2; therefore, these two varieties can be recommended to farmers for cultivation in both environments.

  10. Screening of salinity tolerant jute (Corchorus capsularis and C. olitorius) genotypes via phenotypic and physiology-assisted procedures

    Hongyu, M.A.; Wang, Z.; Wang, X.

    2011-01-01

    To obtain salt tolerant genotypes, salt tolerance of 10 jute genotypes of different origins was evaluated by relative salt harm rate at germination stage and by index of salt harm at seedling stage, respectively. The results indicated that salt tolerance of germination stage of jute was consistent with that of seedling stage, with a markedly significant (P < 0.01) correlation of 0.8432 (n =10). Two high salt tolerant genotypes (Huang No.1 and 9511) and two salt sensitive genotypes (Mengyuan and 07-21) were screened out by these methods. Further activity analysis of POD, SOD and CAT and determination of MDA content at seedling stage validated that genotypes Huang No.1 and 9511 were more salt tolerant than genotypes Mengyuan and 07-21. Our results indicated that the combination of relative salt harm rate at germination stage and index of salt harm at seedling stage can be used to evaluate salt tolerance of jute genotypes. (author)
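
    The reported correlation (0.8432 with n = 10, P < 0.01) can be checked with the usual t test for a correlation coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below assumes a Pearson correlation and a two-sided test, neither of which the abstract states; the critical value is the tabulated Student's t quantile for 8 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Values reported in the abstract: r = 0.8432 across n = 10 genotypes.
t_value = correlation_t_statistic(0.8432, 10)

# Tabulated two-sided critical value of Student's t for alpha = 0.01, df = 8.
T_CRIT_ALPHA_001_DF8 = 3.355
significant_at_001 = t_value > T_CRIT_ALPHA_001_DF8
```

    Under these assumptions the statistic exceeds the critical value, consistent with the significance level given in the abstract.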

  11. Heterogeneous recombination among Hepatitis B virus genotypes.

    Castelhano, Nadine; Araujo, Natalia M; Arenas, Miguel

    2017-10-01

    The rapid evolution of Hepatitis B virus (HBV) through both evolutionary forces, mutation and recombination, allows this virus to generate a large variety of adapted variants at both intra- and inter-host levels. It can, for instance, generate drug resistance or the diverse viral genotypes that currently exist in the HBV epidemics. Concerning the latter, it is known that recombination played a major role in the emergence and genetic diversification of novel genotypes. In this regard, the quantification of viral recombination in each genotype can provide relevant information to devise expectations about the evolutionary trends of the epidemic. Here we measured the amount of this evolutionary force by estimating global and local recombination rates in >4700 HBV complete genome sequences corresponding to nine (A to I) HBV genotypes. Counterintuitively, we found that genotype E presents extremely high levels of recombination, followed by genotypes B and C. On the other hand, genotype G presents the lowest level, where recombination is almost negligible. We discuss these findings in the light of known characteristics of these genotypes. Additionally, we present a phylogenetic network to depict the evolutionary history of the studied HBV genotypes. This network clearly classified all genotypes into specific groups and indicated that diverse pairs of genotypes are derived from a common ancestor (i.e., C-I, D-E, and F-H), although the origin of this virus still presented large uncertainty. Altogether we conclude that the amount of observed recombination is heterogeneous among HBV genotypes and that this heterogeneity can influence the future expansion of the epidemic. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Improving the precision of genotype selection in wheat performance trials

    Giovani Benin

    2013-12-01

    Full Text Available The aim of this study was to verify whether using the Papadakis method improves model assumptions and experimental accuracy in field trials used to determine grain yield for wheat lineages in different Value for Cultivation and Use (VCU) regions. Grain yield data from 572 field trials at 31 locations in the VCU Regions 1, 2, 3 and 4 in 2007-2011 were used. Each trial was run with and without the use of the Papadakis method. The Papadakis method improved the indices of experimental precision and reduced the number of experimental repetitions required to predict grain yield performance among the wheat genotypes. There were differences among the wheat adaptation regions in terms of the efficiency of the Papadakis method, the adjustment coefficient of the genotype averages and the increases in the selective accuracy of grain yield.

  13. Hepatitis C virus genotypes in Myanmar.

    Win, Nan Nwe; Kanda, Tatsuo; Nakamoto, Shingo; Yokosuka, Osamu; Shirasawa, Hiroshi

    2016-07-21

    Myanmar is adjacent to India, Bangladesh, Thailand, Laos and China. In Myanmar, the prevalence of hepatitis C virus (HCV) infection is 2%, and HCV infection accounts for 25% of hepatocellular carcinoma. In this study, we reviewed the prevalence of HCV genotypes in Myanmar. HCV genotypes 1, 3 and 6 were observed in volunteer blood donors in and around the Myanmar city of Yangon. Although there are several reports of HCV genotype 6 and its variants in Myanmar, the distribution of the HCV genotypes has not been well documented in areas other than Yangon. Previous studies showed that treatment with peginterferon and a weight-based dose of ribavirin for 24 or 48 wk could lead to sustained virological response (SVR) rates of 80%-100% in Myanmar. Current interferon-free treatments could lead to higher SVR rates (90%-95%) in patients infected with almost all HCV genotypes other than HCV genotype 3. In an era of heavy reliance on direct-acting antivirals against HCV, there is an increasing need to measure HCV genotypes, and this need will also increase specifically in Myanmar. Currently available information on HCV genotypes comes mostly from Yangon and from countries other than Myanmar. The prevalence of HCV genotypes in Myanmar should be determined.

  14. Hepatitis C Virus: Virology and Genotypes

    Abdelaziz, Ahmed

    2017-12-01

    Hepatitis C virus (HCV) is a major causative agent of chronic liver disease worldwide. HCV is characterized by genetic heterogeneity, with at least six genotypes identified. The geographic distribution of genotypes has shown variations in different parts of the world over the past decade because of variations in population structure, immigration, and routes of transmission. Genotype differences are of epidemiologic interest and help the study of viral transmission dynamics to trace the source of HCV infection in a given population. HCV genotypes are also of considerable clinical importance because they affect response to antiviral therapy and represent a challenging obstacle for vaccine development.

  15. Approaches to genotyping individual miracidia of Schistosoma japonicum.

    Xiao, Ning; Remais, Justin V; Brindley, Paul J; Qiu, Dong-Chuan; Carlton, Elizabeth J; Li, Rong-Zhi; Lei, Yang; Blair, David

    2013-12-01

    Molecular genetic tools are needed to address questions as to the source and dynamics of transmission of the human blood fluke Schistosoma japonicum in regions where human infections have reemerged, and to characterize infrapopulations in individual hosts. The life stage that interests us as a target for collecting genotypic data is the miracidium, a very small larval stage that consequently yields very little DNA for analysis. Here, we report the successful development of a multiplex format permitting genotyping of 17 microsatellite loci in four sequential multiplex reactions using a single miracidium held on a Whatman Classic FTA indicating card. This approach was successful after short storage periods, but after long storage (>4 years), considerable difficulty was encountered in multiplex genotyping, necessitating the use of whole genome amplification (WGA) methods. WGA applied to cards stored for long periods of time resulted in sufficient DNA for accurate and repeatable genotyping. Trials and tests of these methods, as well as application to some field-collected samples, are reported, along with the discussion of the potential insights to be gained from such techniques. These include recognition of sibships among miracidia from a single host, and inference of the minimum number of worm pairs that might be present in a host.

  16. Resolving ambiguity in the phylogenetic relationship of genotypes A, B, and C of hepatitis B virus

    2013-01-01

    Background Hepatitis B virus (HBV) is an important infectious agent that causes widespread concern because billions of people are infected by at least 8 different HBV genotypes worldwide. However, reconstruction of the phylogenetic relationship between HBV genotypes is difficult. Specifically, the phylogenetic relationships among genotypes A, B, and C are not clear from previous studies because of the confounding effects of genotype recombination. In order to clarify the evolutionary relationships, a rigorous approach is required that can effectively explore genetic sequences with recombination. Result In the present study, the phylogenetic relationship of the HBV genotypes was reconstructed using a consensus phylogeny of phylogenetic trees of HBV genome segments. Reliability of the reconstructed phylogeny was extensively evaluated in terms of agreement with local phylogenies of genome segments. The reconstructed phylogenetic tree revealed that HBV genotypes B and C had a closer phylogenetic relationship than genotypes A and B or A and C. Evaluations showed the consensus method was capable of reconstructing a reliable phylogenetic relationship in the presence of recombinants. Conclusion The consensus method implemented in this study provides an alternative approach for reconstructing reliable phylogenetic relationships for viruses with possible genetic recombination. Our approach revealed the phylogenetic relationships of genotypes A, B, and C of HBV. PMID:23758960
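
    The idea of a consensus over per-segment trees can be illustrated with a majority-rule clade count. This is only a sketch: the abstract does not specify which consensus rule was used, and the function name, the threshold, and the four toy segment trees below are hypothetical, not the study's data.

```python
from collections import Counter

def majority_clades(trees, threshold=0.5):
    """Return clades present in more than `threshold` of the input trees.

    Each tree is a set of clades; each clade is a frozenset of taxon labels.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, k in counts.items() if k / len(trees) > threshold}

# Hypothetical groupings recovered from four genome segments.
BC = frozenset({"B", "C"})
AB = frozenset({"A", "B"})
segment_trees = [{BC}, {BC}, {AB}, {BC}]   # illustrative only

consensus = majority_clades(segment_trees)
print(consensus)
```

    In this toy input one segment supports a different grouping, as might happen for a recombinant region, yet the grouping supported by most segments is the one retained.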

  17. Comparison of advanced imputation algorithms for detection of transportation mode and activity episode using GPS data

    Feng, T.; Timmermans, H.J.P.

    2016-01-01

    Global Positioning System (GPS) technologies have been increasingly considered as an alternative to traditional travel survey methods to collect activity-travel data. Algorithms applied to extract activity-travel patterns vary from informal ad-hoc decision rules to advanced machine learning methods

  18. Procedures for identifying S-allele genotypes of Brassica.

    Wallace, D H

    1979-11-01

    Procedures are described for efficient selection of: (1) homozygous and heterozygous S-allele genotypes; (2) homozygous inbreds with the strong self- and sib-incompatibility required for effective seed production of single-cross F1 hybrids; (3) heterozygous genotypes with the high self- and sib-incompatibility required for effective seed production of 3- and 4-way hybrids. From reciprocal crosses between two first generation inbred (I1) plants there are three potential results: both crosses are incompatible; one is incompatible and the other compatible; and both are compatible. Incompatibility of both crosses is useful information only when combined with data from other reciprocal crosses. Each compatible cross, depending on whether its reciprocal is incompatible or compatible, dictates alternative reasoning and additional reciprocal crosses for efficiently and simultaneously identifying: (A) the S-allele genotype of all individual I1 plants, and (B) the expressions of dominance or codominance in pollen and stigma (sexual organs) of an S-allele heterozygous genotype. Reciprocal crosses provide the only efficient means of identifying S-allele genotypes and also the sexual-organ x S-allele-interaction types. Fluorescent microscope assay of pollen tube penetration into the style facilitates quantitation within 24-48 hours of incompatibility and compatibility of the reciprocal crosses. A procedure for quantitating the reciprocal difference is described that maximizes informational content of the data about interactions between S alleles in pollen and stigma of the S-allele-heterozygous genotype. Use of the non-inbred Io generation parent as a 'known' heterozygous S-allele genotype in crosses with its first generation selfed (I1) progeny usually reduces at least 7 fold the effort required for achieving objectives 1, 2, and 3, compared to the method of making reciprocal crosses only among I1 plants. Identifying the heterozygous and both homozygous S-allele genotypes during

  19. Sources of Wilhelm Johannsen's genotype theory.

    Roll-Hansen, Nils

    2009-01-01

    This paper describes the historical background and early formation of Wilhelm Johannsen's distinction between genotype and phenotype. It is argued that contrary to a widely accepted interpretation (For instance, W. Provine, 1971. The Origins of Theoretical Population Genetics. Chicago: The University of Chicago Press; Mayr, 1973; F. B. Churchill, 1974. Journal of the History of Biology 7: 5-30; E. Mayr, 1982. The Growth of Biological Thought, Cambridge: Harvard University Press; J. Sapp, 2003. Genesis. The Evolution of Biology. New York: Oxford University Press) his concepts referred primarily to properties of individual organisms and not to statistical averages. Johannsen's concept of genotype was derived from the idea of species in the tradition of biological systematics from Linnaeus to de Vries: An individual belonged to a group - species, subspecies, elementary species - by representing a certain underlying type (S. Müller-Wille and V. Orel, 2007. Annals of Science 64: 171-215). Johannsen sharpened this idea theoretically in the light of recent biological discoveries, not least those of cytology. He tested and confirmed it experimentally combining the methods of biometry, as developed by Francis Galton, with the individual selection method and pedigree analysis, as developed for instance by Louis Vilmorin. The term "genotype" was introduced in W. Johannsen's 1909 (Elemente der Exakten Erblichkeitslehre. Jena: Gustav Fischer) treatise, but the idea of a stable underlying biological "type" distinct from observable properties was the core idea of his classical bean selection experiment published 6 years earlier (W. Johannsen, 1903. Ueber Erblichkeit in Populationen und reinen Linien. Eine Beitrag zur Beleuchtung schwebender Selektionsfragen, Jena: Gustav Fischer, pp. 58-59). The individual ontological foundation of population analysis was a self-evident presupposition in Johannsen's studies of heredity in populations from their start in the early 1890s till his

  20. SNP high-throughput screening in grapevine using the SNPlex™ genotyping system

    Velasco Riccardo

    2008-01-01

    Full Text Available Abstract Background Until recently, only a small number of low- and mid-throughput methods have been used for single nucleotide polymorphism (SNP) discovery and genotyping in grapevine (Vitis vinifera L.). However, following completion of the sequence of the highly heterozygous genome of Pinot Noir, it has been possible to identify millions of electronic SNPs (eSNPs), thus providing a valuable source for high-throughput genotyping methods. Results Herein we report the first application of the SNPlex™ genotyping system in grapevine aiming at the anchoring of a eukaryotic genome. This approach combines robust SNP detection with automated assay readout and data analysis. 813 candidate eSNPs were developed from non-repetitive contigs of the assembled genome of Pinot Noir and tested in 90 progeny of a Syrah × Pinot Noir cross. 563 new SNP-based markers were obtained and mapped. The efficiency rate of 69% was enhanced to 80% when multiple displacement amplification (MDA) methods were used for preparation of genomic DNA for the SNPlex assay. Conclusion Unlike other SNP genotyping methods used to investigate thousands of SNPs in a few genotypes, or a few SNPs in around a thousand genotypes, the SNPlex genotyping system represents a good compromise to investigate several hundred SNPs in a hundred or more samples simultaneously. Therefore, the use of the SNPlex assay, coupled with whole genome amplification (WGA), is a good solution for future applications in well-equipped laboratories.
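
    The 69% efficiency rate appears to follow from the two counts given in the abstract. The short check below assumes the rate is the number of mapped markers divided by the number of candidate eSNPs tested; the abstract does not state the formula, and the variable names are introduced here for illustration.

```python
candidates = 813   # candidate eSNPs tested (from the abstract)
mapped = 563       # new SNP-based markers obtained and mapped (from the abstract)

# Assumed definition: fraction of tested candidates that yielded a mapped marker.
efficiency = mapped / candidates
print(f"conversion efficiency: {efficiency:.0%}")
```

    The 80% figure reported with MDA cannot be reproduced this way, because the abstract does not give the corresponding marker count.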

  1. Selection of common bean genotypes for the Cerrado/Pantanal ecotone via mixed models and multivariate analysis.

    Corrêa, A M; Pereira, M I S; de Abreu, H K A; Sharon, T; de Melo, C L P; Ito, M A; Teodoro, P E; Bhering, L L

    2016-10-17

    The common bean, Phaseolus vulgaris, is predominantly grown on small farms and lacks accurate genotype recommendations for specific micro-regions in Brazil. This contributes to a low national average yield. The aim of this study was to use the methods of the harmonic mean of the relative performance of genetic values (HMRPGV) and the centroid, for selecting common bean genotypes with high yield, adaptability, and stability for the Cerrado/Pantanal ecotone region in Brazil. We evaluated 11 common bean genotypes in three trials carried out in the dry season in Aquidauana in 2013, 2014, and 2015. A likelihood ratio test detected a significant interaction between genotype x year, contributing 54% to the total phenotypic variation in grain yield. The three genotypes selected by the joint analysis of genotypic values in all years (Carioca Precoce, BRS Notável, and CNFC 15875) were the same as those recommended by the HMRPGV method. Using the centroid method, genotypes BRS Notável and CNFC 15875 were considered ideal genotypes based on their high stability to unfavorable environments and high responsiveness to environmental improvement. We identified a high association between the methods of adaptability and stability used in this study. However, the use of centroid method provided a more accurate and precise recommendation of the behavior of the evaluated genotypes.
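
    The HMRPGV statistic named above can be sketched as follows, assuming its usual definition: each genotypic value is divided by the mean of its environment (here, year), and the harmonic mean of those ratios is taken across environments. The function name and all yield figures are hypothetical and are not data from the study.

```python
def hmrpgv(genotypic_values, environment_means):
    """Harmonic mean of relative performance across environments.

    Relative performance in environment j is value_j / mean_j; the result is
    n divided by the sum of the reciprocals of those ratios.
    """
    ratios = [v / m for v, m in zip(genotypic_values, environment_means)]
    return len(ratios) / sum(1.0 / r for r in ratios)

# Hypothetical grain yields (kg/ha) over three years.
env_means = [1000.0, 2000.0, 1500.0]
stable_genotype = [1200.0, 2400.0, 1800.0]     # always 20% above the mean
unstable_genotype = [1600.0, 2000.0, 1200.0]   # ratios 1.6, 1.0, 0.8

stable_score = hmrpgv(stable_genotype, env_means)
unstable_score = hmrpgv(unstable_genotype, env_means)
print(f"stable: {stable_score:.3f}, unstable: {unstable_score:.3f}")
```

    Because a harmonic mean is pulled down by low ratios, the unstable genotype scores lower than its arithmetic average relative performance would suggest, which is why the statistic is read as combining yield, adaptability, and stability.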

  2. An Evaluation of Quantitative PCR Assays (TaqMan® and SYBR Green) for the Detection of Babesia bigemina and Babesia bovis, and a Novel Fluorescent-ITS1-PCR Capillary Electrophoresis Method for Genotyping B. bovis Isolates

    Bing Zhang

    2016-09-01

    Full Text Available Babesia spp. are tick-transmitted haemoparasites causing tick fever in cattle. In Australia, economic losses to the cattle industry from tick fever are estimated at AUD$26 million per annum. If animals recover from these infections, they become immune carriers. Here we describe a novel multiplex TaqMan qPCR targeting cytochrome b genes for the identification of Babesia spp. The assay shows high sensitivity, specificity and reproducibility, and allows quantification of parasite DNA from Babesia bovis and B. bigemina compared to standard PCR assays. A previously published cytochrome b SYBR Green qPCR was also tested in this study; it showed slightly higher sensitivity than the TaqMan qPCRs but requires melting curve analysis post-PCR to confirm specificity. The SYBR Green assays were further evaluated using both diagnostic submissions and vaccinated cattle (at 7, 9, 11 and 14 days post-inoculation) and showed that B. bigemina can be detected more frequently than B. bovis. Due to fewer circulating parasites, B. bovis detection in carrier animals requires higher DNA input. Preliminary data for a novel fluorescent PCR genotyping assay based on the Internal Transcribed Spacer 1 region to detect vaccine and field alleles of B. bovis are described. This assay is capable of detecting vaccine and novel field isolate alleles in a single sample.

  3. Development of a near-infrared spectroscopy method (NIRS) for fast analysis of total, indolic, aliphatic and individual glucosinolates in new bred open pollinating genotypes of broccoli (Brassica oleracea convar. botrytis var. italica).

    Sahamishirazi, Samira; Zikeli, Sabine; Fleck, Michael; Claupein, Wilhelm; Graeff-Hoenninger, Simone

    2017-10-01

    This study describes the development of a near-infrared spectroscopy (NIRS) calibration to determine individual and total glucosinolate (GSL) content of 12 new-bred open-pollinating genotypes of broccoli (Brassica oleracea convar. botrytis var. italica). Six individual GSLs were identified using high-performance liquid chromatography (HPLC). The NIRS calibration was established based on modified partial least squares regression with reference values of HPLC. The calibration was analyzed using coefficient of determination in prediction (R²) and ratio of preference of determination (RPD). Large variation occurred in the calibrations, R² and RPD due to the variability of the samples. Derived calibrations for total-GSLs, aliphatic-GSLs, glucoraphanin and 4-methoxyglucobrassicin were quantitative with a high accuracy (RPD=1.36, 1.65, 1.63, 1.11), while those for indole-GSLs, glucosinigrin, glucoiberin, glucobrassicin and 1-methoxyglucobrassicin were more qualitative (RPD=0.95, 0.62, 0.67, 0.81, 0.56). Overall, the results indicated NIRS has a good potential to determine different GSLs in a large sample pool of broccoli quantitatively and qualitatively. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Accurate genotyping across variant classes and lengths using variant graphs

    Sibbesen, Jonas Andreas; Maretty, Lasse; Jensen, Jacob Ma