utilizing genotype imputation: Topics by WorldWideScience.org

Sample records for utilizing genotype imputation

LinkImputeR: user-guided genotype calling and imputation for non-model organisms.

Science.gov (United States)

Money, Daniel; Migicovsky, Zoë; Gardner, Kyle; Myles, Sean

2017-07-10

Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms. Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis. By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from
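The core idea, that reads below a depth threshold still carry genotype information, can be illustrated with a minimal Python sketch. It assumes a simple binomial read model with a fixed per-read error rate `err` and a flat genotype prior. These are illustrative assumptions, not LinkImputeR's actual model, and all function names are hypothetical.

```python
from math import comb

def genotype_likelihoods(ref_reads, alt_reads, err=0.01):
    """P(read counts | genotype) for genotypes 0, 1, 2 (number of alt alleles).

    Assumed model: each read independently shows the alt allele with
    probability err (hom-ref), 0.5 (het) or 1 - err (hom-alt).
    """
    n = ref_reads + alt_reads
    p_alt = {0: err, 1: 0.5, 2: 1.0 - err}
    return {g: comb(n, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads
            for g, p in p_alt.items()}

def posterior(ref_reads, alt_reads, err=0.01, prior=(1/3, 1/3, 1/3)):
    """Genotype posterior; the flat prior is a hypothetical default."""
    lik = genotype_likelihoods(ref_reads, alt_reads, err)
    unnorm = {g: lik[g] * prior[g] for g in lik}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

def threshold_call(ref_reads, alt_reads, min_depth=8, err=0.01):
    """Hard-threshold caller: sites below min_depth are set to missing (None)."""
    if ref_reads + alt_reads < min_depth:
        return None
    post = posterior(ref_reads, alt_reads, err)
    return max(post, key=post.get)
```

Here `threshold_call(2, 0)` returns `None` (2 reads is below `min_depth=8`), yet `posterior(2, 0)` still places roughly 80% of its mass on genotype 0. That is the kind of information a threshold-only pipeline discards before imputation.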

Assessing accuracy of genotype imputation in American Indians.

Directory of Open Access Journals (Sweden)

Alka Malhotra

Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed the accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians. Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of the SNPs available for a given chromosome). The masked genotypes were then imputed using the software Markov Chain Haplotyping Algorithm. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%. Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.
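The masking design described above (hide a fraction of known genotypes, impute them, count mismatches) can be sketched in Python. The imputer here is a deliberately naive stand-in (the most common observed genotype at each SNP), not the Markov Chain Haplotyping Algorithm, and the function and parameter names are hypothetical.

```python
import random
from collections import Counter

def masked_error_rate(genotypes, mask_fraction=0.2, seed=1):
    """Estimate imputation error by masking known genotypes.

    genotypes: list of individuals, each a list of 0/1/2 calls per SNP.
    Returns the fraction of masked genotypes that were imputed incorrectly.
    """
    rng = random.Random(seed)
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    cells = [(i, j) for i in range(n_ind) for j in range(n_snp)]
    n_mask = max(1, int(mask_fraction * len(cells)))
    masked = set(rng.sample(cells, n_mask))

    errors = 0
    for i, j in masked:
        # Naive imputer: most common unmasked genotype at SNP j.
        observed = [genotypes[k][j] for k in range(n_ind) if (k, j) not in masked]
        # Fall back to 0 (hom-ref) if every genotype at this SNP was masked.
        guess = Counter(observed).most_common(1)[0][0] if observed else 0
        errors += guess != genotypes[i][j]
    return errors / len(masked)
```

Swapping the naive imputer for a reference-panel method, and repeating the run with each candidate reference population, would mirror the comparison reported in the abstract; that step is not shown here.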

Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

Directory of Open Access Journals (Sweden)

Ariel W Chan

Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable to imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and

Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.

Science.gov (United States)

Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc

2016-01-01

Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable to imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the

A spatial haplotype copying model with applications to genotype imputation.

Science.gov (United States)

Yang, Wen-Yun; Hormozdiari, Farhad; Eskin, Eleazar; Pasaniuc, Bogdan

2015-05-01

Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that any chromosome from the sample is equally likely to contribute a priori to the copying process. Motivated by recent works that model genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity such that at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves superior accuracy in genotype imputation over the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that leads to both improved accuracy and a lower computational runtime than the standard approach. Finally, we show our proposed model can be used to localize individuals on the genetic-geographical map on the basis of their genotype data.
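For readers unfamiliar with the haplotype copy model, the following Python sketch implements a basic Li-Stephens-style hidden Markov model: at each site the target haplotype copies one reference haplotype, switches with probability `rho`, and mismatches the copied allele with probability `theta`. The spatial idea is represented here only as a non-uniform prior `prior` over reference haplotypes. That is our simplification, not the authors' model, and `rho` and `theta` are illustrative constants rather than rates derived from a genetic map.

```python
def copy_posteriors(target, panel, prior, rho=0.01, theta=0.01):
    """Forward-backward posteriors for a Li-Stephens-style copying HMM.

    target: alleles (0/1) of the haplotype being modeled, None where untyped.
    panel:  K reference haplotypes, each a list of 0/1 alleles at the same sites.
    prior:  K weights summing to 1; uniform in the standard model, non-uniform
            (e.g. favoring nearby haplotypes) in a spatially aware variant.
    Returns post[m][k] = P(copying haplotype k at site m | target).
    """
    K, M = len(panel), len(target)

    def emit(k, m):
        if target[m] is None:  # untyped sites carry no information
            return 1.0
        return 1.0 - theta if panel[k][m] == target[m] else theta

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass; on a switch, the new haplotype is drawn from the prior.
    fwd = [normalize([prior[k] * emit(k, 0) for k in range(K)])]
    for m in range(1, M):
        prev = fwd[-1]
        fwd.append(normalize([((1 - rho) * prev[k] + rho * prior[k]) * emit(k, m)
                              for k in range(K)]))

    # Backward pass with the same transition model.
    bwd = [None] * M
    bwd[M - 1] = [1.0] * K
    for m in range(M - 2, -1, -1):
        e = [emit(j, m + 1) * bwd[m + 1][j] for j in range(K)]
        jump_term = sum(prior[j] * e[j] for j in range(K))
        bwd[m] = normalize([(1 - rho) * e[k] + rho * jump_term for k in range(K)])

    return [normalize([f * b for f, b in zip(fwd[m], bwd[m])]) for m in range(M)]


def impute_allele(target, panel, prior, site, **kwargs):
    """Posterior probability that the target carries allele 1 at `site`."""
    post = copy_posteriors(target, panel, prior, **kwargs)
    return sum(p * panel[k][site] for k, p in enumerate(post[site]))
```

For example, `impute_allele([1, None, 1], [[0, 0, 0], [1, 1, 1]], [0.5, 0.5], 1)` is close to 1, because the typed sites on both sides match the second reference haplotype. With no typed sites at all, the posterior simply equals `prior`, which is where a geography-aware prior would exert its influence.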

Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

Science.gov (United States)

Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P

2013-05-01

A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem that has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
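The union/intersection distinction is easy to state in code. In this Python sketch the array names and SNP IDs are invented for illustration; the last line reproduces the abstract's arithmetic (0.2% of one million imputed SNPs is about 2,000 false positives).

```python
def snp_sets(arrays):
    """Return (union, intersection) of genotyped SNP IDs across arrays.

    arrays: dict mapping array name -> set of SNP IDs (hypothetical IDs here).
    """
    sets = list(arrays.values())
    return set().union(*sets), set.intersection(*sets)

def expected_false_positives(spurious_rate, n_imputed):
    """Number of spurious associations expected at a given per-SNP rate."""
    return round(spurious_rate * n_imputed)

arrays = {
    "array_A": {"rs1", "rs2", "rs3", "rs4"},
    "array_B": {"rs2", "rs3", "rs5"},
}
union, shared = snp_sets(arrays)
print(sorted(shared))                              # ['rs2', 'rs3']
print(len(shared) / len(union))                    # 0.4: share of SNPs typed on every array
print(expected_false_positives(0.002, 1_000_000))  # 2000
```

Imputing from `shared` rather than `union` means every cohort contributes the same input SNPs, which is the property the study found removes the array-specific bias, at the cost of a smaller input set.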

Imputation of microsatellite alleles from dense SNP genotypes for parental verification

Directory of Open Access Journals (Sweden)

Matthew McClure

2012-08-01

Microsatellite (MS) markers have recently been used for parental verification and are still the international standard, despite higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphism (SNP)-based assays. Despite domestic and international interest from producers and research communities, no viable means currently exist to verify parentage for an individual unless all familial connections were analyzed using the same DNA marker type (MS or SNP). A simple and cost-effective method was devised to impute MS alleles from SNP haplotypes within breeds. For some MS, imputation results may allow inference across breeds. A total of 347 dairy cattle representing 4 dairy breeds (Brown Swiss, Guernsey, Holstein, and Jersey) were used to generate reference haplotypes. This approach has been verified (>98% accurate) for imputing the International Society of Animal Genetics (ISAG)-recommended panel of 12 MS for cattle parentage verification across a validation set of 1,307 dairy animals. Implementation of this method will allow producers and breed associations to transition to SNP-based parentage verification, utilizing MS genotypes from historical data on parents where SNP genotypes are missing. This approach may be applicable to additional cattle breeds and other species that wish to migrate from MS-based to SNP-based parental verification.
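The general idea, inferring an MS allele from the SNP haplotype it travels with inside a breed, can be sketched as a lookup table in Python. This assumes phased SNP haplotypes around each MS locus are already available; the table-lookup approach and all names are illustrative, since the abstract does not specify the authors' exact procedure.

```python
from collections import Counter, defaultdict

def build_lookup(reference):
    """Map each SNP haplotype to its most frequent microsatellite (MS) allele.

    reference: iterable of (snp_haplotype, ms_allele) pairs from animals
    genotyped for both marker types, e.g. ("0110", 184).
    """
    counts = defaultdict(Counter)
    for hap, allele in reference:
        counts[hap][allele] += 1
    return {hap: c.most_common(1)[0][0] for hap, c in counts.items()}

def impute_ms(lookup, snp_haplotype):
    """Return the imputed MS allele, or None if the haplotype was never seen."""
    return lookup.get(snp_haplotype)
```

Because each animal carries two haplotypes, an MS genotype would be imputed as the pair of alleles looked up for its two phased haplotypes; a `None` result flags a haplotype absent from the reference set, where imputation should not be trusted.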

Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

Directory of Open Access Journals (Sweden)

Xiaoyi Gao

2012-06-01

Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

Science.gov (United States)

Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

2015-01-01

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference; imputation of low frequency and rare variants (minor allele frequency (MAF) ... 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had data from 3 different genotyping arrays (317K, 610K and 1 million SNPs), to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info > 0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed the 610K and 317K arrays. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
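The summary "proportion of well-imputed variants (info > 0.4) in each MAF bin" can be computed as in this Python sketch. The bin edges (including the 0.3% boundary mentioned above) are an illustrative choice, and the info scores are assumed to come from the imputation software's per-variant output.

```python
from bisect import bisect_left

def well_imputed_by_maf(variants, edges=(0.003, 0.01, 0.05, 0.5), info_cutoff=0.4):
    """Proportion of variants with info > info_cutoff in each MAF bin.

    variants: iterable of (maf, info) pairs.
    edges:    upper bin boundaries (hypothetical choice); bin b holds
              edges[b-1] < MAF <= edges[b].
    Returns one proportion per bin, or None for empty bins.
    """
    totals = [0] * len(edges)
    good = [0] * len(edges)
    for maf, info in variants:
        b = bisect_left(edges, maf)  # first bin whose upper edge >= maf
        if b == len(edges):
            raise ValueError(f"MAF {maf} exceeds the last bin edge")
        totals[b] += 1
        good[b] += info > info_cutoff
    return [g / t if t else None for g, t in zip(good, totals)]
```

Reporting `None` for empty bins, rather than 0, keeps "no variants in this bin" distinct from "no well-imputed variants", which matters for the sparsely populated rare-variant bins.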

Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

Science.gov (United States)

Toghiani, S; Aggrey, S E; Rekaya, R

2016-07-01

The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic estimated breeding values (GEBVs) can be obtained, which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. The accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than in the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, the decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between the reference and testing populations is small. As the generational interval

Imputation of genotypes in Danish two-way crossbred pigs using low density panels

DEFF Research Database (Denmark)

Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

Genotype imputation is commonly used as an initial step of genomic selection. Studies on humans, plants and ruminants have suggested that many factors affect the performance of imputation. However, studies have rarely investigated pigs, especially crossbred pigs. In this study, different scenarios ... of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation on crossbreds performs as well as in purebreds when parental breeds are used as the reference panel. When the size of reference is considerably large ... SNPs. This dataset will be analyzed for genomic selection in a future study...

Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

Directory of Open Access Journals (Sweden)

Abbas Mikhchi

2016-01-01

Background: Genotype imputation is an important process of predicting unknown genotypes, which uses a reference population with dense genotypes to predict missing genotypes for both human and animal genetic variations at a low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and build models capable of predicting missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared, imputing from a lower-density (5K) to a high-density (10K) SNP panel with three different boosting methods, namely TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were applied to simulated data to impute the untyped SNPs in parent-offspring trios. Four datasets were simulated: G1 (100 trios with 5K SNPs), G2 (100 trios with 10K SNPs), G3 (500 trios with 5K SNPs), and G4 (500 trios with 10K SNPs). In all four datasets, parents were genotyped completely and offspring were genotyped with a lower-density panel. Results: Comparison of the three methods showed that LB outperformed AB and TB in imputation accuracy. Computation times differed between methods, and AB was the fastest algorithm. Higher SNP density increased imputation accuracy. A larger number of trios (i.e. 500) improved the performance of LB and TB. Conclusions: All three methods perform well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.
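Why do fully genotyped parents help? Under Mendelian inheritance, some parental genotype combinations fix the offspring genotype outright. The Python sketch below shows only this deterministic rule, with genotypes coded as alt-allele counts and no genotyping error assumed. It is not one of the boosting methods compared in the study; those would still be needed for the ambiguous cases it returns as `None`.

```python
def offspring_if_determined(father, mother):
    """Return the offspring genotype if Mendelian inheritance fixes it, else None.

    Genotypes are alt-allele counts: 0 (hom-ref), 1 (het), 2 (hom-alt).
    """
    # Alt-allele copies each parent can transmit: homozygotes have one option.
    transmit = {0: {0}, 1: {0, 1}, 2: {1}}
    possible = {a + b for a in transmit[father] for b in transmit[mother]}
    return possible.pop() if len(possible) == 1 else None
```

Only combinations of two homozygous parents (0x0, 0x2, 2x2) pin down the offspring; any heterozygous parent leaves two or three possibilities, which is where a learned model using neighboring SNPs comes in.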

Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

Science.gov (United States)

Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

2016-06-21

Genotype imputation is an important tool for the prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available; they either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine (ELM), Radial Basis Function (RBF), Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the untyped SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods. TotalBoost performed slightly worse than the other methods. Running times differed between methods: ELM was always the fastest algorithm, and RBF required long imputation times as sample size increased. The methods tested in this research can be an alternative for the imputation of untyped SNPs when the missing-data rate is low. However, it is recommended that other machine learning methods be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.
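K-Nearest Neighbors is the simplest of the eight methods to write down. This Python sketch imputes one untyped SNP for one individual by majority vote among the `k` reference individuals with the fewest genotype mismatches at the SNPs typed in the target. The mismatch-count distance and `k=3` are illustrative assumptions, not the settings used in the study.

```python
from collections import Counter

def knn_impute(target, reference, snp, k=3):
    """Impute target[snp] by majority vote among the k nearest reference individuals.

    target:    list of 0/1/2 genotypes with None at untyped positions.
    reference: list of fully genotyped individuals (same SNP order).
    Distance:  number of mismatching genotypes over SNPs typed in the target.
    """
    typed = [j for j, g in enumerate(target) if g is not None]

    def distance(ind):
        return sum(ind[j] != target[j] for j in typed)

    nearest = sorted(reference, key=distance)[:k]
    votes = Counter(ind[snp] for ind in nearest)
    return votes.most_common(1)[0][0]
```

Because `sorted` is stable and `Counter.most_common` keeps first-seen order for ties, tie-breaking here depends on the order of `reference`; a production method would need an explicit tie rule.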

Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

Directory of Open Access Journals (Sweden)

Ward Judson A

2013-01-01

Background: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM, respectively, over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure.

Science.gov (United States)

Kabisch, Maria; Hamann, Ute; Lorenzo Bermejo, Justo

2017-10-17

Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and allows population demographic factors to be explicitly accounted for. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation. Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not improve accuracy in practice. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals. To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate

Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel.

Science.gov (United States)

Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit

2017-06-01

Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations with that achieved with a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
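The frequency classes used in this abstract are straightforward to apply once allele frequencies are estimated. A Python sketch, assuming biallelic sites and genotypes coded as alt-allele counts (0/1/2); the thresholds are the ones stated above, and the function names are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF from per-individual alt-allele counts (0/1/2) at one biallelic site."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)

def frequency_class(maf):
    """Common: MAF >= 5%; low-frequency: 0.5% <= MAF < 5%; rare: MAF < 0.5%."""
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"
```

For example, a single heterozygote among 200 individuals gives a MAF of 1/400 = 0.25%, which falls in the rare class, the range where the abstract reports the largest benefit from a population-specific panel.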

Improving accuracy of rare variant imputation with a two-step imputation approach

DEFF Research Database (Denmark)

Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

2015-01-01

Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) for identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as reference for imputation, meaning that less frequent and rare variants have not been comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) ... reference sample genotyped on a dense array and thereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by r² using this approach, increases by 28% for variants with a MAF between 1 and 5% as compared with direct imputation to the 1000 Genomes reference. Similarly...
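Imputation quality in this abstract is reported as r², the squared correlation between true and imputed genotypes. Assuming the usual definition (true genotypes coded 0/1/2, imputed values as expected allele dosages), it can be computed as in this Python sketch. Imputation software typically also reports an estimated r² derived without access to true genotypes; this function does not reproduce that estimate.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either input is constant
    return sxy * sxy / (sxx * syy)
```

Computing this per variant on masked or held-out genotypes, then averaging within a MAF range such as 1 to 5%, gives the kind of mean r² the two-step and direct imputation strategies are compared on.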

Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

Science.gov (United States)

Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

2018-02-01

Reliable genomic prediction of breeding values for quantitative traits requires a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The genotypes of the remaining 2,262 animals were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) ranged from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were somewhat smaller (0.87% to 18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the training stage for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and these accuracies were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.
The Use of Imputed Sibling Genotypes in Sibship-Based Association Analysis: On Modeling Alternatives, Power and Model Misspecification

NARCIS (Netherlands)

Minica, C.C.; Dolan, C.V.; Willemsen, G.; Vink, J.M.; Boomsma, D.I.

2013-01-01

When phenotypic but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost statistical power when included in such studies. Here, using simulations, we compared the performance of
Evaluation and application of summary statistic imputation to discover new height-associated loci.

Science.gov (United States)

Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

2018-05-01

Because most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips with large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-larger reference panels is cumbersome, as it requires reimputing the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient alternative is to impute the summary statistics directly, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation, and its practical utility, have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation achieves a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. The largest differences in power were observed for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, summary statistics imputation reduced statistical power by 9%, 43% and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian
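Summary statistics imputation is commonly formulated by treating association z-scores as multivariate normal with correlations given by reference-panel LD, so the z-score at an untyped variant is predicted as a linear combination of nearby typed z-scores. The sketch below assumes that formulation, with a small ridge term `lam` (hypothetical default) added to stabilise the LD matrix; it does not include the authors' adjustment for variable sample size across SNVs, and the function name is illustrative.

```python
import numpy as np


def impute_z_scores(z_typed, ld_typed, ld_untyped_typed, lam=0.1):
    """Impute z-scores at untyped variants from z-scores at typed variants.

    z_typed:          (s,) z-scores at typed variants
    ld_typed:         (s, s) LD correlation matrix among typed variants
    ld_untyped_typed: (t, s) LD correlations between untyped and typed variants
    lam:              ridge term added to the diagonal (assumed value)
    """
    reg = ld_typed + lam * np.eye(ld_typed.shape[0])
    # weights = ld_untyped_typed @ inv(reg), computed with a linear solve (reg is symmetric)
    weights = np.linalg.solve(reg, ld_untyped_typed.T).T
    z_imputed = weights @ z_typed
    # variance explained by the typed variants, often used as an imputation-quality score
    quality = np.einsum("ij,ij->i", weights, ld_untyped_typed)
    return z_imputed, quality


# Toy example: two independent typed SNPs; untyped SNP 1 is a perfect proxy for
# typed SNP 0, untyped SNP 2 is correlated 0.5 with each typed SNP.
z_imp, quality = impute_z_scores(
    np.array([3.0, -1.0]),
    np.eye(2),
    np.array([[1.0, 0.0], [0.5, 0.5]]),
    lam=0.0,
)
```

In the toy example the perfect proxy inherits the typed z-score unchanged, while the second untyped variant receives a blended value with a lower quality score, which mirrors the abstract's observation that the approach loses most power where imputation quality is low.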

Imputation of genotypes from low density (50,000 markers) to high density (700,000 markers) of cows from research herds in Europe, North America, and Australasia using 2 reference populations

DEFF Research Database (Denmark)

Pryce, J E; Johnston, J; Hayes, B J

2014-01-01

detection in genome-wide association studies and the accuracy of genomic selection may increase when the low-density genotypes are imputed to higher density. Genotype data were available from 10 research herds: 5 from Europe [Denmark, Germany, Ireland, the Netherlands, and the United Kingdom (UK)], 2 from...... reference populations. Although it was not possible to use a combined reference population, which would probably result in the highest accuracies of imputation, differences arising from using 2 high-density reference populations on imputing 50,000-marker genotypes of 583 animals (from the UK) were...... information exploited. The UK animals were also included in the North American data set (n = 1,579) that was imputed to high density using a reference population of 2,018 bulls. After editing, 591,213 genotypes on 5,999 animals from 10 research herds remained. The correlation between imputed allele...
Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

Science.gov (United States)

Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

2017-12-29

Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.
Imputation of single nucleotide polymorphism genotypes of Hereford cattle: reference panel size, family relationship and population structure

Science.gov (United States)

The objective of this study is to investigate single nucleotide polymorphism (SNP) genotypes imputation of Hereford cattle. Purebred Herefords were from two sources, Line 1 Hereford (N=240) and representatives of Industry Herefords (N=311). Using different reference panels of 62 and 494 males with 1...
Imputation and quality control steps for combining multiple genome-wide datasets

Directory of Open Access Journals (Sweden)

Shefali S Verma

2014-12-01

The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
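Allelic R2 can be estimated without access to true genotypes. The sketch below assumes the MaCH/minimac-style definition: the variance of the imputed allele dosages divided by the variance 2p(1 - p) expected if genotypes were called with certainty under Hardy-Weinberg equilibrium. BEAGLE reports its own allelic R2 computed differently, so values from the two packages are not necessarily interchangeable. The function name is hypothetical.

```python
import numpy as np


def estimated_allelic_r2(dosages: np.ndarray) -> np.ndarray:
    """MaCH-style estimate of r^2 between imputed and true genotypes, per variant.

    dosages: (n_samples, n_variants) expected ALT-allele counts in [0, 2].
    Returns observed dosage variance / (2p(1-p)); monomorphic variants get 0.
    """
    p = dosages.mean(axis=0) / 2.0           # estimated ALT allele frequency
    expected = 2.0 * p * (1.0 - p)           # variance if genotypes were certain (HWE)
    observed = dosages.var(axis=0)           # variance of the imputed dosages
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = np.where(expected > 0, observed / expected, 0.0)
    return r2


# Variant 0: confident calls 0/1/1/2; variant 1: every sample imputed as dosage 1.0
example = np.array([[0, 1], [1, 1], [1, 1], [2, 1]], dtype=float)
print(estimated_allelic_r2(example))
```

The second variant shows why the statistic is informative: dosages that all sit at the population mean carry no individual-level information, so the estimate is 0 even though the allele frequency is reproduced exactly.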
Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

Science.gov (United States)

Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

2015-01-01

Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497
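The core of a Procrustes analysis is finding the rotation, isotropic scaling and translation that best superimpose one set of coordinates on another. A minimal sketch follows, assuming the high-dimensional study coordinates `X` (k columns) are mapped onto the q-dimensional reference coordinates `Y` by padding `Y` with zero columns and solving an ordinary orthogonal Procrustes problem via SVD. The PCA steps that produce `X` and `Y` in LASER 2.0 are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def procrustes_fit(X: np.ndarray, Y: np.ndarray):
    """Fit rho, R, b minimising ||[Y, 0] - (rho * X @ R + b)||_F with R orthogonal.

    X: (n, k) coordinates from a high-dimensional PCA (k >= q)
    Y: (n, q) coordinates of the same individuals in the reference ancestry map
    Y is padded with k - q zero columns (an assumption of this sketch).
    """
    n, k = X.shape
    q = Y.shape[1]
    if q > k:
        raise ValueError("X must have at least as many columns as Y")
    Y_pad = np.hstack([Y, np.zeros((n, k - q))])
    mean_x, mean_y = X.mean(axis=0), Y_pad.mean(axis=0)
    Xc, Yc = X - mean_x, Y_pad - mean_y
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt                           # optimal orthogonal transformation (k x k)
    rho = s.sum() / np.sum(Xc ** 2)      # optimal isotropic scaling
    b = mean_y - rho * (mean_x @ R)      # translation
    return rho, R, b


def procrustes_apply(X_new: np.ndarray, rho: float, R: np.ndarray, b: np.ndarray, q: int):
    """Map high-dimensional coordinates into the first q reference dimensions."""
    return (rho * X_new @ R + b)[:, :q]


# Synthetic check: X is a rotated, scaled and shifted copy of zero-padded Y.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
X = 2.5 * np.hstack([Y, np.zeros((50, 2))]) @ Q + rng.normal(size=4)
rho, R, b = procrustes_fit(X, Y)
mapped = procrustes_apply(X, rho, R, b, q=2)
```

`R` may include a reflection; for principal-component coordinates that is usually acceptable because the sign of each component is arbitrary.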
Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

Directory of Open Access Journals (Sweden)

McElwee Joshua

2009-06-01

Background: Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, different SNP arrays assay different sets of SNPs, which makes it challenging to compare results and merge data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues directly. Methods: 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes for the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release 22 SNPs and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500K array plus a custom 164K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. Results: MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y array yields excellent imputation performance and outperforms the Affx500K and Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was less successful. Imputing genotypes was more challenging in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis
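Masking-based evaluation hides genotypes that were actually observed, imputes them, and compares the imputed calls with the hidden truth. The sketch below assumes genotypes coded 0/1/2 with -1 for missing, and uses a deliberately simple nearest-neighbour imputer (hypothetical, not MACH or BEAGLE) so that the evaluation loop is self-contained. Concordance here is the fraction of masked genotypes imputed correctly.

```python
import numpy as np

MISSING = -1


def knn_impute(G: np.ndarray, k: int = 3) -> np.ndarray:
    """Fill missing genotypes (coded -1) by majority vote of the k most similar individuals.

    Similarity is the mismatch rate over markers observed in both individuals.
    This stand-in imputer is for illustration only; it is not MACH or BEAGLE.
    """
    observed = G != MISSING
    out = G.copy()
    for i, j in zip(*np.nonzero(~observed)):
        donors = [r for r in np.flatnonzero(observed[:, j]) if r != i]
        dists = []
        for r in donors:
            shared = observed[i] & observed[r]
            dists.append(np.mean(G[i, shared] != G[r, shared]) if shared.any() else np.inf)
        nearest = [donors[t] for t in np.argsort(dists, kind="stable")[:k]]
        if nearest:
            out[i, j] = np.bincount(G[nearest, j], minlength=3).argmax()
    return out


def masked_concordance(G_true: np.ndarray, frac: float = 0.05, k: int = 3, seed: int = 0) -> float:
    """Mask a random fraction of genotypes, impute them, and return the
    proportion of masked genotypes that were imputed correctly."""
    rng = np.random.default_rng(seed)
    mask = rng.random(G_true.shape) < frac
    G_masked = np.where(mask, MISSING, G_true)
    G_imputed = knn_impute(G_masked, k=k)
    return float(np.mean(G_imputed[mask] == G_true[mask]))


# Toy data: three groups of 12 identical individuals, 20 markers each.
patterns = np.array([[0] * 20, [1] * 20, [2] * 20])
G_true = np.repeat(patterns, 12, axis=0)
acc = masked_concordance(G_true, frac=0.05, k=3, seed=1)
```

Because concordance rewards guessing the most common genotype, it is often reported alongside a correlation-based measure and stratified by minor allele frequency.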
PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

Directory of Open Access Journals (Sweden)

Oren E Livne

2015-03-01

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy for Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we were also able to infer the parental origin of 83% of alleles, and the genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants to human phenotypes, as well as parent-of-origin effects of disease risk alleles, in >1,000 individuals at minimal cost.
ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

Directory of Open Access Journals (Sweden)

Kamatani Naoyuki

2011-05-01

Background: Missing-genotype imputation and haplotype reconstruction are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results: We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use on workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the HapMap samples of Japanese in Tokyo, Japan and Han Chinese in Beijing, China. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions: ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.
Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

Science.gov (United States)

He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

2018-04-01

SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been adequately addressed. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2% and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs associated with the traits to be predicted. SNPs optimal for map coverage and MAF favored accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies. Thus, the optimal LD SNP panels were those that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.
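Two of the three selection factors, even spacing and high minor allele frequency, can be combined with a simple windowing rule. The sketch below assumes one chromosome at a time, splits its span into `n_select` equal-length windows and keeps the highest-MAF SNP in each non-empty window. It omits the SNP-trait association criterion and is not claimed to be any of the eight strategies compared in the study; the function name and toy data are hypothetical.

```python
import numpy as np


def select_evenly_spaced_high_maf(positions, maf, n_select):
    """Pick at most one SNP per equal-length window, preferring the highest MAF.

    positions: base-pair positions of candidate SNPs on one chromosome
    maf:       minor allele frequencies in the same order
    n_select:  number of windows (upper bound on SNPs returned)
    Returns indices of the selected SNPs in window order.
    """
    positions = np.asarray(positions, dtype=float)
    maf = np.asarray(maf, dtype=float)
    edges = np.linspace(positions.min(), positions.max(), n_select + 1)
    # window index for every SNP; the last SNP is folded into the final window
    window = np.clip(np.searchsorted(edges, positions, side="right") - 1, 0, n_select - 1)
    chosen = []
    for w in range(n_select):
        idx = np.flatnonzero(window == w)
        if idx.size:
            chosen.append(idx[np.argmax(maf[idx])])
    return np.array(chosen, dtype=int)


pos = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
maf = [0.1, 0.4, 0.2, 0.05, 0.3, 0.35, 0.5, 0.1, 0.2, 0.45]
chosen = select_evenly_spaced_high_maf(pos, maf, n_select=2)
```

Windows without any candidate SNP are simply skipped, so a real panel design would need a rule for redistributing those slots to reach a fixed chip size.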
Imputing amino acid polymorphisms in human leukocyte antigens.

Directory of Open Access Journals (Sweden)

Xiaoming Jia

DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC), however, makes it difficult to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach, through imputation, to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize the performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC; 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than the SNP density of the genotyping platform, is critical to achieving high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500K and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.
Design of a bovine low-density SNP array optimized for imputation.

Directory of Open Access Journals (Sweden)

Didier Boichard

The Illumina BovineLD BeadChip was designed to support imputation to higher-density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome, except at the ends of the chromosomes, where densities were increased. The chip also includes SNPs on the Y chromosome and mitochondrial DNA loci that are useful for determining subspecies classification and certain paternal and maternal breed lineages. The total number of SNPs was 6,909. Accuracy of imputation to Illumina BovineSNP50 genotypes using the BovineLD chip was over 97% for most dairy and beef populations. The BovineLD imputations were about 3 percentage points more accurate than those from the Illumina GoldenGate Bovine3K BeadChip across multiple populations. The improvement was greatest when neither parent was genotyped. The minor allele frequencies were similar across taurine beef and dairy breeds, as was the proportion of SNPs that were polymorphic. The new BovineLD chip should facilitate low-cost genomic selection in taurine beef and dairy cattle.
Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

Directory of Open Access Journals (Sweden)

Danai Jattawa

2016-04-01

The objective of this study was to investigate the accuracy of imputation from low-density (LDC) to moderate-density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After single nucleotide polymorphism (SNP) quality checks, the 17,779 SNP markers common to the GGP20K, GGP26K, and GGP80K chips were used to represent the MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers retained for the test group were those located at positions corresponding to the GeneSeek GGP9K chip (n = 7,652). LDC-to-MDC genotype imputation was carried out using three software packages: Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as the ratio of correctly imputed SNP markers to all imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above- and below-average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNPs within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Additional gains in imputation accuracy might be achieved by increasing the completeness of pedigree information.
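The LD summary used to explain chromosome-level differences can be computed directly from genotype data. The sketch below assumes "correlation" means the squared Pearson correlation r² between genotype counts of adjacent SNPs (the abstract does not state whether r or r² was used), skips pairs more than 1 Mb apart and monomorphic SNPs, and averages the rest. The function name and toy genotypes are hypothetical.

```python
import numpy as np


def mean_adjacent_r2(genotypes, positions, max_dist=1_000_000):
    """Average r^2 between adjacent SNPs no more than max_dist bp apart on one chromosome.

    genotypes: (n_animals, n_snps) 0/1/2 matrix with SNP columns sorted by position
    positions: (n_snps,) base-pair positions in the same order
    """
    values = []
    for a in range(len(positions) - 1):
        if positions[a + 1] - positions[a] > max_dist:
            continue                      # pair too far apart
        x, y = genotypes[:, a], genotypes[:, a + 1]
        if x.std() == 0 or y.std() == 0:
            continue                      # r is undefined for monomorphic SNPs
        r = np.corrcoef(x, y)[0, 1]
        values.append(r * r)
    return float(np.mean(values)) if values else float("nan")


# SNPs 0 and 1 are identical (r^2 = 1); SNPs 1 and 2 are uncorrelated (r^2 = 0).
G = np.array([[0, 0, 0], [1, 1, 0], [2, 2, 0], [0, 0, 2], [1, 1, 2], [2, 2, 2]], dtype=float)
far = mean_adjacent_r2(G, [0, 500_000, 2_000_000])    # second pair is > 1 Mb apart
near = mean_adjacent_r2(G, [0, 500_000, 900_000])     # both pairs are counted
```

Computing this per chromosome and comparing it with per-chromosome concordance is one way to check the abstract's observation that chromosomes with above-average LD tend to impute better.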
Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

DEFF Research Database (Denmark)

Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

2009-01-01

Polymorphisms in genes critical to cell cycle control are outstanding candidates for association with ovarian cancer risk; numerous genes have been interrogated by multiple research groups using differing tagging single-nucleotide polymorphism (SNP) sets. To maximize information gleaned from......, and rs3212891; CDK2 rs2069391, rs2069414, and rs17528736; and CCNE1 rs3218036. These results exemplify the utility of imputation in candidate gene studies and lend evidence to a role of cell cycle genes in ovarian cancer etiology, suggest a reduced set of SNPs to target in additional cases and controls....
Missing data imputation: focusing on single imputation.

Science.gov (United States)

Zhang, Zhongheng

2016-01-01

Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information is omitted from the analysis. Many imputation methods have therefore been developed to fill the gaps. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can bias estimates of the mean and standard deviation. Furthermore, these methods ignore relationships with other variables. Regression imputation can preserve the relationship between missing values and other variables. Many sophisticated methods exist for handling missing values in longitudinal data. This article focuses primarily on how to implement single imputation in R code, while avoiding complex mathematical calculations.
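The contrast between mean and regression imputation can be shown on a toy variable. The article implements these ideas in R; the Python sketch below is an illustrative translation of the general technique, not the article's code, and the function names are hypothetical. It fills missing values of `y` either with the observed mean or with predictions from a straight-line fit on a fully observed covariate `x`.

```python
import numpy as np


def mean_impute(y: np.ndarray) -> np.ndarray:
    """Replace missing values (NaN) with the mean of the observed values."""
    filled = y.copy()
    missing = np.isnan(filled)
    filled[missing] = np.nanmean(filled)
    return filled


def regression_impute(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Replace missing y with predictions from a line y ~ a + b*x fitted on
    complete cases; x must be fully observed."""
    filled = y.copy()
    missing = np.isnan(filled)
    slope, intercept = np.polyfit(x[~missing], filled[~missing], deg=1)
    filled[missing] = intercept + slope * x[missing]
    return filled


x = np.arange(10.0)
y_full = 2.0 * x + 1.0          # complete data, known here only for comparison
y = y_full.copy()
y[[2, 7]] = np.nan              # two values go missing

by_mean = mean_impute(y)        # both gaps become 10.0, the observed mean
by_regression = regression_impute(y, x)
print(by_mean.std(), by_regression.std(), y_full.std())
```

Mean imputation piles values at the centre and shrinks the standard deviation, whereas regression imputation recovers this noiseless line exactly. Both remain single imputation: filled values are treated as if observed, so standard errors computed afterwards are too small.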
Imputation of variants from the 1000 Genomes Project modestly improves known associations and can identify low-frequency variant-phenotype associations undetected by HapMap based imputation.

Science.gov (United States)

Wood, Andrew R; Perry, John R B; Tanaka, Toshiko; Hernandez, Dena G; Zheng, Hou-Feng; Melzer, David; Gibbs, J Raphael; Nalls, Michael A; Weedon, Michael N; Spector, Tim D; Richards, J Brent; Bandinelli, Stefania; Ferrucci, Luigi; Singleton, Andrew B; Frayling, Timothy M

2013-01-01

Genome-wide association (GWA) studies have been limited by their reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low frequency (1% ≤ MAF 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low-frequency variant and phenotype that was previously missed by HapMap-based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5 × 10⁻¹²). Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low-frequency, large-effect associations.
GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

Science.gov (United States)

Sulovari, Arvis; Li, Dawei

2014-07-19

Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires that individual studies use the same allele definition (nomenclature) and genome build. Imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrectly assuming identical allele definitions among combined GWAS leads to large numbers of discarded genotypes or to incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT (Genome build and Allele definition Conversion Tool), that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality; our results indicated that inclusion of singletons in the reference had detrimental effects, while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease in imputation quality (quality was even significantly higher than for imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between, any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep
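A basic building block of allele-definition conversion is deciding whether a study SNP's alleles match a reference SNP directly, after swapping allele order, after switching to the complementary strand, or both. The sketch below assumes single-letter A/C/G/T allele codes; it is not GACT's algorithm, which also handles genome builds and further allele definitions, and the function names are hypothetical. A/T and C/G SNPs are flagged as strand-ambiguous because their complement is the same allele pair, so strand cannot be inferred from the alleles alone.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT[a1] == a2


def classify_alleles(study: Tuple[str, str], reference: Tuple[str, str]) -> str:
    """Describe how a study SNP's (allele1, allele2) relate to the reference pair."""
    s1, s2 = study
    r1, r2 = reference
    if is_strand_ambiguous(s1, s2):
        return "ambiguous"        # strand cannot be resolved from alleles alone
    f1, f2 = COMPLEMENT[s1], COMPLEMENT[s2]
    if (s1, s2) == (r1, r2):
        return "match"
    if (s1, s2) == (r2, r1):
        return "swap"             # same strand, allele order reversed
    if (f1, f2) == (r1, r2):
        return "flip"             # opposite strand
    if (f1, f2) == (r2, r1):
        return "flip_swap"        # opposite strand and reversed order
    return "mismatch"             # alleles cannot be reconciled


print(classify_alleles(("A", "G"), ("T", "C")))
```

A "swap" or "flip_swap" result means effect directions (or dosages) must be inverted before combining studies, while "ambiguous" SNPs need extra information, such as allele frequencies, to be aligned safely.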
Genotypic Variation of Early Maturing Soybean Genotypes for Phosphorus Utilization Efficiency under Field Grown Conditions

Energy Technology Data Exchange (ETDEWEB)

Abaidoo, R. C. [Kwame Nkrumah University of Technology, Kumasi (Ghana); International Institute of Tropical Agriculture, Ibadan (Nigeria); Opoku, A.; Boahen, S. [Kwame Nkrumah University of Technology, Kumasi (Ghana); Dare, M. O. [Federal University of Agriculture, Abeokuta (Nigeria)

2013-11-15

Variability in phosphorus (P) utilization by 64 early-maturing soybean (Glycine max L. Merr.) genotypes under low-P soil conditions was evaluated in 2009 and 2010 at Shika, Nigeria. Fifteen phenotypic variables were measured: number of nodules, nodule dry weight, grain yield, plant biomass, total biomass, biomass N and P content, Phosphorus Utilization Index (PUI), shoot P utilization efficiency (PUIS), grain P utilization efficiency (PUIG), Harvest Index (HI), biological N fixed (BNF), total N fixed, and N and P uptake. The four clusters revealed by cluster analysis were divided mainly along (1) plant biomass and uptake, (2) nutrient acquisition and utilization and (3) nodulation components. Three early-maturing genotypes, TGx1842-14E, TGx1912-11F and TGx1913-5F, were identified as having a high P utilization index and low P uptake. These genotypes could be a potential source for breeding for P use efficiency in early-maturing soybean genotypes. (author)
Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

DEFF Research Database (Denmark)

Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

2013-01-01

This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence...... coefficient was lower when the minor allele frequency was lower. The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation, but when computing time and memory usage are considered, FImpute is an alternative method....
Genomic evaluations with many more genotypes

Directory of Open Access Journals (Sweden)

Wiggans George R

2011-03-01

Background: Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000- or 2,900-marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because the numbers of genotyped animals and markers may continue to grow quickly. Methods: Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000-marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed-density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared. Results: Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50
Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

DEFF Research Database (Denmark)

Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

2011-01-01

The purpose of this study was to investigate the imputation error and loss of reliability of direct genomic values (DGV) or genomically enhanced breeding values (GEBV) when using genotypes imputed from a 3,000-marker single nucleotide polymorphism (SNP) panel to a 50,000-marker SNP panel. Data...... of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test...... with a national reference data set gave an absolute loss of 0.05 in mean reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for reliability of DGV in the Nordic study. When genotypes were imputed using the EuroGenomics reference, a loss of 0.02 in mean reliability of GEBV was detected...

A web-based approach to data imputation

KAUST Repository

Li, Zhixu

2013-10-24

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. In addition, several optimization techniques are proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.
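The greedy scheduling idea can be sketched as follows. At each step, fill the missing value whose best query is currently most confident, then re-score the remaining values, because newly filled values can make other queries more informative. The `confidence` and `answer` callables are hypothetical placeholders supplied by the caller. WebPut's actual query formulation and confidence estimation are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cell = Tuple[int, str]  # (tuple index, attribute name)
Filled = Dict[Cell, str]

def greedy_impute(missing: Sequence[Cell],
                  queries: Dict[Cell, List[str]],
                  confidence: Callable[[Cell, str, Filled], float],
                  answer: Callable[[Cell, str, Filled], str]
                  ) -> List[Tuple[Cell, str, str]]:
    """Hypothetical greedy scheduler: returns (cell, query, value) in fill order."""
    filled: Filled = {}
    order: List[Tuple[Cell, str, str]] = []
    remaining = set(missing)
    while remaining:
        # Re-score every candidate query, since filled cells can change confidences.
        scored = [(confidence(c, q, filled), c, q)
                  for c in remaining for q in queries.get(c, [])]
        if not scored:
            break  # no query is available for the cells still missing
        _, cell, query = max(scored)
        value = answer(cell, query, filled)
        filled[cell] = value
        order.append((cell, query, value))
        remaining.discard(cell)
    return order


def conf(cell: Cell, query: str, filled: Filled) -> float:
    # Hypothetical: the zip-code query only becomes reliable once the city is known.
    if cell == (0, "zip"):
        return 0.8 if (0, "city") in filled else 0.2
    return 0.9

def ans(cell: Cell, query: str, filled: Filled) -> str:
    return f"answer to {query}"  # stand-in for issuing a web search query

plan = greedy_impute([(0, "zip"), (0, "city")],
                     {(0, "zip"): ["q_zip"], (0, "city"): ["q_city"]},
                     conf, ans)
print([cell for cell, _, _ in plan])  # [(0, 'city'), (0, 'zip')]
```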
Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

DEFF Research Database (Denmark)

Ma, Peipei; Lund, Mogens Sandø; Ding, X

2015-01-01

This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows...... was improved slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy...... to increase reference population rather than increasing marker density...
Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

Directory of Open Access Journals (Sweden)

Thomas J Hoffmann

2015-01-01

An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10⁻⁴), and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting
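Imputation quality for a variant like G84E is often summarized by the squared correlation between imputed allele dosages and directly genotyped calls in the same people. The sketch below assumes that genotype probabilities are given per person as (P(0 copies), P(1 copy), P(2 copies)) of the effect allele. The function names and values are hypothetical, and the study's own validation procedure may differ.

```python
from typing import Sequence, Tuple

def dosage(probs: Tuple[float, float, float]) -> float:
    """Expected effect-allele count: P(1 copy) + 2 * P(2 copies)."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2

def squared_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, e.g. between dosages and observed genotypes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


# Hypothetical imputed probabilities and direct genotype calls for four people.
imputed_probs = [(0.98, 0.02, 0.0), (0.9, 0.1, 0.0), (0.3, 0.7, 0.0), (0.95, 0.05, 0.0)]
observed = [0, 0, 1, 0]
doses = [dosage(p) for p in imputed_probs]
print(round(squared_correlation(doses, observed), 3))
```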
Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data

Science.gov (United States)

2012-01-01

Background Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that the underlying missing data mechanism, given the observed data, is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the method proposed by Carpenter et al. (2007). This approach aims to approximate inferences under a Missing Not At Random (MNAR) mechanism by reweighting estimates obtained after multiple imputation, where the weights depend on the assumed degree of departure from the MAR assumption. Methods The method is illustrated with epidemiological data from a surveillance system of hepatitis C virus (HCV) infection in France during the 2001–2007 period. The subpopulation studied included 4343 HCV-infected patients who reported drug use. Risk factors for severe liver disease were assessed. After performing complete-case and multiple imputation analyses, we applied the sensitivity analysis to 3 risk factors of severe liver disease: past excessive alcohol consumption, HIV co-infection and infection with HCV genotype 3. Results In these data, the association between severe liver disease and HIV was underestimated if, given the observed data, the chance of observing HIV status is high when the status is positive. Inferences for the two other risk factors were robust to plausible local departures from the MAR assumption. Conclusions We have demonstrated the practical utility of, and advocate, a pragmatic, widely applicable approach to exploring plausible departures from the MAR assumption after multiple imputation. We have developed guidelines for applying this approach to epidemiological studies. PMID:22681630
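The sensitivity analysis modifies a standard pooling step. After multiple imputation under MAR, the per-imputation estimates are usually combined with Rubin's rules. The Carpenter et al. approach replaces the equal weights with weights that depend on an assumed departure from MAR. The sketch below implements Rubin's rules and a generic weighted point estimate. It does not reproduce how the weights are derived from the sensitivity parameter, or the matching variance formula. Here the log-weights are simply taken as inputs.

```python
import math
from typing import Sequence, Tuple

def rubin_pool(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pooled estimate and total variance from m imputations."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                              # pooled point estimate
    u_bar = sum(variances) / m                              # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b

def weighted_estimate(estimates: Sequence[float],
                      log_weights: Sequence[float]) -> float:
    """Point estimate with normalized importance weights (log scale for stability)."""
    top = max(log_weights)
    w = [math.exp(lw - top) for lw in log_weights]
    return sum(wi * q for wi, q in zip(w, estimates)) / sum(w)


est, var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(var, 4))                                   # 2.0 1.8333
print(weighted_estimate([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # 2.0
```

With equal log-weights the weighted estimate reduces to the Rubin point estimate, which corresponds to the MAR analysis. Unequal weights shift the estimate toward the imputations that are more plausible under the assumed MNAR mechanism.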
R package imputeTestbench to compare imputation methods for univariate time series

OpenAIRE

Bokde, Neeraj; Kulat, Kishore; Beck, Marcus W; Asencio-Cortés, Gualberto

2016-01-01

This paper describes the R package imputeTestbench that provides a testbench for comparing imputation methods for missing data in univariate time series. The imputeTestbench package can be used to simulate the amount and type of missing data in a complete dataset and compare filled data using different imputation methods. The user has the option to simulate missing data by removing observations completely at random or in blocks of different sizes. Several default imputation methods are includ...
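imputeTestbench itself is an R package. The Python sketch below only mirrors the workflow described above and does not reproduce the package's API or defaults. It removes values at random or in a block, fills them with two competing methods, and scores the filled values against the originals. The function names, the RMSE metric, and the two methods compared (mean substitution and linear interpolation) are all illustrative choices.

```python
import math
import random
from typing import List, Optional, Sequence

Series = List[Optional[float]]

def remove_at_random(x: Sequence[float], frac: float, rng: random.Random) -> Series:
    """Blank out a random share of interior points (endpoints are kept)."""
    k = min(int(frac * len(x)), len(x) - 2)
    drop = set(rng.sample(range(1, len(x) - 1), k))
    return [None if i in drop else v for i, v in enumerate(x)]

def remove_block(x: Sequence[float], start: int, length: int) -> Series:
    """Blank out one contiguous block of observations."""
    return [None if start <= i < start + length else v for i, v in enumerate(x)]

def fill_mean(s: Series) -> List[float]:
    obs = [v for v in s if v is not None]
    mean = sum(obs) / len(obs)
    return [mean if v is None else v for v in s]

def fill_linear(s: Series) -> List[float]:
    known = [i for i, v in enumerate(s) if v is not None]
    out: List[float] = []
    for i, v in enumerate(s):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(s[right])   # leading gap: carry the next value back
        elif right is None:
            out.append(s[left])    # trailing gap: carry the last value forward
        else:
            t = (i - left) / (right - left)
            out.append(s[left] + t * (s[right] - s[left]))
    return out

def rmse_on_gaps(truth: Sequence[float], gapped: Series, filled: Sequence[float]) -> float:
    """RMSE computed only at the positions that were removed."""
    errs = [(f - t) ** 2 for t, g, f in zip(truth, gapped, filled) if g is None]
    return math.sqrt(sum(errs) / len(errs))


truth = [math.sin(i / 3) for i in range(40)]
gapped = remove_block(truth, 10, 5)
for name, method in [("mean", fill_mean), ("linear", fill_linear)]:
    print(name, round(rmse_on_gaps(truth, gapped, method(gapped)), 3))
```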
Multiply-Imputed Synthetic Data: Advice to the Imputer

Directory of Open Access Journals (Sweden)

Loong Bronwyn

2017-12-01

Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the confidential synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.
The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting.

Science.gov (United States)

Thomas, A M; Cook, L J; Dean, J M; Olson, L M

2014-01-01

To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information. A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status. High probability and imputed matched sets were not significantly different from occupant age, MVC county, and MVC hour in high information settings (p > 0.999). In low information settings, high probability matched sets were significantly different from occupant age and MVC county (p < …), while imputed matched sets were not (p > 0.493). High information settings saw no significant differences in inference of simulated log hospital charges and hospitalization status between the two methods. High probability and imputed matched sets were significantly different from the outcomes in low information settings; however, imputed matched sets were more robust. The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.
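The two strategies can be contrasted for a single crash record with several candidate hospital records, each carrying a posterior match probability from the linkage model. A high-probability matched set keeps only links above a cut-off. An imputed matched set instead draws a link (or no link) in proportion to those probabilities, and the draw is repeated to form several datasets that are analyzed like multiple imputations. The threshold, record identifiers and probabilities below are hypothetical. The sketch also assumes the candidate probabilities sum to at most 1, with the remainder meaning "no true match".

```python
import random
from typing import List, Optional, Tuple

Candidates = List[Tuple[str, float]]  # (hospital record id, match probability)

def high_probability_match(cands: Candidates, threshold: float = 0.9) -> Optional[str]:
    """Keep the most probable candidate only if it clears the threshold."""
    if not cands:
        return None
    record_id, p = max(cands, key=lambda c: c[1])
    return record_id if p >= threshold else None

def imputed_match(cands: Candidates, rng: random.Random) -> Optional[str]:
    """Draw one candidate with its match probability; leftover mass means no match."""
    u, cumulative = rng.random(), 0.0
    for record_id, p in cands:
        cumulative += p
        if u < cumulative:
            return record_id
    return None


crash_candidates = [("H17", 0.6), ("H42", 0.3)]  # 0.1 left for "no match"
print(high_probability_match(crash_candidates))   # None
rng = random.Random(7)
draws = [imputed_match(crash_candidates, rng) for _ in range(5)]  # five imputed matched sets
```

When linkage information is sparse, few candidates clear the threshold, so the high-probability approach silently drops records. The imputed sets keep those records and carry the linkage uncertainty into the analysis. This is one plausible reading of why the imputed sets proved more robust in low information settings.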
Genotyping common and rare variation using overlapping pool sequencing

Directory of Open Access Journals (Sweden)

Pasaniuc Bogdan

2011-07-01

Abstract Background Recent advances in sequencing technologies set the stage for large, population-based studies, in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced, and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results In this paper we provide a new algorithm for the deconvolution of DNA pooling multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions In particular, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.
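A small example makes the pooling idea concrete. If each individual is placed in several overlapping pools, the alternate-allele count estimated for each pool constrains the individuals' genotypes jointly. The paper solves this with a likelihood model and linear programming. The toy sketch below swaps that for an exhaustive search: it tries every genotype vector and keeps the one whose implied pool counts are closest, in L1 distance, to the observed ones. This is only feasible for a handful of individuals. The pool design and counts are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple

def deconvolve_pools(design: Sequence[Sequence[int]],
                     pool_alt_counts: Sequence[float]) -> Tuple[int, ...]:
    """Brute-force genotype (0/1/2 alt alleles) assignment for a tiny pooling design.

    design[p][i] is 1 if individual i was placed in pool p, else 0.
    pool_alt_counts[p] is the (noisy) alt-allele count estimated for pool p.
    """
    n = len(design[0])

    def mismatch(g: Tuple[int, ...]) -> float:
        # L1 distance between implied and observed pool counts.
        return sum(abs(sum(a * gi for a, gi in zip(row, g)) - y)
                   for row, y in zip(design, pool_alt_counts))

    return min(product((0, 1, 2), repeat=n), key=mismatch)


# Three individuals, each placed in two of the three pools.
design = [[1, 1, 0],
          [0, 1, 1],
          [1, 0, 1]]
print(deconvolve_pools(design, [2.1, 0.9, 3.2]))  # (2, 0, 1)
```

In this toy design the three pool totals determine the three genotypes uniquely, which is why moderate noise in the counts still recovers the right answer.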
A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

Science.gov (United States)

Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

2015-12-01

Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, for whom lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models yielded equal PA, and both outperformed the generalized ridge regression heteroscedastic effect model for the traits evaluated.
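Because these GBS markers have no map order, imputation has to borrow information across trees rather than along the chromosome. The sketch below shows two of the compared strategies, mean imputation and k-nearest-neighbor imputation, on a trees × SNPs matrix coded 0/1/2 with `None` for missing calls. The distance (mean absolute difference over jointly observed SNPs) and the value of k are assumptions, and the SVD-based method is not shown.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[int]]

def column_means(matrix: Sequence[Row]) -> List[float]:
    means = []
    for j in range(len(matrix[0])):
        obs = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    return means

def mean_impute(matrix: Sequence[Row]) -> List[List[float]]:
    means = column_means(matrix)
    return [[means[j] if x is None else float(x) for j, x in enumerate(row)]
            for row in matrix]

def knn_impute(matrix: Sequence[Row], k: int = 2) -> List[List[float]]:
    means = column_means(matrix)  # fallback when no neighbor is usable
    result = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, x in enumerate(row):
            if x is not None:
                new_row.append(float(x))
                continue
            neighbors = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = sum(abs(a - b) for a, b in shared) / len(shared)
                    neighbors.append((dist, other[j]))
            neighbors.sort(key=lambda t: t[0])
            top = neighbors[:k]
            new_row.append(sum(v for _, v in top) / len(top) if top else means[j])
        result.append(new_row)
    return result


trees = [[0, 1, 2, None],
         [0, 1, 2, 2],
         [2, 1, 0, 0],
         [0, 1, 2, 2]]
print(knn_impute(trees)[0][3], round(mean_impute(trees)[0][3], 3))  # 2.0 1.333
```

On this toy matrix the nearest-neighbor estimate follows the two trees that share the first tree's genotype pattern. The mean, by contrast, is pulled toward the dissimilar tree.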
Quick, “Imputation-free” meta-analysis with proxy-SNPs

Directory of Open Access Journals (Sweden)

Meesters Christian

2012-09-01

Abstract Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) verify results. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which can be time-consuming. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from the 1000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy
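The proxy idea can be sketched in a simple form. A study that lacks the target SNP contributes the estimate of its best proxy (the highest r² above a cut-off). The sign is flipped if the study reports the proxy allele that is not in phase with the target's effect allele. Studies are then combined by fixed-effect inverse-variance weighting. This is a simplified illustration, not YAMAS's implementation. The data layout, the r² cut-off of 0.8 and all SNP identifiers are hypothetical, and the sign flip assumes biallelic SNPs.

```python
import math
from typing import Dict, List, Optional, Tuple

Result = Tuple[float, float, str]  # (beta, standard error, effect allele)
# (proxy SNP, r2 with target, proxy allele in phase with the target's effect allele)
Proxy = Tuple[str, float, str]

def study_estimate(target: str, effect_allele: str, study: Dict[str, Result],
                   proxies: Dict[str, List[Proxy]], min_r2: float = 0.8
                   ) -> Optional[Tuple[float, float]]:
    """Estimate for the target SNP from one study, falling back to the best proxy."""
    if target in study:
        beta, se, allele = study[target]
        return (beta if allele == effect_allele else -beta), se
    for proxy, r2, in_phase in sorted(proxies.get(target, []), key=lambda p: -p[1]):
        if r2 < min_r2:
            break  # remaining proxies are in weaker LD
        if proxy in study:
            beta, se, allele = study[proxy]
            return (beta if allele == in_phase else -beta), se
    return None

def fixed_effect(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance weighted pooled beta and its standard error."""
    weights = [1 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, math.sqrt(1 / sum(weights))


proxies = {"rs1": [("rs2", 0.95, "C")]}
study_a = {"rs1": (0.20, 0.10, "A")}
study_b = {"rs2": (-0.30, 0.10, "G")}  # lacks rs1; reports the out-of-phase allele
candidates = (study_estimate("rs1", "A", s, proxies) for s in (study_a, study_b))
ests = [e for e in candidates if e is not None]
print(fixed_effect(ests))
```

A proxy in imperfect LD (r² < 1) carries a diluted signal. This is one reason the paper positions the approach as a quick screen before imputation-based MA.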
Plant genotypic diversity reduces the rate of consumer resource utilization.

Science.gov (United States)

McArt, Scott H; Thaler, Jennifer S

2013-07-07

While plant species diversity can reduce herbivore densities and herbivory, little is known regarding how plant genotypic diversity alters resource utilization by herbivores. Here, we show that an invasive folivore, the Japanese beetle (Popillia japonica), increases by 28 per cent in abundance but consumes 24 per cent less foliage in genotypic polycultures compared with monocultures of the common evening primrose (Oenothera biennis). We found strong complementarity for reduced herbivore damage among plant genotypes growing in polycultures and a weak dominance effect of particularly resistant genotypes. Sequential feeding by P. japonica on different genotypes from polycultures resulted in reduced consumption compared with feeding on different plants of the same genotype from monocultures. Thus, diet mixing among plant genotypes reduced herbivore consumption efficiency. Despite positive complementarity driving an increase in fruit production in polycultures, we observed a trade-off between complementarity for increased plant productivity and resistance to herbivory, suggesting costs in the complementary use of resources by plant genotypes may manifest across trophic levels. These results elucidate mechanisms for how plant genotypic diversity simultaneously alters resource utilization by both producers and consumers, and show that population genotypic diversity can increase the resistance of a native plant to an invasive herbivore.
Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

Science.gov (United States)

Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

2017-07-27

Imputation is a computational method based on the principle of haplotype sharing allowing enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel, which makes the job computationally less intensive. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.
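Concordance checks like the one above are usually reported within minor allele frequency (MAF) bins. Overall concordance is dominated by common variants, and at rare variants it is dominated by easy homozygous-reference calls. The sketch below computes binned genotype concordance. It assumes each variant comes with its MAF and paired true and imputed genotype vectors coded 0/1/2. The bin edges and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Variant = Tuple[float, Sequence[int], Sequence[int]]  # (MAF, true calls, imputed calls)

def concordance_by_maf(variants: Sequence[Variant],
                       edges: Sequence[float]) -> Dict[Tuple[float, float], float]:
    """Share of matching genotype calls per MAF bin [lo, hi); the last bin includes hi."""
    hits: Dict[Tuple[float, float], List[int]] = {}
    for maf, truth, imputed in variants:
        for idx, (lo, hi) in enumerate(zip(edges, edges[1:])):
            last = idx == len(edges) - 2
            if lo <= maf < hi or (last and maf == hi):
                tally = hits.setdefault((lo, hi), [0, 0])
                tally[0] += sum(t == m for t, m in zip(truth, imputed))  # matching calls
                tally[1] += len(truth)                                   # calls compared
                break
    return {b: match / total for b, (match, total) in hits.items()}


variants = [(0.005, [0, 0, 1], [0, 0, 0]),
            (0.20, [0, 1, 2], [0, 1, 2])]
print(concordance_by_maf(variants, [0.0, 0.01, 0.05, 0.5]))
```

In the rare bin the single missed carrier costs only one call in three. For this reason, metrics such as dosage r² or non-reference concordance are often reported alongside raw concordance.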
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

NARCIS (Netherlands)

J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I.E. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

2015-01-01

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced
A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data.

Science.gov (United States)

Miar, Younes; Sargolzaei, Mehdi; Schenkel, Flavio S

2017-04-01

Phasing genotypes to haplotypes is becoming increasingly important due to its applications in the study of diseases, population and evolutionary genetics, imputation, and so on. Several studies have focused on the development of computational methods that infer haplotype phase from population genotype data. The aim of this study was to compare phasing algorithms implemented in Beagle, Findhap, FImpute, Impute2, and ShapeIt2 software using 50k and 777k (HD) genotyping data. Six scenarios were considered: no-parents, sire-progeny pairs, sire-dam-progeny trios, each with and without pedigree information in Holstein cattle. Algorithms were compared with respect to their phasing accuracy and computational efficiency. In the studied population, Beagle and FImpute were more accurate than other phasing algorithms. Across scenarios, phasing accuracies for Beagle and FImpute were 99.49-99.90% and 99.44-99.99% for 50k, respectively, and 99.90-99.99% and 99.87-99.99% for HD, respectively. Generally, FImpute resulted in higher accuracy when genotypic information of at least one parent was available. In the absence of parental genotypes and pedigree information, Beagle and Impute2 (with double the default number of states) were slightly more accurate than FImpute. Findhap gave high phasing accuracy when parents' genotypes and pedigree information were available. In terms of computing time, Findhap was the fastest algorithm followed by FImpute. FImpute was 30 to 131, 87 to 786, and 353 to 1,400 times faster across scenarios than Beagle, ShapeIt2, and Impute2, respectively. In summary, FImpute and Beagle were the most accurate phasing algorithms. Moreover, the low computational requirement of FImpute makes it an attractive algorithm for phasing genotypes of large livestock populations. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
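Phasing accuracy can be scored in several ways, and the abstract does not define the exact measure used. One widely used metric is the switch error rate: across consecutive heterozygous sites, count how often the inferred phase flips relative to the true phase. The sketch below assumes genotypes coded as alternate-allele counts (0/1/2) and one haplotype per individual given as 0/1 alleles. The function name and data are hypothetical.

```python
from typing import Sequence

def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int],
                      genotypes: Sequence[int]) -> float:
    """Phase switches between consecutive heterozygous sites / (number of such pairs)."""
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        return 0.0  # phase is trivial with fewer than two heterozygous sites
    # True where the inferred haplotype carries the same allele as the true one.
    same = [true_hap[i] == inferred_hap[i] for i in het]
    switches = sum(a != b for a, b in zip(same, same[1:]))
    return switches / (len(het) - 1)


genotypes = [1, 1, 0, 1, 1]
true_h1 = [0, 1, 0, 0, 1]
inferred_h1 = [0, 1, 0, 1, 0]  # phase flipped after the second heterozygous site
print(switch_error_rate(true_h1, inferred_h1, genotypes))  # 0.3333333333333333
```

A per-site comparison would report two wrong alleles in this example, although only one phasing decision was wrong. The switch count captures that single error.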
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

DEFF Research Database (Denmark)

Huang, Jie; Howie, Bryan; Mccarthy, Shane

2015-01-01

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low de...
The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

Directory of Open Access Journals (Sweden)

Rosa Aghdam

2017-12-01

Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, missing values were introduced into a random 5% of genes under two types of missingness mechanism, ignorable and non-ignorable, with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/.
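The comparison step can be sketched with plain sets and a Pearson chi-squared statistic. It checks how many genes that are significant in the original data remain significant after imputation, then compares those proportions across methods on a methods × (recovered, missed) table. The gene labels and counts are hypothetical, and the authors' exact table layout may differ. With counts this small, the chi-squared approximation is unreliable, so the example only illustrates the arithmetic.

```python
from typing import Dict, List, Sequence, Set, Tuple

def overlap_proportion(original: Set[str], imputed: Set[str]) -> float:
    """Share of originally significant genes still detected after imputation."""
    return len(original & imputed) / len(original)

def pearson_chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            if expected == 0:
                raise ValueError("zero expected count; the test is undefined")
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


original = {"G1", "G2", "G3", "G4"}
by_method: Dict[str, Set[str]] = {"method_a": {"G1", "G2", "G3"},
                                  "method_b": {"G1", "G2"}}
table: List[List[int]] = [[len(original & s), len(original - s)] for s in by_method.values()]
print({m: overlap_proportion(original, s) for m, s in by_method.items()})
print(pearson_chi_square(table))
```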
Using imputed genotype data in the joint score tests for genetic association and gene-environment interactions in case-control studies.

Science.gov (United States)

Song, Minsun; Wheeler, William; Caporaso, Neil E; Landi, Maria Teresa; Chatterjee, Nilanjan

2018-03-01

Genome-wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern complex tests for genetic associations incorporating gene-environment interactions. We focus on case-control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large-scale GWAS are being performed through consortium efforts where it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene-environment interactions under the assumption of gene-environment independence. Methods are made available for public use through the CGEN R software package. © 2017 WILEY PERIODICALS, INC.
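The paper builds on a score test that plugs the imputed allele dosage in place of an observed genotype. That baseline can be written in a few lines for the simplest case: a logistic model with an intercept and the dosage, with no covariates and no gene-environment terms. This is a sketch under those assumptions, not the CGEN implementation or the paper's joint tests.

```python
import math
from typing import Sequence, Tuple

def dosage_score_test(y: Sequence[int], dosage: Sequence[float]) -> Tuple[float, float]:
    """Score statistic and p-value for H0: b = 0 in logit P(y=1) = a + b * dosage.

    Under H0 the fitted case probability is the case fraction y_bar, so the score
    for b is sum((y_i - y_bar) * d_i) and its variance (adjusted for the intercept)
    is y_bar * (1 - y_bar) * sum((d_i - d_bar)^2).
    """
    n = len(y)
    y_bar = sum(y) / n
    d_bar = sum(dosage) / n
    score = sum((yi - y_bar) * di for yi, di in zip(y, dosage))
    info = y_bar * (1 - y_bar) * sum((di - d_bar) ** 2 for di in dosage)
    if info == 0:
        raise ValueError("no variation in outcome or dosage")
    stat = score * score / info
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p_value


cases_controls = [0, 0, 1, 1]
imputed_dosage = [0.0, 0.0, 2.0, 2.0]
print(dosage_score_test(cases_controls, imputed_dosage))  # statistic is 4.0
```

Because the dosage is the expected allele count given the imputation model, this test keeps its type-I error rate without modeling imputation uncertainty explicitly. Its power, however, drops as imputation quality falls.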
3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

Science.gov (United States)

Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

2018-06-01

A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support.
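As a rough illustration of combining cross-sectional and longitudinal estimates, the sketch below merges two estimates of one missing value (for example, one from MICE and one from a GP, each with a variance) by inverse-variance weighting, and scores masked test results with a normalized RMSE. Both the weighting rule and the choice to normalize by the range of the measured values are assumptions for illustration; the abstract does not specify either.

```python
import math

def combine_estimates(mean_a, var_a, mean_b, var_b):
    """Inverse-variance weighted average of two estimates of the same missing value."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    var = 1.0 / (w_a + w_b)          # variance of the combined estimate
    return mean, var

def nrmse(measured, predicted):
    """RMSE over masked points, normalized by the range of the measured values (assumed normalizer)."""
    n = len(measured)
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return rmse / (max(measured) - min(measured))
```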
The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

Science.gov (United States)

Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

2017-12-01

Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes were selected at random to receive missing values, under two types of missingness mechanism (ignorable and non-ignorable) with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlap between significant genes detected from the original data and those detected from the imputed datasets. Additionally, the significant genes were tested for enrichment in important pathways using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.
Cost reduction for web-based data imputation

KAUST Repository

Li, Zhixu

2014-01-01

Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity of these keywords and the data complexity on the Web, different queries may retrieve different answers to the same absent field value. To decide the most probable right answer to each absent field value, the existing method issues quite a few available imputation queries for each absent value and then votes to decide the most probable right answer. As a result, a large number of imputation queries must be issued to fill all absent values in an incomplete data set, which brings a large overhead. In this paper, we work on reducing the cost of Web-based Data Imputation in two respects: first, we propose a query execution scheme that can secure the most probable right answer to an absent field value by issuing as few imputation queries as possible; second, we recognize and prune, a priori, queries that will probably fail to return any answers. Our extensive experimental evaluation shows that our proposed techniques substantially reduce the cost of Web-based Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.
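The abstract does not spell out the query execution scheme. One simple way to issue fewer queries, assumed here purely for illustration, is to vote incrementally and stop once the leading answer can no longer be overtaken by the queries still pending. `run_query` is a hypothetical callable standing in for a Web lookup that returns an answer or `None`.

```python
from collections import Counter

def vote_with_early_stop(queries, run_query):
    """Issue imputation queries one at a time; stop as soon as the leading answer
    cannot be caught by the remaining queries. Returns (answer, queries_issued)."""
    counts = Counter()
    issued = 0
    for i, query in enumerate(queries):
        answer = run_query(query)
        issued += 1
        if answer is not None:              # a query may fail to return any answer
            counts[answer] += 1
        remaining = len(queries) - (i + 1)
        ranked = counts.most_common(2)
        if ranked:
            lead = ranked[0][1]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Strictly ahead even if every pending query votes for the runner-up
            if lead > runner_up + remaining:
                return ranked[0][0], issued
    ranked = counts.most_common(1)
    return (ranked[0][0] if ranked else None), issued
```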

Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data

Science.gov (United States)

Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...
Data driven estimation of imputation error: a strategy for imputation with a reject option

DEFF Research Database (Denmark)

Bak, Nikolaj; Hansen, Lars Kai

2016-01-01

Missing data is a common problem in many research fields and is a challenge that always needs careful consideration. One approach is to impute the missing values, i.e., replace missing values with estimates. When imputation is applied, it is typically applied to all records with missing values i...
Genotype call for chromosomal deletions using read-depth from whole genome sequence variants in cattle

DEFF Research Database (Denmark)

Mesbah-Uddin, Md; Guldbrandtsen, Bernt; Lund, Mogens Sandø

2018-01-01

We present a deletion genotyping (copy-number estimation) method that leverages population-scale whole genome sequence variant data from the 1000 Bull Genomes Project (1KBGP) to build a reference panel for imputation. To estimate deletion-genotype likelihood, we extracted read-depth (RD) data of all...
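The abstract is truncated before the likelihood model is described. As a hedged sketch only, the snippet below assumes that read depth at a site follows a Poisson distribution whose mean scales with copy number (0, 1 or 2), with a small hypothetical floor `eps` for reads observed inside homozygous deletions; the paper's actual model may differ.

```python
import math

def deletion_genotype_likelihoods(read_depth, expected_diploid_depth, eps=0.01):
    """Normalized Poisson likelihoods of `read_depth` under 0, 1 or 2 copies."""
    log_lik = {}
    for copies in (0, 1, 2):
        # Expected depth scales with copy number; eps keeps the 0-copy rate positive
        lam = max(copies / 2.0, eps) * expected_diploid_depth
        log_lik[copies] = read_depth * math.log(lam) - lam - math.lgamma(read_depth + 1)
    top = max(log_lik.values())
    # Subtract the maximum before exponentiating to avoid underflow at high depth
    weights = {c: math.exp(v - top) for c, v in log_lik.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```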
Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

Science.gov (United States)

Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

2014-01-01

Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by the SNP density of the test dataset used for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.
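Concordance at 2-digit and 4-digit resolution can be computed by truncating allele names to their first one or two fields and counting matching alleles per individual, ignoring order. The `GENE*XX:YY` naming and the per-allele counting below are assumptions for illustration, not necessarily the exact procedure used in the study.

```python
from collections import Counter

def truncate(allele, digits):
    """Reduce a name like 'A*02:01' to 2-digit ('A*02') or 4-digit ('A*02:01') resolution."""
    gene, fields = allele.split("*")
    parts = fields.split(":")
    return gene + "*" + ":".join(parts[:digits // 2])

def concordance(imputed, genotyped, digits):
    """Fraction of alleles (two per person) on which imputed and genotyped calls agree."""
    matches, total = 0, 0
    for imp_pair, typ_pair in zip(imputed, genotyped):
        imp = Counter(truncate(a, digits) for a in imp_pair)
        typ = Counter(truncate(a, digits) for a in typ_pair)
        matches += sum((imp & typ).values())   # order-insensitive overlap of the two pairs
        total += len(typ_pair)
    return matches / total
```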
Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

Directory of Open Access Journals (Sweden)

Kwangwoo Kim

Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by the SNP density of the test dataset used for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.
Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

Directory of Open Access Journals (Sweden)

Minkyung Kim

2017-10-01

This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the pattern that appears in the collected data. Here, in order to capture the patterns from past power data, we newly model a feature vector by using past data and its variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, which are significant hyperparameters of the proposed method, by utilizing intentionally introduced missing data. Based on a weighted distance between feature vectors representing a missing situation and past situations, missing power data are estimated by referring to the k most similar past situations within the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects a method between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, from a simulation under various energy consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system, compared to the existing imputation methods.
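A simplified, hypothetical sketch of the kNN idea described above: the feature vector is the last `length` observations plus their first differences, past situations within `history` steps are compared by Euclidean distance, and the missing value is an inverse-distance-weighted average of the values that followed the `k` closest situations. Unlike LAI, `length` and `history` are fixed rather than learned, the authors' specific distance weighting is not reproduced, and the eLAI switch to linear interpolation is omitted.

```python
import math

def feature(series, end, length):
    """Last `length` values before index `end` plus their first differences,
    or None if that window is incomplete."""
    if end < length:
        return None
    window = series[end - length:end]
    if any(v is None for v in window):
        return None
    diffs = [b - a for a, b in zip(window, window[1:])]
    return window + diffs

def knn_impute_point(series, t, length=3, history=48, k=3):
    """Estimate the missing value series[t] from the k most similar past situations."""
    target = feature(series, t, length)
    if target is None:
        raise ValueError("not enough observed data before the gap")
    candidates = []
    for s in range(max(length, t - history), t):
        if series[s] is None:
            continue
        f = feature(series, s, length)
        if f is None:
            continue
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(target, f)))
        candidates.append((d, series[s]))
    if not candidates:
        raise ValueError("no complete past situations in the history window")
    nearest = sorted(candidates)[:k]
    # Inverse-distance weights; the small constant avoids division by zero for exact matches
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```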
Gaussian mixture clustering and imputation of microarray data.

Science.gov (United States)

Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

2004-04-12

In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
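The two evaluation metrics can be sketched as follows. The mis-clustered count assumes that cluster labels are arbitrary, so it takes the best one-to-one relabelling between the two clusterings; the brute-force search over permutations is only practical for a handful of clusters and is an illustrative choice, not the study's code.

```python
import math
from itertools import permutations

def rmse(true_vals, imputed_vals):
    """Root mean squared error between true and imputed entries."""
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / len(true_vals))

def misclustered(labels_true, labels_imputed):
    """Genes assigned to a different cluster after the best relabelling of clusters."""
    ids = sorted(set(labels_true) | set(labels_imputed))
    best = len(labels_true)
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))
        wrong = sum(1 for a, b in zip(labels_true, labels_imputed) if mapping[b] != a)
        best = min(best, wrong)
    return best
```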
Missing value imputation for epistatic MAPs

LENUS (Irish Health Repository)

Ryan, Colm

2010-04-20

Background: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest-neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially
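A minimal sketch of nearest-neighbor imputation adapted to a symmetric pairwise matrix, assuming `None` marks missing scores: for a missing entry (i, j), the scores that gene i's nearest neighbours have with gene j are averaged, the same is done from gene j's side, and the two estimates are averaged and written to both (i, j) and (j, i) so the matrix stays symmetric. This is an illustrative variant, not one of the four strategies evaluated in the paper.

```python
import math

def symmetric_knn_impute(matrix, k=2):
    """Fill None entries of a symmetric gene-by-gene score matrix, keeping it symmetric."""
    n = len(matrix)

    def distance(a, b):
        # Root-mean-square difference over columns observed for both genes
        shared = [(x, y) for x, y in zip(matrix[a], matrix[b]) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    def estimate(row, col):
        # Average the scores that the k nearest neighbours of `row` have with `col`
        donors = sorted((distance(row, g), matrix[g][col]) for g in range(n)
                        if g != row and matrix[g][col] is not None)
        values = [v for d, v in donors[:k] if d != math.inf]
        return sum(values) / len(values) if values else None

    result = [r[:] for r in matrix]          # leave the input matrix untouched
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] is None:
                estimates = [e for e in (estimate(i, j), estimate(j, i)) if e is not None]
                value = sum(estimates) / len(estimates) if estimates else None
                result[i][j] = result[j][i] = value
    return result
```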
Correlation of Lactobacillus rhamnosus Genotypes and Carbohydrate Utilization Signatures Determined by Phenotype Profiling.

Science.gov (United States)

Ceapa, Corina; Lambert, Jolanda; van Limpt, Kees; Wels, Michiel; Smokvina, Tamara; Knol, Jan; Kleerebezem, Michiel

2015-08-15

Lactobacillus rhamnosus is a bacterial species commonly colonizing the gastrointestinal (GI) tract of humans and also frequently used in food products. While some strains have been studied extensively, physiological variability among isolates of the species found in healthy humans or their diet is largely unexplored. The aim of this study was to characterize the diversity of carbohydrate utilization capabilities of human isolates and food-derived strains of L. rhamnosus in relation to their niche of isolation and genotype. We investigated the genotypic and phenotypic diversity of 25 out of 65 L. rhamnosus strains from various niches, mainly human feces and fermented dairy products. Genetic fingerprinting of the strains by amplified fragment length polymorphism (AFLP) identified 11 distinct subgroups at 70% similarity and suggested niche enrichment within particular genetic clades. High-resolution carbohydrate utilization profiling (OmniLog) identified 14 carbon sources that could be used by all of the strains tested for growth, while the utilization of 58 carbon sources differed significantly between strains, enabling the stratification of L. rhamnosus strains into three metabolic clusters that partially correlate with the genotypic clades but appear uncorrelated with the strain's origin of isolation. Draft genome sequences of 8 strains were generated and employed in a gene-trait matching (GTM) analysis together with the publicly available genomes of L. rhamnosus GG (ATCC 53103) and HN001 for several carbohydrates that were distinct for the different metabolic clusters: l-rhamnose, cellobiose, l-sorbose, and α-methyl-d-glucoside. From the analysis, candidate genes were identified that correlate with l-sorbose and α-methyl-d-glucoside utilization, and the proposed function of these genes could be confirmed by heterologous expression in a strain lacking the genes. This study expands our insight into the phenotypic and genotypic diversity of the species L. rhamnosus
Applying an efficient K-nearest neighbor search to forest attribute imputation

Science.gov (United States)

Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

2006-01-01

This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby decreasing the time needed to discover the NN subset. Results of five trials show gains...
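The abstract does not name the search algorithm. Partial distance search is one well-known way to reduce distance calculations and is assumed here for illustration: a running squared distance is abandoned as soon as it reaches the current k-th best, and the imputed attributes are the mean of the k nearest reference vectors. The paper may use a different technique.

```python
def knn_impute(target, references, attributes, k=3):
    """Impute attributes for `target` as the mean attributes of its k nearest references.

    Returns (imputed_attributes, number_of_distance_calculations_completed)."""
    best = []          # sorted (squared distance, reference index) pairs, at most k long
    full = 0
    for idx, ref in enumerate(references):
        bound = best[-1][0] if len(best) == k else float("inf")
        d2 = 0.0
        for a, b in zip(target, ref):
            d2 += (a - b) ** 2
            if d2 >= bound:      # cannot enter the current k-nearest set: stop early
                break
        else:                    # loop finished without abandoning
            full += 1
            best.append((d2, idx))
            best.sort()
            del best[k:]
    n_attr = len(attributes[0])
    imputed = [sum(attributes[i][j] for _, i in best) / len(best) for j in range(n_attr)]
    return imputed, full
```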
The multiple imputation method: a case study involving secondary data analysis.

Science.gov (United States)

Walani, Salimah R; Cleland, Charles M

2015-05-01

This article uses the example of a secondary data analysis study to illustrate the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data source was the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostic procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
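Results from several imputed datasets are usually combined with Rubin's rules; the abstract does not say which pooling software was used, so the following is only a sketch of the standard rules for a single regression coefficient estimated on each of m imputed datasets.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets. Returns (estimate, SE, df)."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = sum(estimates) / m                                 # pooled point estimate
    w = sum(variances) / m                                     # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation variance
    t = w + (1 + 1 / m) * b                                    # total variance
    if b > 0:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2        # Rubin's (1987) degrees of freedom
    else:
        df = math.inf
    return q_bar, math.sqrt(t), df
```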
Estimating the accuracy of geographical imputation

Directory of Open Access Journals (Sweden)

Boscoe Francis P

2008-01-01

Background: To reduce the number of non-geocoded cases, researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed using either a postal centroid or a geographical imputation method, which assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date, no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation. Methods: Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address. Results: Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density. Conclusion: Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate
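One plausible reading of demographic-based geo-imputation, assumed here for illustration, is to draw a tract within the case's postal code with probability proportional to the tract's population in the case's demographic group; the data layout is hypothetical. The match rate is then the share of cases whose assigned tract equals the tract from the full street address.

```python
import random

def geo_impute(case_group, tract_counts, rng=random):
    """Pick a census tract within the case's postal code, weighted by the tract's
    population in the case's demographic group.

    tract_counts maps tract ID -> {group: population} for tracts in the postal code."""
    tracts = list(tract_counts)
    weights = [tract_counts[t].get(case_group, 0) for t in tracts]
    if sum(weights) == 0:
        raise ValueError("no population of this group in the postal code")
    return rng.choices(tracts, weights=weights, k=1)[0]

def match_rate(assigned, true_tracts):
    """Share of cases assigned to the tract identified from the full street address."""
    return sum(a == t for a, t in zip(assigned, true_tracts)) / len(true_tracts)
```

Adding age characteristics would amount to using a combined group key, such as a (race/ethnicity, age band) tuple, in `tract_counts`.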
Cost reduction for web-based data imputation

KAUST Repository

Li, Zhixu; Shang, Shuo; Xie, Qing; Zhang, Xiangliang

2014-01-01

Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity...
Multiple imputation in the presence of non-normal data.

Science.gov (United States)

Lee, Katherine J; Carlin, John B

2017-02-20

Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
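A minimal sketch of predictive mean matching with type 1 matching for one fully observed predictor: observed cases are scored with the least-squares coefficients, missing cases with coefficients drawn from their posterior under a noninformative prior, and each missing value is replaced by the observed value of a randomly chosen donor among the `donors` closest predictions. The single-predictor setup and the default of five donors are assumptions for illustration, not the authors' code.

```python
import math
import random

def pmm_type1(x, y, donors=5, rng=random):
    """Impute missing y values (None) from a fully observed predictor x by PMM, type 1 matching."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    if n < 3:
        raise ValueError("need at least three observed cases")
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in obs) / sxx
    b0 = my - b1 * mx
    # Draw sigma^2 = RSS / chi-square(n - 2), then coefficients given sigma^2
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in obs)
    sigma2 = rss / rng.gammavariate((n - 2) / 2, 2)
    # With x centred, the intercept at mean(x) and the slope are independent
    a_star = my + rng.gauss(0, math.sqrt(sigma2 / n))
    b1_star = b1 + rng.gauss(0, math.sqrt(sigma2 / sxx))
    b0_star = a_star - b1_star * mx
    observed_pred = [(b0 + b1 * xi, yi) for xi, yi in obs]      # type 1: point estimates
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
            continue
        target = b0_star + b1_star * xi                          # type 1: drawn coefficients
        pool = sorted(observed_pred, key=lambda p: abs(p[0] - target))[:donors]
        out.append(rng.choice(pool)[1])                           # always an observed value
    return out
```

Because every imputed value is copied from an observed case, PMM cannot produce values outside the observed range, which is one reason it is attractive for skewed variables.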
Phosphate rock utilization by soybean genotypes on a low-P ...

African Journals Online (AJOL)

Thirteen promiscuous soybean genotypes were evaluated in a low-P soil at Fashola in the derived savanna of Nigeria to compare their ability to acquire and utilize P from phosphate rock (PR) and single superphosphate (SSP). Changes in soil P fractions after a subsequent maize crop were also assessed. The treatments ...
Multiple imputation and its application

CERN Document Server

Carpenter, James

2013-01-01

A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...
Comparison of different Methods for Univariate Time Series Imputation in R

OpenAIRE

Moritz, Steffen; Sardá, Alexis; Bartz-Beielstein, Thomas; Zaefferer, Martin; Stork, Jörg

2015-01-01

Missing values in datasets are a well-known problem, and many R packages offer imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputation of univariate time series. The problem is that most standard imputation techniques cannot be applied directly: most algorithms rely on inter-attribute correlations, while univariate time series imputation needs to employ time dependencies. This paper provides an overview of ...
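Two of the simplest univariate techniques that exploit time dependence, last observation carried forward and linear interpolation, are sketched below in plain Python; they illustrate the techniques and are not the API of any R package compared in the paper.

```python
def locf(series):
    """Last observation carried forward; leading gaps are left as None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def linear_interpolation(series):
    """Fill interior gaps along straight lines between neighbouring observations;
    leading and trailing gaps take the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i in range(len(series)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out
```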
Imputation-based analysis of association studies: candidate regions and quantitative traits.

Directory of Open Access Journals (Sweden)

Bertrand Servin

2007-07-01

We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data) in a candidate region of interest with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
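Once genotypes are imputed as probabilities, a quick frequentist check is to regress the quantitative phenotype on the expected allele count (dosage). Bim-Bam itself is built around Bayes factors, so the least-squares sketch below is a simpler stand-in under that assumption, not the Bim-Bam method; the function name is hypothetical.

```python
import math

def dosage_regression(genotype_probs, phenotype):
    """Regress a quantitative phenotype on expected allele dosage.

    genotype_probs: list of (P(0 copies), P(1 copy), P(2 copies)) per individual.
    Returns (slope, standard error, t statistic)."""
    dosage = [p1 + 2 * p2 for _, p1, p2 in genotype_probs]
    n = len(dosage)
    md = sum(dosage) / n
    my = sum(phenotype) / n
    sdd = sum((d - md) ** 2 for d in dosage)
    slope = sum((d - md) * (y - my) for d, y in zip(dosage, phenotype)) / sdd
    intercept = my - slope * md
    rss = sum((y - intercept - slope * d) ** 2 for d, y in zip(dosage, phenotype))
    se = math.sqrt(rss / (n - 2) / sdd)      # standard error of the slope
    return slope, se, slope / se
```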
Factors associated with low birth weight in Nepal using multiple imputation

Directory of Open Access Journals (Sweden)

Usha Singh

2017-02-01

Background: Missing data are a persistent problem in survey data on birth weight from low-income countries. Studies conducted on birth weight have acknowledged missing birth weight data, but these cases are not included in the analysis. Furthermore, missing data on the determinants of birth weight are not addressed. This study therefore aims to identify determinants associated with low birth weight (LBW), using multiple imputation to handle missing data on birth weight and its determinants. Methods: The child dataset from the Nepal Demographic and Health Survey (NDHS) 2011 was used in this study. A total of 5,240 children were born between 2006 and 2011, of whom 87% had at least one measured variable missing and 21% had no recorded birth weight. All analyses were carried out in R version 3.1.3. The transform-then-impute method was applied to check for interactions between explanatory variables and to impute missing data. The survey package was applied to each imputed dataset to account for the survey design and sampling method. Survey logistic regression was applied to identify the determinants associated with LBW. Results: The prevalence of LBW was 15.4% after imputation. Women with the highest autonomy over their own health were less likely to give birth to LBW infants than women whose health decisions involved their husband or others (adjusted odds ratio (OR) 1.87, 95% confidence interval (CI) 1.31, 2.67) or were made by husband and wife together (adjusted OR 1.57, 95% CI 1.05, 2.35). Mothers using highly polluting cooking fuels were more likely to give birth to LBW infants than mothers using non-polluting cooking fuels (adjusted OR 1.49, 95% CI 1.03, 2.22). Conclusion: The findings of this study suggest that estimating the prevalence of LBW from only the sample with measured birth weight, ignoring missing data, results in underestimation.
Flexible Imputation of Missing Data

CERN Document Server

van Buuren, Stef

2012-01-01

Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde...

[Characteristics of dry matter production and nitrogen accumulation in barley genotypes with high nitrogen utilization efficiency].

Science.gov (United States)

Huang, Yi; Li, Ting-Xuan; Zhang, Xi-Zhou; Ji, Lin

2014-07-01

A pot experiment was conducted under low (125 mg·kg⁻¹) and normal (250 mg·kg⁻¹) nitrogen treatments. The nitrogen uptake and utilization efficiency of 22 barley cultivars were investigated, and the characteristics of dry matter production and nitrogen accumulation in barley were analyzed. The results showed that nitrogen uptake and utilization efficiency of barley differed between the two nitrogen levels. Under the low nitrogen treatment, the maximal values of grain yield, nitrogen utilization efficiency for grain and nitrogen harvest index were 2.87, 2.91 and 2.47 times the lowest values, respectively. Grain yield, nitrogen utilization efficiency for grain and nitrogen harvest index of the barley genotype with high nitrogen utilization efficiency were significantly greater than those of the genotype with low nitrogen utilization efficiency, being 82.1%, 61.5% and 50.5% higher, respectively, under the low nitrogen treatment. Dry matter mass and nitrogen utilization of the high nitrogen utilization efficiency genotype were significantly higher than those of the low nitrogen utilization efficiency genotype. The peak of dry matter mass in the high nitrogen utilization efficiency genotype occurred from jointing to heading, while that of nitrogen accumulation appeared before jointing. Under the low nitrogen treatment, the dry matter mass of DH61 and DH121+ was 34.4% and 38.3% higher, and their nitrogen accumulation 54.8% and 58.0% higher, than that of DH80, respectively. Dry matter mass and nitrogen accumulation before the jointing stage strongly affected yield, with contribution rates of 47.9% and 54.7%, respectively, under the low nitrogen treatment. The effect of dry matter and nitrogen accumulation on nitrogen utilization efficiency for grain was largest from heading to maturity, followed by sowing to jointing, with contribution rates of 29.5% and 48.7%, and 29.0% and 15.8%, respectively. In conclusion, barley genotype with high
Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

Directory of Open Access Journals (Sweden)

Stuart Heather

2006-12-01

Background: Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS). Methods: 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The responses are summed, and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Missing at random and missing not at random simulations were also performed. Six imputation methods were then considered: (1) multiple imputation (MI), (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results: When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percentage of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression methods produced Kappas in the 'substantial agreement' range
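The individual mean method and the agreement measure described above can be illustrated with a short sketch. It shows individual mean imputation, the over-40 classification and Cohen's kappa, using simulated responses with 10% of items deleted completely at random. The simulated data, the rounding of the imputed mean to the 1-4 scale and all function names are assumptions for illustration, not the study's implementation.

```python
import random

THRESHOLD = 40  # SDS totals above this are classified as depressive symptoms


def individual_mean_impute(responses):
    """Replace missing items (None) with the respondent's own mean, rounded to the 1-4 scale."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("respondent answered no items")
    fill = round(sum(answered) / len(answered))
    return [fill if r is None else r for r in responses]


def classify(responses):
    """True when the summed score exceeds the cut-off."""
    return sum(responses) > THRESHOLD


def cohen_kappa(a, b):
    """Cohen's kappa for two parallel lists of binary classifications."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Simulated example: 500 hypothetical respondents, 20 items each, 10% deleted completely at random.
random.seed(1)
complete = []
for _ in range(500):
    level = random.choice([1, 2, 3])  # hypothetical per-respondent response tendency
    complete.append([level + random.randint(0, 1) for _ in range(20)])
incomplete = [[None if random.random() < 0.10 else r for r in row] for row in complete]
imputed = [individual_mean_impute(row) for row in incomplete]
kappa = cohen_kappa([classify(r) for r in complete], [classify(r) for r in imputed])
print(f"kappa = {kappa:.3f}")
```

The other five strategies would replace `individual_mean_impute` in the same comparison loop.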
Two-pass imputation algorithm for missing value estimation in gene expression time series.

Science.gov (United States)

Tsiporkova, Elena; Boeva, Veselka

2007-10-01

Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several different parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, particularly for datasets with a higher proportion of missing entries. The performance of the two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and the former proved superior to the latter. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different
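The core idea, ranking candidate profiles by DTW distance and estimating a missing value from the closest ones, can be sketched as follows. This is a simplified single-pass variant, not the authors' position-wise, neighborhood-wise or two-pass DTWimpute. Several choices are assumptions: the absolute-difference local cost, comparing only the target's observed time points, averaging the k nearest candidates, and the function names.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_knn_impute(target, candidates, k=2):
    """Fill None entries of `target` with the mean value, at the same time point, of the k
    complete candidate profiles closest to it in DTW distance over the observed time points."""
    observed = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in observed]
    ranked = sorted(candidates, key=lambda c: dtw_distance(t_obs, [c[i] for i in observed]))
    nearest = ranked[:k]
    return [v if v is not None else sum(c[i] for c in nearest) / len(nearest)
            for i, v in enumerate(target)]


# Hypothetical profiles over four time points.
candidates = [[1, 2, 3, 4], [1, 2.5, 3, 4], [10, 10, 10, 10]]
print(dtw_knn_impute([1, None, 3, 4], candidates))
```

DTW, unlike Euclidean distance, allows shifted but similarly shaped profiles to count as close. That property is the motivation the abstract gives for using it on time series.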
Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

Directory of Open Access Journals (Sweden)

Katya L Masconi

Imputation techniques used to handle missing data are based on the principle of replacement. Multiple imputation is widely advocated as superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and to assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest with stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performance, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and this was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.
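Discrimination in this study is summarized by the C-statistic. The sketch below computes it directly from its definition for a binary outcome: the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as one half. The function name and toy data are illustrative and not taken from the study. Comparing deletion with imputation amounts to computing this value for models applied to differently completed data.

```python
def c_statistic(risks, outcomes):
    """C-statistic: share of (case, non-case) pairs where the case has the higher predicted
    risk; tied pairs count as one half. Equivalent to the area under the ROC curve."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = undiagnosed diabetes detected).
risk = [0.82, 0.30, 0.35, 0.10, 0.55, 0.20]
outcome = [1, 1, 0, 0, 1, 0]
print(c_statistic(risk, outcome))
```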
Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Directory of Open Access Journals (Sweden)

Min-Wei Huang

2018-01-01

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, providing estimates of the missing values through a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates of the missing values may be unreliable or may even differ considerably from the real values. The aim of this paper is to examine whether combining instance selection on the observed data with missing value imputation performs better than missing value imputation alone. In particular, three instance selection algorithms (DROP3, GA, and IB3) and three imputation algorithms (KNNI, MLP, and SVM) are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for medical datasets of mixed data types. However, instance selection does not necessarily have a positive impact on the imputation result for categorical medical datasets.
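KNNI is named in the abstract but not specified. The sketch below uses a common textbook form, shown as an assumption: each missing attribute is filled with the mean of that attribute over the k nearest complete rows, with distance measured on the attributes the incomplete row actually has. Instance selection (DROP3, GA, IB3) would prune the pool of complete rows before the neighbor search; that step is not implemented here.

```python
import math


def knn_impute(rows, k=3):
    """KNN imputation: fill each None with the mean of that attribute over the k complete rows
    nearest (Euclidean distance on the row's observed attributes)."""
    complete = [r for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        obs = [j for j, v in enumerate(row) if v is not None]
        nearest = sorted(
            complete,
            key=lambda other: math.sqrt(sum((row[j] - other[j]) ** 2 for j in obs)),
        )[:k]
        result.append([v if v is not None else sum(n[j] for n in nearest) / len(nearest)
                       for j, v in enumerate(row)])
    return result


# Hypothetical two-attribute records; the last one is missing its second value.
records = [[1, 2], [2, 4], [3, 6], [50, 50], [2, None]]
print(knn_impute(records))
```

An outlier among the complete rows is pulled into the average whenever it ranks among the k nearest. This is the failure mode the paper's instance-selection step is meant to reduce.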
The use of multiple imputation for the accurate measurements of individual feed intake by electronic feeders.

Science.gov (United States)

Jiao, S; Tiezzi, F; Huang, Y; Gray, K A; Maltecca, C

2016-02-01

Obtaining accurate individual feed intake records is the key first step in achieving genetic progress toward more efficient nutrient utilization in pigs. Feed intake records collected by electronic feeding systems contain errors (erroneous and abnormal values exceeding certain cutoff criteria) caused by feeder malfunction or animal-feeder interaction. In this study, we examined the use of a novel data-editing strategy involving multiple imputation to minimize the impact of errors and missing values on the quality of feed intake data collected by an electronic feeding system. The accuracy of feed intake data adjustment obtained with the conventional linear mixed model (LMM) approach was compared with 2 alternative implementations of multiple imputation by chained equations, denoted MI (multiple imputation) and MICE (multiple imputation by chained equations). The 3 methods were compared under 3 scenarios, in which 5, 10, and 20% feed intake error rates were simulated. Each scenario was replicated 5 times. The accuracy of each error adjustment was measured as the correlation between the true daily feed intake (DFI; daily feed intake in the testing period) or true ADFI (the mean DFI across the testing period) and the adjusted DFI or adjusted ADFI. In the editing process, error cutoff criteria are used to define whether a feed intake visit contains errors. To investigate whether the error cutoff criteria may affect any of the 3 methods, the simulation was repeated with 2 alternative error cutoff values. Multiple imputation methods outperformed the LMM approach in all scenarios, with mean DFI accuracies of 96.7, 93.5, and 90.2% with MI and 96.8, 94.4, and 90.1% with MICE, compared with 91.0, 82.6, and 68.7% with LMM. Similar results were obtained for ADFI. Furthermore, multiple imputation methods consistently performed better than LMM regardless of the cutoff criteria applied to define errors. In conclusion, multiple imputation
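The accuracy measure defined above is the correlation between true and adjusted values, computed for DFI and for ADFI (the per-animal mean of DFI). The sketch shows that computation on a small hypothetical animals-by-days matrix; the numbers and function names are assumptions, and none of the three adjustment methods themselves are reproduced.

```python
import numpy as np


def adfi(dfi):
    """Average daily feed intake per animal: mean DFI over the testing period (rows = animals, columns = days)."""
    return np.asarray(dfi, dtype=float).mean(axis=1)


def accuracy(true_values, adjusted_values):
    """Accuracy of an error adjustment: Pearson correlation between true and adjusted values."""
    t = np.ravel(np.asarray(true_values, dtype=float))
    a = np.ravel(np.asarray(adjusted_values, dtype=float))
    return float(np.corrcoef(t, a)[0, 1])


# Hypothetical usage: 3 animals x 4 days of true and adjusted DFI (kg/day).
true_dfi = [[2.0, 2.1, 2.3, 2.2], [1.8, 1.9, 1.7, 1.8], [2.5, 2.6, 2.4, 2.7]]
adjusted_dfi = [[2.0, 2.2, 2.3, 2.1], [1.8, 1.8, 1.7, 1.9], [2.5, 2.5, 2.4, 2.7]]
print("DFI accuracy:", accuracy(true_dfi, adjusted_dfi))
print("ADFI accuracy:", accuracy(adfi(true_dfi), adfi(adjusted_dfi)))
```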
Data imputation analysis for Cosmic Rays time series

Science.gov (United States)

Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

2017-05-01

Missing data in Galactic Cosmic Ray (GCR) time series are inevitable, since data are lost through mechanical and human failures or technical problems, and because GCR stations operate over different periods. The aim of this study was to perform multiple imputation of the dataset in order to represent the observational record. The study used the monthly GCR time series of Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% missing data relative to the observed ROME series, with 50 replicates. The CLMX station was then used as a proxy for allocating these scenarios. Three different methods for monthly dataset imputation were selected: AMÉLIA II, which runs the bootstrap Expectation Maximization algorithm; MICE, which runs an algorithm via Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization algorithm-based method for imputing missing values in multivariate normal time series. The synthetic time series were also evaluated against the observed ROME series using several skill measures, such as RMSE, NRMSE, Agreement Index, R, R2, F-test and t-test. The results showed that for CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. Increasing the number of gaps degraded the quality of the time series. Data imputation was most efficient with the MTSDI method, with negligible errors and the best skill coefficients. The results suggest a limit of about 60% missing data for imputation of monthly averages. It is noteworthy that CLMX, ROME and KIEL stations present no missing data in the target period. This methodology allowed 43 time series to be reconstructed.
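Three of the skill measures listed above can be written out explicitly. The abstract does not define them, so two conventions are assumptions. NRMSE here divides RMSE by the range of the observations; other studies divide by the mean or standard deviation. The "Agreement Index" is taken to be Willmott's index of agreement d. The series values are hypothetical.

```python
import numpy as np


def rmse(obs, sim):
    """Root mean square error between observed and imputed/simulated series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nrmse(obs, sim):
    """RMSE divided by the range of the observations (conventions differ; mean or SD are also used)."""
    obs = np.asarray(obs, dtype=float)
    return rmse(obs, sim) / float(obs.max() - obs.min())


def willmott_d(obs, sim):
    """Willmott's index of agreement (assumed to be the 'Agreement Index'); 1 means perfect agreement."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    obar = obs.mean()
    num = np.sum((sim - obs) ** 2)
    den = np.sum((np.abs(sim - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - num / den)


# Hypothetical monthly counts: observed series and an imputed reconstruction.
observed = [4010, 3985, 3950, 3990, 4020, 4005]
imputed = [4000, 3990, 3960, 3980, 4015, 4010]
print(rmse(observed, imputed), nrmse(observed, imputed), willmott_d(observed, imputed))
```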
Bootstrap inference when using multiple imputation.

Science.gov (United States)

Schomaker, Michael; Heumann, Christian

2018-04-16

Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains unclear, however, how to obtain valid bootstrap inference when multiple imputation is used to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that their performance varies with the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.
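The abstract does not list the four methods. The sketch below illustrates one way to combine bootstrap estimation with multiple imputation, offered as an assumption rather than as any of the paper's methods. It bootstraps the incomplete data first, imputes each bootstrap sample several times, averages the estimates across imputations, and reads a percentile interval off the pooled bootstrap estimates. Random hot-deck draws stand in for a proper imputation model, and a simple mean stands in for an estimator such as the g-formula.

```python
import numpy as np


def hot_deck_impute(x, rng):
    """Stochastic single imputation: replace each NaN by a random draw from the observed values."""
    x = x.copy()
    miss = np.isnan(x)
    x[miss] = rng.choice(x[~miss], size=int(miss.sum()), replace=True)
    return x


def boot_mi_ci(x, estimator=np.mean, n_boot=200, n_imp=5, alpha=0.05, seed=0):
    """Bootstrap first, then multiply impute each bootstrap sample, average the estimates over
    the imputations, and take percentiles of the pooled bootstrap estimates as the interval."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    pooled = []
    for _ in range(n_boot):
        sample = rng.choice(x, size=x.size, replace=True)
        if np.isnan(sample).all():
            continue  # nothing observed to impute from
        estimates = [estimator(hot_deck_impute(sample, rng)) for _ in range(n_imp)]
        pooled.append(np.mean(estimates))
    lo, hi = np.quantile(pooled, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Hypothetical data: 200 observed values plus 40 missing ones.
data_rng = np.random.default_rng(42)
x = np.concatenate([data_rng.normal(10.0, 1.0, 200), np.full(40, np.nan)])
print(boot_mi_ci(x))
```

Reversing the order, so that the data are imputed first and then each completed dataset is bootstrapped, is the other natural arrangement. Per the abstract, the validity of such arrangements depends on the number of imputations and the extent of missingness.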
Clinical characteristics, healthcare costs, and resource utilization in hepatitis C vary by genotype.

Science.gov (United States)

Goolsby Hunter, Alyssa; Rosenblatt, Lisa; Patel, Chad; Blauer-Peterson, Cori; Anduze-Faris, Beatrice

2017-05-01

In the United States, approximately 3 million people are infected with hepatitis C virus (HCV). Genotypes of HCV variably affect disease progression and treatment response. However, the relationships between HCV genotypes and liver disease progression, healthcare resource utilization, and healthcare costs have not been fully explored. In this retrospective study of patients with chronic hepatitis C (CHC), healthcare claims from a large US health plan were used to collect data on patient demographic and clinical characteristics. Main outcome measures include healthcare resource utilization (HCRU) and healthcare costs. Linked laboratory data provided genotype and select measures to determine liver disease severity. The sample (mean age 50.6 years, 63.5% male) included 10,331 patients, of whom 79.1% had genotype (GT)1, 12.8% had GT2, and 8.1% had GT3. Descriptive analyses demonstrated variation by HCV genotype in liver and non-liver related comorbidities, liver disease severity, and healthcare costs. The highest percentage of patients with liver-related comorbidities and advanced liver disease was found among those with GT3. Meanwhile, patients with GT2 had lower HCRU and the lowest costs, and patients with GT1 had the highest total all-cause costs. These differences may reflect differing rates of non-liver-related comorbidities and all-cause care. Multivariable analyses showed that genotype was a significant predictor of costs and liver disease severity: compared with patients having GT1, those with GT3 were significantly more likely to have advanced liver disease. Patients with GT2 were significantly less likely to have advanced disease and more likely to have lower all-cause costs. Results may not be generalizable to patients outside the represented commercial insurance plans, and analysis of a prevalent population may underestimate HCRU and costs relative to a sample of treated patients. These results suggest that liver disease progression varies by genotype and
Imputing data that are missing at high rates using a boosting algorithm

Energy Technology Data Exchange (ETDEWEB)

Cauthen, Katherine Regina [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lambert, Gregory [Apple Inc., Cupertino, CA (United States); Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lefantzi, Sophia [Sandia National Lab. (SNL-CA), Livermore, CA (United States)

2016-09-01

Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number m of imputations is used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
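The boosting idea can be shown in its simplest form: repeatedly fit a weak learner to the current residuals on the observed rows, add a shrunken copy of it to the model, and use the final sum to predict the missing rows. The sketch uses generic L2 boosting with one-split regression stumps, not the authors' boosted Bayesian spatiotemporal CAR model. The learning rate, number of rounds and toy data are assumptions.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares regression stump (one split on one column) fitted to residuals r."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the max so both sides are non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every column is constant: fall back to the mean residual
        c = r.mean()
        return lambda Z: np.full(len(Z), c)
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)


def boost_impute(X, y, n_rounds=100, lr=0.1):
    """Fit L2-boosted stumps on rows where y is observed and predict y where it is NaN."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    base = yo.mean()
    fitted = np.full(len(yo), base)
    learners = []
    for _ in range(n_rounds):
        h = fit_stump(Xo, yo - fitted)  # weak learner on current residuals
        fitted += lr * h(Xo)            # shrunken update
        learners.append(h)
    filled = y.copy()
    Xm = X[~obs]
    filled[~obs] = base + lr * sum((h(Xm) for h in learners), np.zeros(len(Xm)))
    return filled


# Toy example: a step function of one covariate with two missing responses.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.where(X[:, 0] > 0.5, 5.0, 1.0)
y[[10, 40]] = np.nan
filled = boost_impute(X, y)
print(filled[[10, 40]])
```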
Comparative study of P uptake and utilization from P fertilizers by Chilean wheat genotypes in volcanic ash soils

International Nuclear Information System (INIS)

Pino, I.; Parada, A.M.; Zapata, F.; Navia, M.; Luzio, W.

2002-01-01

The intensification of agricultural production in Southern Chile demands the application of P fertilizers to volcanic ash soils for optimum plant growth and crop yields. Because of the high P sorption capacity of these soils, large amounts of water-soluble phosphate fertilizers need to be applied. Therefore, the locally available Bahia Inglesa phosphate rock has been applied directly to supply P to crops in these acid soils. Phosphate rock is a very efficient P input for crops with long growth cycles or crop rotations; nevertheless, water-soluble P fertilizers must still be applied to crops with short growth cycles. Combined with these strategic P inputs, the use of acid-tolerant and P-efficient genotypes can further contribute to agricultural sustainability. Greenhouse studies were undertaken to explore and identify genotypic variation in the P efficiency of wheat grown in Andisols of Southern Chile. ³²P isotopic techniques were used to measure the uptake of P from triple superphosphate, a water-soluble P fertilizer, and from the locally available Bahia Inglesa phosphate rock. Substantial genotypic variation in P use efficiency was found among the Chilean wheat genotypes tested. The ³²P isotopic techniques made it possible to quantify the P taken up from the P fertilizer and to assess differences among the genotypes. Significant genotypic differences were obtained in P uptake from the local Bahia Inglesa phosphate rock. Much higher applications of phosphate rock were required in the Santa Barbara soil series (Andisol) due to its high P retention. A sustainable strategy for agricultural production in the Andisols of Chile would therefore be the combined use of the efficient wheat genotypes and the local Bahia Inglesa phosphate rock. As P efficiency is a multi-faceted trait that interacts with a range of environmental factors, further field-testing and validation is required, accompanied by in-depth studies to assess the
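The abstract does not state which ³²P design was used. For reference, two standard formulations for the fraction of plant P derived from a fertilizer (%Pdff) are given below. The direct method applies when the fertilizer itself is labelled. The indirect isotope-dilution method applies when the soil is labelled and the source, such as a phosphate rock, is not; there, %Pdff follows from the dilution of plant specific activity relative to an unfertilized control. SA denotes specific activity (³²P activity per unit P); the symbols are introduced here for illustration.

```latex
\%\mathrm{Pdff}_{\text{direct}} = \frac{SA_{\text{plant}}}{SA_{\text{fertilizer}}} \times 100,
\qquad
\%\mathrm{Pdff}_{\text{indirect}} = \left(1 - \frac{SA_{\text{fertilized}}}{SA_{\text{control}}}\right) \times 100,
\qquad
\text{P uptake from fertilizer} = \frac{\%\mathrm{Pdff}}{100} \times \text{total plant P}
```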
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

Science.gov (United States)

Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

2017-11-24

Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may affect dose requirements. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
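The abstract does not give the model used to impute expression from eQTL knowledge. A common approach, shown here as an assumption, predicts a gene's expression in a tissue as a weighted sum of allele dosages (0, 1 or 2) at its eQTL variants. In this sketch, "cohort-specific" is taken to mean using a weight table estimated in the individual's own cohort. The SNP names, weights and cohorts are hypothetical.

```python
def impute_expression(dosages, weights):
    """Predicted expression of one gene in one tissue: weighted sum of alternative-allele
    dosages (0, 1 or 2) at that gene's eQTL variants; variants without a weight contribute 0."""
    return sum(weights.get(snp, 0.0) * dose for snp, dose in dosages.items())


def impute_cohort(genotypes, weights_by_cohort, cohort):
    """Cohort-specific imputation: apply the weight table estimated in the given cohort."""
    weights = weights_by_cohort[cohort]
    return {person: impute_expression(d, weights) for person, d in genotypes.items()}


# Hypothetical eQTL weights for one gene-tissue pair, estimated separately in two cohorts.
weights_by_cohort = {"cohort_A": {"snp1": 0.5, "snp2": -0.25}, "cohort_B": {"snp1": 0.3}}
genotypes = {"p1": {"snp1": 2, "snp2": 1}, "p2": {"snp1": 0, "snp2": 2}}
print(impute_cohort(genotypes, weights_by_cohort, "cohort_A"))
```

The imputed values for the selected gene-tissue pairs would then enter the dose model as additional covariates alongside the existing clinical and genetic predictors.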
A Comparison of Joint Model and Fully Conditional Specification Imputation for Multilevel Missing Data

Science.gov (United States)

Mistler, Stephen A.; Enders, Craig K.

2017-01-01

Multiple imputation methods can generally be divided into two broad frameworks: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution, whereas FCS imputes variables one at a time from a series of univariate conditional…
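The FCS idea of imputing variables one at a time from univariate conditional models can be sketched for a single-level numeric dataset. The sketch initializes missing cells with column means, then cycles through the incomplete columns, regresses each on all the others and redraws its missing entries from the fitted normal predictive distribution. It ignores the multilevel structure the paper studies. Unlike a proper MI implementation, it also does not draw regression coefficients from their posterior. Running it with m different seeds would give m completed datasets.

```python
import numpy as np


def fcs_impute(data, n_iter=10, seed=0):
    """One completed dataset via a simple fully conditional specification (chained regressions)."""
    rng = np.random.default_rng(seed)
    X = np.array(data, dtype=float)
    miss = np.isnan(X)
    # start: fill every missing cell with its column mean
    X[miss] = np.take(np.nanmean(X, axis=0), np.where(miss)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            # regress column j on an intercept and all other (currently completed) columns
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[~m], X[~m, j], rcond=None)
            n_obs, p = A[~m].shape
            resid = X[~m, j] - A[~m] @ beta
            sigma = np.sqrt(resid @ resid / (n_obs - p)) if n_obs > p else 0.0
            # redraw the missing entries from the normal predictive distribution
            X[m, j] = A[m] @ beta + rng.normal(0.0, sigma, int(m.sum()))
    return X


# Hypothetical data: y depends on x, and every fifth y value is missing.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(scale=0.1, size=300)
data = np.column_stack([x, y])
data[::5, 1] = np.nan
completed = fcs_impute(data)
```

A JM approach would instead draw both columns' missing values jointly from one fitted multivariate normal rather than cycling through conditional regressions.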
Public Undertakings and Imputability

DEFF Research Database (Denmark)

Ølykke, Grith Skovgaard

2013-01-01

In this article, the issue of imputability to the State of public undertakings' decision-making is analysed and discussed in the context of the DSBFirst case. DSBFirst is owned by the independent public undertaking DSB and the private undertaking FirstGroup plc and won the contracts in the 2008 … Oeresund tender for the provision of passenger transport by railway. From the start, the services were provided at a loss, and in the end a part of DSBFirst was wound up. In order to frame the problems illustrated by this case, the jurisprudence-based imputability requirement in the definition of State aid … in Article 107(1) TFEU is analysed. It is concluded that where the public undertaking transgresses the control system put in place by the State, conditions for imputability are not fulfilled, and it is argued that in the current state of law, there is no conditional link between the level of control...
Hepatitis C Virus Resistance Testing in Genotype 1: The Changing Role in Clinical Utility.

Science.gov (United States)

Molino, Suzanne; Martin, Michelle T

2017-09-01

To review the role and utility of baseline resistance testing with currently available and pipeline genotype 1 hepatitis C virus (HCV) treatments. The authors reviewed liver meeting abstracts for data on currently available and pipeline genotype 1 retreatment regimens from January 1, 2015, to March 23, 2017. Additional trials were identified from a review of clinicaltrials.gov using the pipeline medication names. The authors identified reports of current and pipeline genotype 1 retreatment regimens. Seven references were clinical study results presented at the meetings of the American Association for the Study of Liver Diseases and the European Association for the Study of the Liver, and 2 studies were from clinicaltrials.gov. Retreatment trial data for currently available salvage regimens indicate that baseline NS5A resistance-associated substitutions (RASs) may decrease sustained virological response (SVR) rates when retreating with ledipasvir/sofosbuvir, but that SVR rates are not affected when using elbasvir/grazoprevir + sofosbuvir + ribavirin, paritaprevir/ritonavir/ombitasvir + dasabuvir + sofosbuvir, or sofosbuvir/velpatasvir + ribavirin. Pipeline data indicate that baseline NS5A RASs do not affect SVR rates when retreating with sofosbuvir/velpatasvir/voxilaprevir or glecaprevir/pibrentasvir. Baseline resistance testing was used for decision support in 3 clinical scenarios in patients with HCV genotype 1 infection at the time of manuscript submission. Pending the approval of 2 new direct-acting antiviral regimens in the third quarter of 2017, the rapidly evolving HCV treatment guidelines will likely reflect a decreased clinical utility for resistance testing.
Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.

Science.gov (United States)

Ritz, Cecilia; Edén, Patrik

2008-01-19

For 2-dye microarray platforms, some missing values may arise from unmeasurably low RNA expression in one channel only. Information on such "one-channel depletion" has so far not been included in algorithms for imputing missing values. By calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation systematically biases the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates was reduced by up to 51%, depending on the dataset. By including more information in the imputation step, we estimate missing expression values more accurately.
Using imputation to provide location information for nongeocoded addresses.

Directory of Open Access Journals (Sweden)

Frank C Curriero

2010-02-01

The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. Including geographic information in such research often begins by adding data to a map, which is predicated on some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information into coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns about bias, since the traditional practice has been to exclude nongeocoded data records from analysis. In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code, with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group levels as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on available population-based age, gender, and race information performed best overall at the county, tract, and block group levels. The procedure allows the potentially biased and likely underreported outcome, case enumerations based only on the geocoded records, to be presented instead as a statistically adjusted count (imputed count) with a measure of uncertainty, based on all the case data, the geocodes and imputed
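The basic population-weighted strategy can be sketched as follows: for each nongeocoded case with a known zip code, draw a tract with probability proportional to that tract's population at risk, and repeat m times. Each draw is added to the geocoded counts, and the spread across draws serves as the uncertainty. This sketch uses total population only, not the age-, gender- and race-specific weights the study found best. It reports a simple mean and SD across imputations rather than a full multiple-imputation variance. The tract names and counts are hypothetical.

```python
import random
from collections import Counter


def impute_tracts(n_cases, tract_population, n_imputations=20, seed=0):
    """For cases whose zip code is known but whose address did not geocode, draw a tract for
    each case with probability proportional to the tract's population at risk, m times."""
    rng = random.Random(seed)
    tracts = list(tract_population)
    weights = [tract_population[t] for t in tracts]
    return [Counter(rng.choices(tracts, weights=weights, k=n_cases))
            for _ in range(n_imputations)]


def combine_counts(geocoded_counts, draws, tract):
    """Geocoded plus imputed cases for one tract: mean and standard deviation across imputations."""
    totals = [geocoded_counts.get(tract, 0) + d[tract] for d in draws]
    mean = sum(totals) / len(totals)
    sd = (sum((t - mean) ** 2 for t in totals) / (len(totals) - 1)) ** 0.5
    return mean, sd


# Hypothetical zip code with three tracts and 8 nongeocoded cases.
population = {"tract_1": 4000, "tract_2": 1000, "tract_3": 5000}
geocoded = {"tract_1": 12, "tract_3": 20}
draws = impute_tracts(8, population)
for tract in population:
    print(tract, combine_counts(geocoded, draws, tract))
```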
Missing value imputation for microarray gene expression data using histone acetylation information

Directory of Open Access Journals (Sweden)

Feng Jihua

2008-05-01

Background: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required by numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high percentages of missing values. Results: The paper explores the feasibility of missing value imputation aided by knowledge of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicate that using acetylation information can significantly improve microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more highly correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, an existing related method that uses functional similarity as external information. Conclusion: We demonstrated that using histone acetylation information can greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge of gene regulatory mechanisms beyond histone acetylation accumulates, the performance of our approach can be further improved and verified.
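HAIimpute builds on LLS and is evaluated by NRMSE, and both can be sketched without the histone-acetylation component. In plain local least squares imputation for one gene, the target's observed values are expressed as a least-squares combination of its k nearest complete neighbors, and the same combination is applied at the missing arrays. NRMSE is taken here as RMSE divided by the standard deviation of the true values, a common convention in this literature. The neighbor selection by Euclidean distance, the toy data and the function names are assumptions.

```python
import numpy as np


def lls_impute_gene(target, complete_genes, k=5):
    """Local least squares imputation of one expression profile (NaN = missing) from the k
    complete profiles nearest to it on the observed arrays."""
    target = np.asarray(target, dtype=float)
    G = np.asarray(complete_genes, dtype=float)
    obs = ~np.isnan(target)
    dist = np.sqrt(((G[:, obs] - target[obs]) ** 2).sum(axis=1))
    A = G[np.argsort(dist)[:k]]  # k nearest neighbour genes
    # least-squares weights expressing the target's observed part via the neighbours
    w, *_ = np.linalg.lstsq(A[:, obs].T, target[obs], rcond=None)
    filled = target.copy()
    filled[~obs] = A[:, ~obs].T @ w  # same combination at the missing arrays
    return filled


def nrmse(imputed, truth):
    """RMSE of the imputed entries divided by the standard deviation of the true entries."""
    imputed, truth = np.asarray(imputed, dtype=float), np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((imputed - truth) ** 2)) / np.std(truth))


# Hypothetical complete genes over five arrays; the target is 2*g1 + g2 with one value hidden.
genes = [[1, 2, 3, 4, 5], [2, 1, 0, 1, 2], [5, 5, 5, 5, 5]]
filled = lls_impute_gene([4, 5, np.nan, 9, 12], genes, k=3)
print(filled)
```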
Missing value imputation: with application to handwriting data

Science.gov (United States)

Xu, Zhen; Srihari, Sargur N.

2015-01-01

Missing values make pattern analysis difficult, particularly when available data are limited. In longitudinal research, missing values accumulate, thereby aggravating the problem. Here we consider how to deal with temporal data with missing values in handwriting analysis. In studying the development of individuality in handwriting, we found that feature values are missing for several individuals at several time points. Six algorithms, i.e., random imputation, mean imputation, most likely independent value imputation, and three methods based on Bayesian networks (static Bayesian network, parameter EM, and structural EM), are compared on children's handwriting data. We evaluate the accuracy and robustness of the algorithms under different ratios of missing data and missing values, and draw practical conclusions. Specifically, for our data, which contain around 5% missing data, a static Bayesian network provides adequate accuracy at low computational cost.
Imputed prices of greenhouse gases and land forests

International Nuclear Information System (INIS)

Uzawa, Hirofumi

1993-01-01

The theory of dynamic optimum formulated by Maeler gives us the basic theoretical framework within which it is possible to analyse the economic and, possibly, political circumstances under which the phenomenon of global warming occurs, and to search for the policy and institutional arrangements whereby it would be effectively arrested. The analysis developed here is an application of Maeler's theory to atmospheric quality. In the analysis a central role is played by the concept of imputed price in the dynamic context. Our determination of imputed prices of atmospheric carbon dioxide and land forests takes into account the difference in the stages of economic development. Indeed, the ratios of the imputed prices of atmospheric carbon dioxide and land forests over the per capita level of real national income are identical for all countries involved. (3 figures, 2 tables) (Author)

Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

OpenAIRE

Chan, Kin Wai; Meng, Xiao-Li

2017-01-01

Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...
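The step of combining the m analysis results is shown below for the simplest case, a scalar estimate, using Rubin's standard combining rules. The rules give the pooled estimate, the within- and between-imputation variances, the total variance, and the usual degrees of freedom. This is the classical rule, not the improved likelihood-ratio combining procedure the paper proposes. The input numbers are hypothetical.

```python
import statistics


def rubin_combine(estimates, variances):
    """Rubin's rules for a scalar: pooled estimate, total variance and degrees of freedom."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = sum(estimates) / m          # pooled point estimate
    u_bar = sum(variances) / m          # mean within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b         # total variance
    if b == 0:
        df = float("inf")               # no between-imputation variability
    else:
        # relative increase in variance due to nonresponse
        r = (1 + 1 / m) * b / u_bar if u_bar > 0 else float("inf")
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df


# Hypothetical results from m = 3 completed datasets: point estimates and squared standard errors.
q, t, df = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t, df)
```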
48 CFR 1830.7002-4 - Determining imputed cost of money.

Science.gov (United States)

2010-10-01

... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...
Optimized Use of Low-Depth Genotyping-by-Sequencing for Genomic Prediction Among Multi-Parental Family Pools and Single Plants in Perennial Ryegrass (Lolium perenne L.)

Directory of Open Access Journals (Sweden)

Fabio Cericola

2018-03-01

Ryegrass single plants, bi-parental family pools, and multi-parental family pools are often genotyped based on allele frequencies, using genotyping-by-sequencing (GBS) assays. GBS assays can be performed at low-coverage depth to reduce costs. However, reducing the coverage depth leads to a higher proportion of missing data and to reduced accuracy in identifying the allele frequency at each locus. As a consequence of the latter, genomic relationship matrices (GRMs) will be biased. This bias in GRMs affects variance estimates and the accuracy of GBLUP for genomic prediction (GBLUP-GP). We derived equations that describe the bias from low-coverage sequencing as an effect of binomial sampling of sequence reads, allowing for any ploidy level of the sample considered. This allowed us to combine individual and pool genotypes in one GRM, treating pool-genotypes as a polyploid genotype, equal to the total ploidy-level of the parents of the pool. Using simulated data, we verified the magnitude of the GRM bias at different coverage depths for three different kinds of ryegrass breeding material: individual genotypes from single plants, pool-genotypes from F2 families, and pool-genotypes from synthetic varieties. To better handle missing data, we also tested imputation procedures that are suited for analyzing allele-frequency genomic data. The relative advantages of the bias-correction and the imputation of missing data were evaluated using real data. We examined a large dataset, including single plants, F2 families, and synthetic varieties genotyped in three GBS assays, each with a different coverage depth, and evaluated them for heading date, crown rust resistance, and seed yield. Cross validations were used to test the accuracy using GBLUP approaches, demonstrating the feasibility of predicting among different breeding material. Bias-corrected GRMs proved to increase predictive accuracies when compared with standard approaches to
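As a minimal illustration of why low coverage biases a GRM, assume a sample with true allele frequency p whose frequency is estimated from `depth` reads as k/depth, with k ~ Binomial(depth, p). The estimate is unbiased for p, but its square, the kind of self-product term that enters a GRM diagonal, is inflated by p(1 - p)/depth. This is a textbook binomial identity with hypothetical function names, not the authors' bias-correction equations, which also account for ploidy.

```python
from math import comb

def expected_sq_freq(p, depth):
    """Exact E[p_hat**2] when p_hat = k / depth and k ~ Binomial(depth, p)."""
    return sum(
        (k / depth) ** 2 * comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(depth + 1)
    )

def closed_form(p, depth):
    """Same quantity via E[X^2] = Var(X) + E[X]^2 = p(1 - p)/depth + p^2."""
    return p**2 + p * (1 - p) / depth

# The excess over p**2 (the bias of a squared-frequency term) shrinks as depth grows.
for depth in (2, 5, 20):
    print(depth, expected_sq_freq(0.3, depth) - 0.3**2)
```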
Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

Directory of Open Access Journals (Sweden)

Zhou Hanzhi

2016-03-01

Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES III) data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy to release multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.
TRIP: An interactive retrieving-inferring data imputation approach

KAUST Repository

Li, Zhixu

2016-06-25

Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to non-quantitative string data can be roughly divided into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimates for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values that do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach achieves high imputation precision and recall but issues a large number of web search queries, which incurs a large overhead [1]. © 2016 IEEE.
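The limitation of inferring-based approaches can be seen with a deliberately simple inferring rule: fill a missing attribute from a complete record that shares another attribute. The function, attribute names and records below are hypothetical, and this lookup illustrates the general idea rather than TRIP itself; any value that never appears in the complete part of the data set is left unresolved, which is where a retrieving (web search) step would be needed.

```python
def infer_missing(records, key_attr, target_attr):
    """Fill missing target_attr values from complete records sharing key_attr.

    Returns the filled copies and the indices that could not be filled because
    no complete record has the same key_attr value.
    """
    lookup = {}
    for rec in records:
        if rec.get(target_attr) is not None:
            lookup.setdefault(rec[key_attr], rec[target_attr])
    filled, unresolved = [], []
    for i, rec in enumerate(records):
        rec = dict(rec)  # do not mutate the caller's records
        if rec.get(target_attr) is None:
            value = lookup.get(rec[key_attr])
            if value is None:
                unresolved.append(i)  # candidate for a retrieving-based step
            rec[target_attr] = value
        filled.append(rec)
    return filled, unresolved

people = [
    {"name": "A", "city": "Brisbane", "country": "Australia"},
    {"name": "B", "city": "Brisbane", "country": None},
    {"name": "C", "city": "Thuwal", "country": None},
]
filled, unresolved = infer_missing(people, "city", "country")
print(filled, unresolved)
```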
TRIP: An interactive retrieving-inferring data imputation approach

KAUST Repository

Li, Zhixu; Qin, Lu; Cheng, Hong; Zhang, Xiangliang; Zhou, Xiaofang

2016-01-01

Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to non-quantitative string data can be roughly divided into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimates for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values that do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources by formulating proper web search queries to retrieve web pages containing the missing values from the Web, and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach achieves high imputation precision and recall but issues a large number of web search queries, which incurs a large overhead [1]. © 2016 IEEE.
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

Science.gov (United States)

Liu, Yuzhe; Gopalakrishnan, Vanathi

2017-03-01

Many clinical research datasets have a large percentage of missing values, which directly limits their usefulness for training high-accuracy classifiers with supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem-specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
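Of the four methods compared, k-nearest-neighbor imputation is the easiest to sketch. A minimal version, assuming Euclidean distance over the features observed in the incomplete row and averaging the k nearest complete rows; the function name and toy rows are made up, not taken from the cardiac data:

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Distance is Euclidean over the features observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        result.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return result

rows = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.2], [10.0, 10.0, 10.0], [1.05, 2.05, None]]
filled_rows = knn_impute(rows, k=2)
print(filled_rows[3])
```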
Comparison of nitrate accumulation, nitrogen uptake and utilization efficiency among different spinach (Spinacia oleracea L.) genotypes

Directory of Open Access Journals (Sweden)

Zhou Jianjian

2017-10-01

A hydroponic experiment was conducted to study differences in nitrate accumulation, nitrogen uptake and utilization efficiency among four spinach (Spinacia oleracea L.) genotypes (So10, So13, So18 and So57). Results showed that So13 had the highest nitrate content under both nitrate (NO₃⁻-N) levels (0.5 mmol·L⁻¹ and 15 mmol·L⁻¹), whereas So10 had the lowest. So18 had nitrate content similar to So13 under the low NO₃⁻ level, while under the high NO₃⁻ treatment its nitrate content did not differ significantly from that of So57. The ¹⁵NO₃⁻-N uptake rate of So13 was the highest among the four genotypes, while its N utilization efficiency (NutE) and N utilization ratio (NUR) were significantly lower than those of So18 and So57. The shoot dry mass, nitrate reductase activity, NutE and NUR of So18 and So57 were higher than those of So13 and So10, while their ¹⁵NO₃⁻-N uptake rates were lower than that of So13. The shoot dry mass, nitrate reductase activity, NutE and NUR of So10 were significantly lower than those of So18 and So57, and its ¹⁵NO₃⁻-N uptake rate was significantly lower than that of So13. Among the four spinach genotypes, So57 can be selected as elite germplasm for spinach production because of its relatively low nitrate content and high N efficiency.
Rape genotypic differences in P uptake and utilization from phosphate rocks in an Andisol of Chile

International Nuclear Information System (INIS)

Montenegro, A.; Zapata, F.

2002-01-01

A main constraint to agricultural productivity in the southern regions of Chile is the low available soil P, exacerbated by the high P sorption capacity of the predominant Andisols. Therefore, substantial amounts of P fertilizers must be applied to obtain optimum growth and crop yields. One cost-effective strategy for supplying P to crops grown in these soils is the direct application of the local Bahia Inglesa phosphate rock (PR). However, a more sustainable strategy would be to combine the use of the local PR with crop species and cultivars that are able to grow in these acid soils and can efficiently utilize PR. Rape is reported to be very efficient in utilising P from PR sources due to its capacity to exude organic acids into the rhizosphere. Therefore, the present study was conducted to evaluate the ability of five rape cultivars grown in an Andisol of southern Chile to utilise P from two PR sources (Bahia Inglesa and Bayovar) and from triple superphosphate (TSP), a water-soluble P fertilizer. Rape was able to absorb significant amounts of P from the PR sources and much less from the TSP and soil P. Both Bahia Inglesa and Bayovar PRs were found to be as effective as TSP for the rape genotypes in the Andisol Pemehue. The ³²P isotope technique made it possible to assess the ability of the tested genotypes to utilize P from the different P fertilizers applied. Genotypes G2 and G3 acquired more P from the PR than genotype G5. Combining P-efficient genotypes with direct application of the Bahia Inglesa PR seems to be a promising technology for attaining sustainable agricultural productivity in the Andisols of Chile. Further field trials are needed to validate these findings at the level of cropping systems. This agronomic testing should be accompanied by in-depth studies to assess the relative importance of the morphological and physiological traits determining a higher P efficiency. (author)
Efficient genome-wide genotyping strategies and data integration in crop plants.

Science.gov (United States)

Torkamaneh, Davoud; Boyle, Brian; Belzile, François

2018-03-01

Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications, from genome-wide analysis to routine screening, with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time and at low cost. Genome-wide genotyping methods commonly applied in crops include whole-genome re-sequencing, SNP arrays, and reduced-representation sequencing. The main challenges facing breeders and geneticists today are how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used both to fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.
[Imputing missing data in public health: general concepts and application to dichotomous variables].

Science.gov (United States)

Hernández, Gilma; Moriña, David; Navarro, Albert

The presence of missing data in collected variables is common in health surveys, but subsequently imputing them at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still poorly understood by many non-statisticians, who view it as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths, all in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example is introduced in which missing data are handled by means of simple and multiple imputation. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

Science.gov (United States)

Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

2010-10-01

Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and on machine learning techniques, e.g., multi-layer perceptron (MLP), self-organising maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared with those obtained using listwise deletion (LD). The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracy of predictions of early cancer relapse was measured using artificial neural networks (ANNs), with different ANNs estimated from the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed the statistical imputation methods in predicting patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for imputing missing values and led to a significant enhancement of prognosis accuracy compared with imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.
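Hot-deck imputation, one of the statistical methods listed, fills a missing value with an observed value taken from a similar "donor" record. A minimal random hot-deck sketch, assuming donors are matched on a single categorical attribute; the function name, field names and records are hypothetical:

```python
import random

def hot_deck_impute(records, target, match_on, seed=0):
    """Random hot-deck: fill a missing `target` with the value of a randomly chosen
    donor that has `target` observed and the same `match_on` value.
    Records with no available donor are left missing."""
    rng = random.Random(seed)
    donors = {}
    for rec in records:
        if rec[target] is not None:
            donors.setdefault(rec[match_on], []).append(rec[target])
    out = []
    for rec in records:
        rec = dict(rec)  # work on a copy
        if rec[target] is None and donors.get(rec[match_on]):
            rec[target] = rng.choice(donors[rec[match_on]])
        out.append(rec)
    return out

records = [
    {"age_band": "40-49", "grade": 2},
    {"age_band": "40-49", "grade": 3},
    {"age_band": "40-49", "grade": None},
    {"age_band": "70-79", "grade": None},
]
print(hot_deck_impute(records, target="grade", match_on="age_band"))
```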
Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

Directory of Open Access Journals (Sweden)

Xiaobo Yan

2015-01-01

This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics, and healthcare. However, missing values are very common in IoT data for a variety of reasons, leaving the experimental data incomplete. As a result, some work that relies on IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data itself and of missing data in the IoT, this paper divides missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve these problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on a Gaussian mixture model (MGI). Experimental results showed that the three models substantially improve the accuracy, reliability, and stability of missing value imputation.
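For the Gaussian-mixture case, one standard way to impute a missing coordinate is its conditional expectation under the fitted mixture: weight each component by how well it explains the observed coordinate, then average the per-component regression predictions. The sketch below assumes a bivariate mixture whose parameters have already been estimated (for example by EM), and the function names are hypothetical; the paper's MGI model may differ in its details.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_impute_y(x, components):
    """E[y | x] under a bivariate Gaussian mixture.

    components: list of (weight, (mu_x, mu_y), (var_x, cov_xy, var_y)).
    """
    # Responsibility of each component given only the observed x.
    resp = [w * normal_pdf(x, mu[0], cov[0]) for w, mu, cov in components]
    total = sum(resp)
    estimate = 0.0
    for r, (_weight, mu, cov) in zip(resp, components):
        var_x, cov_xy, _var_y = cov
        # Within one Gaussian component, E[y | x] is a linear regression on x.
        cond_mean = mu[1] + cov_xy / var_x * (x - mu[0])
        estimate += (r / total) * cond_mean
    return estimate

two = [(0.5, (0.0, 0.0), (1.0, 0.0, 1.0)), (0.5, (10.0, 5.0), (1.0, 0.0, 1.0))]
print(gmm_impute_y(10.0, two))
```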
Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

Science.gov (United States)

Resche-Rigon, Matthieu; White, Ian R

2018-06-01

In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended to other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method that conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models that involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.
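The chained-equations idea behind MICE can be sketched for the single-level case: cycle through the incomplete variables, regress each on the others using its observed rows, and replace its missing entries with draws around the fitted values. This is a simplified sketch with a hypothetical function name. It ignores clusters entirely, so it does not implement the multilevel method described above, and it does not draw the regression parameters from their posterior, so repeated runs would understate between-imputation variability.

```python
import numpy as np

def chained_equations_impute(X, n_iter=10, seed=0):
    """One completed dataset from a simplified chained-equations loop.

    X: 2-D array with np.nan marking missing entries (rows = units, columns = variables).
    Each incomplete variable is regressed on all others (OLS with an intercept) using
    its observed rows; its missing entries become fitted values plus normal noise with
    the residual standard deviation.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    # Start from column means so every regression has complete predictors.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            missing_j = miss[:, j]
            if not missing_j.any():
                continue
            obs = ~missing_j
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - design[obs] @ beta
            sigma = resid.std(ddof=design.shape[1])
            noise = rng.normal(0.0, sigma, missing_j.sum())
            X[missing_j, j] = design[missing_j] @ beta + noise
    return X
```

Running it $m$ times with different seeds yields $m$ completed datasets for a multiple imputation analysis.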
Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

KAUST Repository

Chatterjee, Nilanjan; Chen, Yi-Hau; Luo, Sheng; Carroll, Raymond J.

2009-01-01

Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by constructing various score tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for the analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation, followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.
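Two ingredients mentioned above can be made concrete. Under HWE, genotype probabilities follow from the allele frequency alone, and imputed genotypes are often summarized as an expected allele dosage before association testing. The sketch below, with hypothetical function names and made-up probabilities, shows both; the dosage summary is an assumption for illustration, since the authors' two-stage method uses a test based on the retrospective likelihood.

```python
def hwe_genotype_probs(q):
    """Genotype probabilities (AA, AB, BB) under Hardy-Weinberg equilibrium,
    where q is the frequency of allele B."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)

def expected_dosage(probs):
    """Expected count of allele B given genotype probabilities (AA, AB, BB)."""
    _p_aa, p_ab, p_bb = probs
    return p_ab + 2 * p_bb

# Hypothetical imputed genotype probabilities for one individual at an untyped SNP.
print(expected_dosage((0.1, 0.3, 0.6)))
# With no information beyond HWE, the expected dosage is 2q.
print(expected_dosage(hwe_genotype_probs(0.3)))
```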
Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

KAUST Repository

Chatterjee, Nilanjan

2009-11-01

Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by constructing various score tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for the analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation, followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

Science.gov (United States)

Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

2016-01-01

Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case analysis and single imputation or substitution, suffer from inefficiency and bias. They either make strong parametric assumptions or consider only limit-of-detection censoring. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or, in the presence of additional covariates in the model, the semiparametric Cox model estimate. We evaluate this procedure in simulations and compare its operating characteristics with those of the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

NARCIS (Netherlands)

I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

2016-01-01

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the
Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

DEFF Research Database (Denmark)

Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

2017-01-01

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...
Missing value imputation in DNA microarrays based on conjugate gradient method.

Science.gov (United States)

Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh

2012-02-01

Analysis of gene expression profiles requires a complete matrix of gene array values; consequently, imputation methods have been proposed. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE on different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
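A loose sketch of the steps described above, in Python with NumPy: rank complete genes by the absolute Pearson correlation with the target gene, keep the top k, and estimate the missing entry from least-squares weights obtained by solving the normal equations with a hand-written conjugate gradient solver. The function names are hypothetical, and the abstract does not specify how the "most similar" subset is chosen or exactly which system CG solves, so this sketch simply uses all k neighbours and the normal equations.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros(len(b))
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs_old) <= tol * b_norm:  # converged (also covers b == 0)
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def cg_impute_entry(X, i, j, k=3):
    """Estimate the missing entry X[i, j]; rows are genes, columns are samples."""
    target = X[i]
    obs = ~np.isnan(target)  # samples observed for the target gene
    candidates = [g for g in range(X.shape[0])
                  if g != i and not np.isnan(X[g]).any()]
    # Rank complete genes by |Pearson r| with the target over its observed samples.
    scores = [abs(np.corrcoef(X[g, obs], target[obs])[0, 1]) for g in candidates]
    neighbours = [candidates[t] for t in np.argsort(scores)[::-1][:k]]
    B = X[neighbours][:, obs].T  # shape: (observed samples, k)
    y = target[obs]
    # Least-squares weights from the normal equations (B^T B) w = B^T y, solved by CG.
    w = conjugate_gradient(B.T @ B, B.T @ y)
    return float(X[neighbours, j] @ w)
```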

Clustering with Missing Values: No Imputation Required

Science.gov (United States)

Wagstaff, Kiri

2004-01-01

Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

Science.gov (United States)

Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal

Missing values in time series data are a well-known and important problem that many researchers in various fields have studied extensively. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is the application of the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with many other established methods by applying them to various real and simulated time series. The results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation than the other methods.
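The basic SSA gap-filling loop can be sketched as follows: embed the series in a trajectory (Hankel) matrix, replace it with a low-rank SVD approximation, average back along the anti-diagonals, and update only the missing points until they settle. This is ordinary L2 (SVD-based) SSA, not the L1-norm version the paper proposes, and the function name, window length, rank and iteration count are illustrative assumptions.

```python
import numpy as np

def ssa_fill(series, window, rank, n_iter=200):
    """Fill NaN gaps by iterative SSA reconstruction (ordinary L2/SVD version)."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    x[missing] = np.nanmean(x)  # initial guess for the gaps
    n = len(x)
    k = n - window + 1
    for _ in range(n_iter):
        # Embed: trajectory matrix whose columns are lagged windows of the series.
        traj = np.column_stack([x[i:i + window] for i in range(k)])
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Diagonal averaging (Hankelisation) back to a series.
        recon = np.zeros(n)
        counts = np.zeros(n)
        for i in range(k):
            recon[i:i + window] += low_rank[:, i]
            counts[i:i + window] += 1
        recon /= counts
        x[missing] = recon[missing]  # observed values are never changed
    return x

t = np.arange(100)
clean = np.sin(2 * np.pi * t / 20)
damaged = clean.copy()
damaged[40:45] = np.nan
filled = ssa_fill(damaged, window=20, rank=2)
```

A pure sinusoid has a rank-2 trajectory matrix, which is why `rank=2` suits this toy series; real series need the rank chosen from their singular spectrum.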
Differential network analysis with multiply imputed lipidomic data.

Directory of Open Access Journals (Sweden)

Maiju Kujala

The importance of lipids for cell function and health has been widely recognized; for example, a disorder in the lipid composition of cells has been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, though not huge, number of mutually correlated variables, and their associations with outcomes are potentially complex. Differential network analysis provides a formal statistical method capable of inferential analysis to examine differences in the network structures of lipids under two biological conditions. It also guides us to identify potential relationships requiring further biological investigation. We provide a recipe for conducting a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical of a wide range of data sets in the life sciences. Left-censored missing values are low-level concentrations that are known to exist somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during a two-year follow-up.
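A deliberately naive way to respect the left-censoring described above is to draw each below-limit value from the interval between zero and the lower limit of quantification (LLOQ), once per imputed dataset. The uniform distribution, the function name and the values below are assumptions for illustration only; they are not the imputation model used for the LURIC data.

```python
import random

def impute_left_censored(values, lloq, m=5, seed=0):
    """Create m completed copies of `values`, replacing each None (a concentration
    below the lower limit of quantification) with a uniform draw from [0, lloq]."""
    rng = random.Random(seed)
    return [
        [rng.uniform(0.0, lloq) if v is None else v for v in values]
        for _ in range(m)
    ]

# Hypothetical concentrations for one lipid; None marks values below an LLOQ of 0.2.
conc = [0.8, None, 1.3, None]
copies = impute_left_censored(conc, lloq=0.2, m=5, seed=1)
```

Each copy would then be analysed separately and the results pooled, which keeps the uncertainty about the censored values visible in the final inference.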
Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

Science.gov (United States)

Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

2016-04-01

Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31 900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix...
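Below is a bare-bones version of kNN imputation for a trait matrix. Each missing trait value is replaced by the average of that trait in the k most similar records. Similarity is measured only over the traits both records share. The distance (root-mean-square difference over shared traits) and the unweighted average are illustrative choices, not necessarily those of the study. The study also added predictors such as species identity and climate, which are not shown here.

```python
import math


def _distance(a, b):
    """RMS difference over the columns observed in both rows (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing cell with the mean of that column over the k nearest
    rows in which the column is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical records with two traits; the last record lacks the second trait.
traits = [[1.0, 10.0], [1.1, 11.0], [5.0, 50.0], [1.05, None]]
print(knn_impute(traits, k=2)[3])  # [1.05, 10.5]
```

Distances are computed from the original rows, so an imputed value never influences later imputations in the same pass.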
A nonparametric multiple imputation approach for missing categorical data

Directory of Open Access Journals (Sweden)

Muhan Zhou

2017-06-01

Background: Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods: We propose a nearest-neighbour multiple imputation approach to impute a missing-at-random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring the distance between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models. One fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model). The other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results: The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme, under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. Conclusions: We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with...
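The donor-selection step described above can be sketched as follows. The sketch assumes each subject already has two scalar scores from the fitted working models, combined with a weight `w`. One example would be a predicted probability from the outcome model and one from the missingness model. The abstract does not give how the paper reduces the multinomial model to a scalar score or chooses the weights. The functions `predictive_score`, `nn_multiple_impute` and `category_proportions` are therefore hypothetical helpers.

```python
import random
from collections import Counter


def predictive_score(outcome_score, missingness_score, w):
    """Hypothetical weighted combination of the two working-model scores."""
    return w * outcome_score + (1.0 - w) * missingness_score


def nn_multiple_impute(scores, outcomes, n_donors, m, seed=0):
    """Return m completed outcome lists; None marks a missing category.

    For each missing outcome, the donor set is the n_donors observed subjects
    whose scores are closest, and one donor's category is drawn at random.
    """
    rng = random.Random(seed)
    observed = [(s, y) for s, y in zip(scores, outcomes) if y is not None]
    imputations = []
    for _ in range(m):
        completed = []
        for s, y in zip(scores, outcomes):
            if y is None:
                donors = sorted(observed, key=lambda d: abs(d[0] - s))[:n_donors]
                y = rng.choice(donors)[1]
            completed.append(y)
        imputations.append(completed)
    return imputations


def category_proportions(imputations):
    """Average the category proportions over the m completed data sets."""
    m, n = len(imputations), len(imputations[0])
    totals = Counter()
    for completed in imputations:
        totals.update(completed)
    return {c: k / (m * n) for c, k in totals.items()}


scores = [0.10, 0.12, 0.90, 0.11]          # hypothetical combined scores
outcomes = ["a", "a", "c", None]
imps = nn_multiple_impute(scores, outcomes, n_donors=2, m=5)
print(category_proportions(imps))          # {'a': 0.75, 'c': 0.25}
```

Drawing a donor at random, rather than always taking the single nearest one, is what makes the m completed data sets differ and lets between-imputation variability enter the final estimates.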
Fully conditional specification in multivariate imputation

NARCIS (Netherlands)

van Buuren, S.; Brand, J. P.L.; Groothuis-Oudshoorn, C. G.M.; Rubin, D. B.

2006-01-01

The use of the Gibbs sampler with fully conditionally specified models, where the distribution of each variable given the other variables is the starting point, has become a popular method to create imputations in incomplete multivariate data. The theoretical weakness of this approach is that the...
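The following is a minimal two-variable illustration of fully conditional specification. Each incomplete variable is imputed in turn from a regression on the other, using the other's current imputed values, and the cycle is repeated. For brevity, the sketch uses stochastic linear regression with the fitted coefficients held fixed at each step. A proper implementation, such as the MICE procedure, would also draw the regression parameters from their posterior, so that parameter uncertainty propagates into the imputations. The function names are illustrative.

```python
import random
import statistics


def _fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns intercept, slope, residual SD
    (population form, for simplicity)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, statistics.pstdev(residuals)


def chained_equations(x, y, iterations=10, seed=0):
    """Impute two incomplete variables (None = missing) by alternating
    stochastic regressions of x on y and of y on x."""
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Start from mean imputation of each variable.
    mean_x = statistics.fmean(v for v in x if v is not None)
    mean_y = statistics.fmean(v for v in y if v is not None)
    x = [mean_x if m else v for v, m in zip(x, miss_x)]
    y = [mean_y if m else v for v, m in zip(y, miss_y)]
    for _ in range(iterations):
        # x | y: fit on rows where x was observed, then redraw the missing x.
        obs = [i for i, m in enumerate(miss_x) if not m]
        a, b, sd = _fit_line([y[i] for i in obs], [x[i] for i in obs])
        x = [a + b * y[i] + rng.gauss(0.0, sd) if m else x[i] for i, m in enumerate(miss_x)]
        # y | x: fit on rows where y was observed, using the updated x.
        obs = [i for i, m in enumerate(miss_y) if not m]
        a, b, sd = _fit_line([x[i] for i in obs], [y[i] for i in obs])
        y = [a + b * x[i] + rng.gauss(0.0, sd) if m else y[i] for i, m in enumerate(miss_y)]
    return x, y


xs, ys = chained_equations([1.0, 2.0, 3.0, 4.0, None, 6.0],
                           [2.1, 3.9, 6.2, None, 10.1, 11.8])
print(round(xs[4], 2), round(ys[3], 2))
```

Running the function with m different seeds would give m completed data sets for a multiple imputation analysis.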
Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

Science.gov (United States)

Poyatos, Rafael; Sus, Oliver; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

2018-05-01

The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10-80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables allowed us to fill the gaps in...
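Species mean imputation, one of the simplest methods compared, can be sketched as below. The fallback to the overall trait mean for a species with no observed value is an assumption added here. The data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def species_mean_impute(species, values):
    """Replace each missing trait value (None) with its species' observed mean;
    species with no observed value fall back to the overall trait mean."""
    by_species = defaultdict(list)
    for sp, v in zip(species, values):
        if v is not None:
            by_species[sp].append(v)
    means = {sp: fmean(vs) for sp, vs in by_species.items()}
    overall = fmean(v for v in values if v is not None)
    return [means.get(sp, overall) if v is None else v for sp, v in zip(species, values)]


species = ["Pinus", "Pinus", "Quercus", "Quercus", "Pinus"]
wood_density = [0.5, 0.7, 0.9, None, None]   # hypothetical values
print(species_mean_impute(species, wood_density))
```

Every missing value of a species receives the same number, so this approach narrows trait distributions and weakens trait covariation. That is consistent with the poor representation of distributions and multivariate structure reported above, despite good average accuracy.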
A Maximum-Likelihood Method to Correct for Allelic Dropout in Microsatellite Data with No Replicate Genotypes

Science.gov (United States)

Wang, Chaolong; Schroeder, Kari B.; Rosenberg, Noah A.

2012-01-01

Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, in which one or both allelic copies at a locus fail to be amplified by the polymerase chain reaction. Especially for samples with poor DNA quality, this problem causes a downward bias in estimates of observed heterozygosity and an upward bias in estimates of inbreeding, owing to mistaken classifications of heterozygotes as homozygotes when one of the two copies drops out. One general approach for avoiding allelic dropout involves repeated genotyping of homozygous loci to minimize the effects of experimental error. Existing computational alternatives often require replicate genotyping as well. These approaches, however, are costly and are suitable only when enough DNA is available for repeated genotyping. In this study, we propose a maximum-likelihood approach together with an expectation-maximization algorithm to jointly estimate allelic dropout rates and allele frequencies when only one set of nonreplicated genotypes is available. Our method considers estimates of allelic dropout caused by both sample-specific factors and locus-specific factors, and it allows for deviation from Hardy–Weinberg equilibrium owing to inbreeding. Using the estimated parameters, we correct the bias in the estimation of observed heterozygosity through the use of multiple imputations of alleles in cases where dropout might have occurred. With simulated data, we show that our method can (1) effectively reproduce patterns of missing data and heterozygosity observed in real data; (2) correctly estimate model parameters, including sample-specific dropout rates, locus-specific dropout rates, and the inbreeding coefficient; and (3) successfully correct the downward bias in estimating the observed heterozygosity. We find that our method is fairly robust to violations of model assumptions caused by population structure and by genotyping errors from sources other than allelic dropout. Because the data sets...
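A deliberately simplified model shows the direction of the heterozygosity bias. It is not the paper's EM method. Suppose each allele at a locus drops out independently with a known probability d, a genotype with both alleles missing is recorded as missing, and there are no other genotyping errors. Under these assumptions, a true heterozygote is scored in one of three ways:

- heterozygous, with probability (1 - d)^2;
- apparently homozygous, with probability 2d(1 - d);
- missing, with probability d^2.

A true homozygote is never scored as heterozygous. Every genotype is non-missing with probability 1 - d^2. Among non-missing genotypes, the expected observed heterozygosity is therefore H(1 - d)^2 / (1 - d^2) = H(1 - d) / (1 + d). This expression can be inverted to give a point correction. The function names are illustrative.

```python
def expected_observed_het(true_het, d):
    """Expected heterozygosity among non-missing genotypes when each allele
    drops out independently with probability d (0 <= d < 1)."""
    return true_het * (1.0 - d) / (1.0 + d)


def corrected_het(observed_het, d):
    """Invert expected_observed_het to recover heterozygosity for a known d."""
    return observed_het * (1.0 + d) / (1.0 - d)


print(expected_observed_het(0.6, 0.2))  # about 0.4: the downward bias
print(corrected_het(0.4, 0.2))          # about 0.6
```

This sketch needs d to be known and treats it as constant across samples and loci. The paper instead estimates sample- and locus-specific dropout rates jointly with allele frequencies, allows for inbreeding, and corrects through multiple imputation of alleles.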
Different methods for analysing and imputing missing values in wind speed series; The problem of information quality in wind speed series: methodologies for the analysis and imputation of missing data

Energy Technology Data Exchange (ETDEWEB)

Ferreira, A. M.

2004-07-01

This study examines different methods for analysing and imputing missing values in wind speed series. The EM algorithm and a methodology derived from the sequential hot deck were used. Series with imputed values are compared with the original complete series using several criteria, such as the wind potential, and there appears to be a significant goodness of fit between the estimates and the real values. (Author)
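In its simplest single-series form, a sequential hot deck walks through the records in order and replaces each missing value with the most recently observed one, the "donor". The sketch below shows only that basic rule, applied to a time-ordered series. The study used a methodology derived from it, whose modifications are not described here. Classical hot-deck procedures also usually sort records within imputation classes first.

```python
def sequential_hot_deck(series, initial=None):
    """Replace each None with the last observed value before it.

    Missing values before the first observation receive `initial`.
    """
    filled, donor = [], initial
    for value in series:
        if value is None:
            filled.append(donor)
        else:
            donor = value
            filled.append(value)
    return filled


print(sequential_hot_deck([3.2, None, None, 4.1, None]))  # [3.2, 3.2, 3.2, 4.1, 4.1]
```

Because a long gap is filled with a single repeated value, this rule suits short gaps better than long ones in a variable as changeable as wind speed.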
Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs)

OpenAIRE

Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L.

2016-01-01

Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated co...
Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.

Science.gov (United States)

Seaman, Shaun R; Hughes, Rachael A

2018-06-01

Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, when there is substantial missingness in the outcome variable, full-conditional specification multiple imputation is shown to be potentially much more robust to misspecification of the restricted general location model than joint model multiple imputation using that model.
A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

Directory of Open Access Journals (Sweden)

Concepción Crespo Turrado

2015-12-01

Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and imbalance among phases. In this context, missing data for any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase, and power factor) adversely affect any time series study performed. When this occurs, a data imputation process must be carried out to replace the missing data with estimated values. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results show that the proposed method outperforms the MICE algorithm.
Imputation of missing data in time series for air pollutants

Science.gov (United States)

Junger, W. L.; Ponce de Leon, A.

2015-02-01

Missing data are major concerns in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data and that uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas its validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations obtained valid results, even when data were missing not at random. The methods proposed in this study are implemented in a package called mtsdi for the statistical software system R.
Imputation methods for filling missing data in urban air pollution data for Malaysia

Directory of Open Access Journals (Sweden)

Nur Afiqah Zakaria

2018-06-01

The air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing data. Missing observations usually occur due to machine failure, routine maintenance and human error. In this study, hourly monitoring data for CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods (Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour). Missing data were simulated in the air pollutant observations at four levels: 5%, 10%, 15% and 20%. Performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. Based on these performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in missing values in air pollutant data.
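The abstract does not define the "Mean Top Bottom" method. The sketch below assumes it fills a gap with the average of the closest valid observation above (before) and below (after) it in the hourly record. Three of the four performance measures follow their standard definitions, with the index of agreement taken as Willmott's d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)². The coefficient of determination is omitted. The function names are illustrative.

```python
import math


def mean_top_bottom(series):
    """Fill each gap with the mean of the nearest observed values before
    ("top") and after ("bottom") it; use one side if only one exists."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = next((series[j] for j in range(i - 1, -1, -1) if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series)) if series[j] is not None), None)
        sides = [v for v in (before, after) if v is not None]
        filled[i] = sum(sides) / len(sides) if sides else None
    return filled


def goodness_of_fit(observed, predicted):
    """Mean absolute error, root mean squared error and Willmott's index of agreement."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    o_bar = sum(observed) / n
    denom = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(observed, predicted))
    ioa = 1.0 - sum(e * e for e in errors) / denom
    return mae, rmse, ioa


print(mean_top_bottom([10.0, None, 14.0, None, None, 20.0]))
# [10.0, 12.0, 14.0, 17.0, 17.0, 20.0]
```

In an evaluation like the one described, `goodness_of_fit` would be applied only to the positions that were deliberately blanked, comparing the imputed values with the withheld true readings.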
Genetic evaluation with major genes and polygenic inheritance when some animals are not genotyped using gene content multiple-trait BLUP.

Science.gov (United States)

Legarra, Andrés; Vitezica, Zulma G

2015-11-17

In pedigreed populations with a major gene segregating for a quantitative trait, it is not clear how to use pedigree, genotype and phenotype information when some individuals are not genotyped. We propose to consider gene content at the major gene as a second trait correlated to the quantitative trait, in a gene content multiple-trait best linear unbiased prediction (GCMTBLUP) method. The genetic covariance between the trait and gene content at the major gene is a function of the substitution effect of the gene. This genetic covariance can be written in a multiple-trait form that accommodates any pattern of missing values for either genotype or phenotype data. Effects of major gene alleles and the genetic covariance between genotype at the major gene and the phenotype can be estimated using standard EM-REML or Gibbs sampling. Prediction of breeding values with genotypes at the major gene can use multiple-trait BLUP software. Major genes with more than two alleles can be considered by including negative covariances between gene contents at each different allele. We simulated two scenarios: a selected and an unselected trait with heritabilities of 0.05 and 0.5, respectively. In both cases, the major gene explained half the genetic variation. Competing methods used imputed gene contents derived by the method of Gengler et al. or by iterative peeling. Imputed gene contents, in contrast to GCMTBLUP, do not consider information on the quantitative trait for genotype prediction. GCMTBLUP gave unbiased estimates of the gene effect, in contrast to the other methods, with less bias and better or equal accuracy of prediction. GCMTBLUP improved estimation of genotypes in non-genotyped individuals, in particular if these individuals had own phenotype records and the trait had a high heritability. Ignoring the major gene in genetic evaluation led to serious biases and decreased prediction accuracy. GCMTBLUP is the best linear predictor of additive genetic merit including...
On multivariate imputation and forecasting of decadal wind speed missing data.

Science.gov (United States)

Wesonga, Ronald

2015-01-01

This paper demonstrates the application of multiple imputation by chained equations and time series forecasting to wind speed data. The study was motivated by the high prevalence of missing historical wind speed data. Fully conditional specification under multiple imputation by chained equations provided reliable imputations of the missing wind speed data. Further, the smoothing parameter of the forecasting model, alpha (0.014), was close to zero, confirming that recent past observations are more suitable for forecasting wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance, with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimation confirms the dynamic tendencies of wind speed at the airport under study.
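For reference, simple exponential smoothing updates a level as level_t = alpha * y_t + (1 - alpha) * level_(t-1). The weight on an observation k steps back is therefore proportional to alpha * (1 - alpha)^k. With alpha close to zero this weight decays very slowly, so the forecast averages over a long stretch of history. With alpha close to one, the forecast follows the latest observation. The sketch below illustrates the recursion on hypothetical data. The abstract does not specify the paper's exact forecasting model, for example whether it includes trend or seasonal terms.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: return the one-step-ahead forecast.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level for each later observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


winds = [4.0, 6.0, 5.0, 9.0]       # hypothetical wind speeds, m/s
print(ses_forecast(winds, 0.014))  # stays close to the starting level
print(ses_forecast(winds, 0.9))    # tracks the latest observation
```

Initialising the level at the first observation is one common convention; with a small alpha, that choice keeps influencing forecasts for a long time.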
Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

Science.gov (United States)

Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

2013-04-01

This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations are the simple ones. A multilayer perceptron neural network and a multiple imputation strategy using Markov chain Monte Carlo based on expectation-maximization (EM-MCMC) are the computationally intensive ones. In addition, we propose a modification to the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Detailed graphical and quantitative analysis shows that the computational methods, particularly EM-MCMC, are computationally inefficient. However, they appear favorable for imputing meteorological time series across different missingness periods, considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation, particularly with computational methods, since it gives more precise results in meteorological time series.
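The normal ratio (NR) method mentioned above is commonly written as P_x = (1/n) Σ (N_x / N_i) P_i. Here N_x and N_i are the long-term normals of the target and neighbouring stations, for example mean monthly precipitation. P_i are the neighbours' observed values for the missing period. The sketch below implements only this plain form. The correlation-weighted NR variant and the study's specific station choices are not shown, and the numbers are hypothetical.

```python
def normal_ratio(target_normal, neighbour_normals, neighbour_values):
    """Estimate a missing value at the target station from n neighbours by
    scaling each neighbour's value by the ratio of normals and averaging."""
    if len(neighbour_normals) != len(neighbour_values) or not neighbour_values:
        raise ValueError("need one normal per neighbour value, and at least one neighbour")
    scaled = [target_normal * value / normal
              for normal, value in zip(neighbour_normals, neighbour_values)]
    return sum(scaled) / len(scaled)


# Target normal 100 mm; neighbours with normals 80 and 120 mm recorded 40 and 60 mm.
print(normal_ratio(100.0, [80.0, 120.0], [40.0, 60.0]))  # 50.0
```

Scaling by the ratio of normals corrects for systematic differences between stations, such as a wetter neighbour, before the neighbours are averaged.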
Multiple imputation of missing passenger boarding data in the national census of ferry operators

Science.gov (United States)

2008-08-01

This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corresp...
Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution.

Science.gov (United States)

Pausch, Hubert; Emmerling, Reiner; Gredler-Grandl, Birgit; Fries, Ruedi; Daetwyler, Hans D; Goddard, Michael E

2017-11-09

Genotyping and whole-genome sequencing data have been generated for hundreds of thousands of cattle. International consortia used these data to compile imputation reference panels that facilitate the imputation of sequence variant genotypes for animals that have been genotyped using dense microarrays. Association studies with imputed sequence variant genotypes allow for the characterization of quantitative trait loci (QTL) at nucleotide resolution particularly when individuals from several breeds are included in the mapping populations. We imputed genotypes for 28 million sequence variants in 17,229 cattle of the Braunvieh, Fleckvieh and Holstein breeds in order to compile large mapping populations that provide high power to identify QTL for milk production traits. Association tests between imputed sequence variant genotypes and fat and protein percentages in milk uncovered between six and thirteen QTL (P < 1e-8) per breed. Eight of the detected QTL were significant in more than one breed. We combined the results across breeds using meta-analysis and identified a total of 25 QTL including six that were not significant in the within-breed association studies. Two missense mutations in the ABCG2 (p.Y581S, rs43702337, P = 4.3e-34) and GHR (p.F279Y, rs385640152, P = 1.6e-74) genes were the top variants at QTL on chromosomes 6 and 20. Another known causal missense mutation in the DGAT1 gene (p.A232K, rs109326954, P = 8.4e-1436) was the second top variant at a QTL on chromosome 14 but its allelic substitution effects were inconsistent across breeds. It turned out that the conflicting allelic substitution effects resulted from flaws in the imputed genotypes due to the use of a multi-breed reference population for genotype imputation. Many QTL for milk production traits segregate across breeds and across-breed meta-analysis has greater power to detect such QTL than within-breed association testing. Association testing between imputed sequence variant genotypes and...
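The abstract does not say how the per-breed results were combined. One common choice is an inverse-variance weighted fixed-effect meta-analysis; combining z-scores is another. Below is a sketch of the former, with hypothetical inputs.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) pooling of per-breed estimates.

    Returns the pooled effect, its standard error and a two-sided p-value
    from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, p_value


# Hypothetical allele substitution effects (and SEs) in three breeds.
print(fixed_effect_meta([0.12, 0.08, 0.10], [0.03, 0.04, 0.02]))
```

There are two caveats:

- Double-precision p-values underflow to zero below roughly 1e-308. Values as extreme as the DGAT1 result quoted above would have to be computed and reported on a log scale.
- A fixed-effect model assumes a common effect across breeds. That assumption fails when allelic substitution effects are inconsistent across breeds, as reported for DGAT1.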
Sequence imputation of HPV16 genomes for genetic association studies.

Directory of Open Access Journals (Sweden)

Benjamin Smith

Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we know neither which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. The oncogenic risk of determined and imputed HPV16 SNPs was interrogated in a case-control viral genome-wide association study (VWAS), which calculated odds ratios for each SNP. The VWAS used biopsy-confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, determining the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16...
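Per-SNP odds ratios in a case-control design such as the VWAS above come from a 2×2 table of variant carriage by case status. Below is a minimal sketch with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical, and the abstract does not say which interval method was used.

```python
import math


def odds_ratio(case_alt, case_ref, control_alt, control_ref, z=1.959963984540054):
    """Odds ratio and Woolf (log-scale) 95% confidence interval for a 2x2 table.

    All four counts must be positive; zero cells would need a continuity correction.
    """
    ratio = (case_alt * control_ref) / (case_ref * control_alt)
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical: 30 of 40 neoplasia cases and 20 of 60 self-limited infections carry the variant.
print(odds_ratio(30, 10, 20, 40))  # OR = 6.0, with its 95% CI
```

Because HPV16 SNPs are inherited together within lineages, an elevated odds ratio for one SNP does not isolate a causal site. This matches the limitation noted in the abstract.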

Gap-filling a spatially explicit plant trait database: comparing imputation methods and different levels of environmental information

Directory of Open Access Journals (Sweden)

R. Poyatos

2018-05-01

The ubiquity of missing data in plant trait databases may hinder trait-based analyses of ecological patterns and processes. Spatially explicit datasets with information on intraspecific trait variability are rare but offer great promise in improving our understanding of functional biogeography. At the same time, they offer specific challenges in terms of data imputation. Here we compare statistical imputation approaches, using varying levels of environmental information, for five plant traits (leaf biomass to sapwood area ratio, leaf nitrogen content, maximum tree height, leaf mass per area and wood density) in a spatially explicit plant trait dataset of temperate and Mediterranean tree species (Ecological and Forest Inventory of Catalonia, IEFC, dataset for Catalonia, north-east Iberian Peninsula, 31 900 km²). We simulated gaps at different missingness levels (10–80 %) in a complete trait matrix, and we used overall trait means, species means, k nearest neighbours (kNN), ordinary and regression kriging, and multivariate imputation using chained equations (MICE) to impute missing trait values. We assessed these methods in terms of their accuracy and of their ability to preserve trait distributions, multi-trait correlation structure and bivariate trait relationships. The relatively good performance of mean and species mean imputations in terms of accuracy masked a poor representation of trait distributions and multivariate trait structure. Species identity improved MICE imputations for all traits, whereas forest structure and topography improved imputations for some traits. No method performed best consistently for the five studied traits, but, considering all traits and performance metrics, MICE informed by relevant ecological variables gave the best results. However, at higher missingness (> 30 %), species mean imputations and regression kriging tended to outperform MICE for some traits. MICE informed by relevant ecological variables...
An Imputation Model for Dropouts in Unemployment Data

Directory of Open Access Journals (Sweden)

Nilsson Petra

2016-09-01

Incomplete unemployment data is a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service's register, as many as 20 percent do. This leads to ambiguity regarding destination states (employment, unemployment, retirement, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 it dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions, and they show relatively high predictive power.
Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation.

Science.gov (United States)

Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen

2014-01-01

Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently conducted GenIMS study.
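Whatever model generates the imputations, the m completed-data fits are usually combined with Rubin's rules. The abstract describes its own consistent variance estimator, which may differ. The sketch below therefore shows the generic rules rather than the paper's estimator, with hypothetical inputs.

```python
from statistics import fmean, variance


def rubins_rules(estimates, variances):
    """Pool m completed-data analyses of one parameter.

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns the pooled estimate and its total variance W + (1 + 1/m) B.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = fmean(estimates)        # pooled point estimate
    within = fmean(variances)       # W: average within-imputation variance
    between = variance(estimates)   # B: between-imputation variance (m - 1 denominator)
    return q_bar, within + (1.0 + 1.0 / m) * between


# Hypothetical coefficient estimates from three imputed data sets, each with SE 0.2.
print(rubins_rules([1.0, 1.2, 0.8], [0.04, 0.04, 0.04]))  # roughly (1.0, 0.0933)
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.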
A suggested approach for imputation of missing dietary data for young children in daycare.

Science.gov (United States)

Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P; Zeng, Donglin; Vaughn, Amber E; Pratt, Charlotte; Ward, Dianne S

2015-01-01

Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using two ratios: the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays, and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.
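The core ratio step can be written as L_hat = r × (B+D+ES), where r is the mean lunch-to-(B+D+ES) ratio from reference days. The sketch takes r as the mean of per-day ratios, which is one reading of the abstract. It omits the mixed-model adjustment for age, sex and BMI that the study also used. All numbers are hypothetical.

```python
from statistics import fmean


def ratio_impute(ref_lunch, ref_other, target_other):
    """Impute lunch energy as (mean lunch / other-meals ratio) x other meals.

    ref_lunch, ref_other: lunch and B+D+ES kcal on reference days
    target_other: B+D+ES kcal on days whose lunch is missing
    """
    ratio = fmean(lunch / other for lunch, other in zip(ref_lunch, ref_other))
    return [ratio * other for other in target_other]


print(ratio_impute([300.0, 400.0], [600.0, 800.0], [700.0]))  # [350.0]
```

Using the mean of per-day ratios weights every reference day equally. A ratio of means (total lunch divided by total B+D+ES) would instead weight days with larger intakes more heavily.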
Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

Science.gov (United States)

Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

2018-02-01

Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis arises when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to assess the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.
UniFIeD Univariate Frequency-based Imputation for Time Series Data

OpenAIRE

Friese, Martina; Stork, Jörg; Ramos Guerra, Ricardo; Bartz-Beielstein, Thomas; Thaker, Soham; Flasch, Oliver; Zaefferer, Martin

2013-01-01

This paper introduces UniFIeD, a new data preprocessing method for time series that can cope with large intervals of missing data. A scalable test function generator, which allows the simulation of time series with different gap sizes, is also presented. An experimental study demonstrates that (i) UniFIeD performs significantly better than simple imputation methods and (ii) UniFIeD is able to handle situations where advanced imputation methods fail. The results are indep...
A suggested approach for imputation of missing dietary data for young children in daycare

Directory of Open Access Journals (Sweden)

June Stevens

2015-12-01

Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting SOS project (n=308; 870 24-h diet recalls). Mixed models were used to simultaneously predict breakfast, dinner, and evening snacks (B+D+ES); lunch; and daytime snacks for all children after adjusting for age, sex, and body mass index (BMI). From these models, we imputed the missing weekday daycare lunches by interpolation using the mean lunch to B+D+ES [L/(B+D+ES)] ratio among non-daycare children on weekdays and the L/(B+D+ES) ratio for all children on weekends. Daytime snack data were used to impute snacks. Results: The reported mean (± standard deviation) weekday intake was lower for daycare children [725 (±324) kcal] compared to non-daycare children [1,048 (±463) kcal]. Weekend intake for all children was 1,173 (±427) kcal. After imputation, weekday caloric intake for daycare children was 1,230 (±409) kcal. Daily intakes that included imputed data were associated with age and sex but not with BMI. Conclusion: This work indicates that imputation is a promising method for improving the precision of daily nutrient data from young children.
Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

Directory of Open Access Journals (Sweden)

Wilson Barry Tyler

2013-01-01

The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially extant estimates of forest C density were developed for the conterminous U.S. using the U.S.’s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest
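The plot-to-pixel step of nearest-neighbor imputation can be sketched as below. This is an illustration under stated assumptions:

* Each pixel and each plot is described by a vector of predictor values, for example spectral or climate variables (assumed here).
* Distance is Euclidean.
* The imputed value is the mean of the k nearest plots.

With k=1, a pixel simply inherits the attributes of its single nearest plot, which keeps all C pools from that plot consistent with one another. Names are hypothetical.

```python
import math


def knn_impute(pixel_features, plots, k=1):
    """Impute a C density for one pixel from inventory plots.

    plots: list of (feature_vector, c_density) pairs for measured plots.
    Returns the mean C density of the k plots nearest in feature space.
    """
    ranked = sorted(plots, key=lambda plot: math.dist(pixel_features, plot[0]))
    nearest = ranked[:k]
    return sum(density for _, density in nearest) / len(nearest)


plots = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((10.0, 10.0), 100.0)]
print(knn_impute((0.2, 0.0), plots, k=1))  # 10.0
print(knn_impute((0.2, 0.0), plots, k=2))  # 15.0
```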
New insights into the pharmacogenomics of antidepressant response from the GENDEP and STAR*D studies: rare variant analysis and high-density imputation.

Science.gov (United States)

Fabbri, C; Tansey, K E; Perlis, R H; Hauser, J; Henigsberg, N; Maier, W; Mors, O; Placentino, A; Rietschel, M; Souery, D; Breen, G; Curtis, C; Sang-Hyuk, L; Newhouse, S; Patel, H; Guipponi, M; Perroud, N; Bondolfi, G; O'Donovan, M; Lewis, G; Biernacka, J M; Weinshilboum, R M; Farmer, A; Aitchison, K J; Craig, I; McGuffin, P; Uher, R; Lewis, C M

2017-11-21

Genome-wide association studies have generally failed to identify polymorphisms associated with antidepressant response. Possible reasons include limited coverage of genetic variants, which this study tried to address through exome genotyping and dense imputation. A meta-analysis of Genome-Based Therapeutic Drugs for Depression (GENDEP) and Sequenced Treatment Alternatives to Relieve Depression (STAR*D) studies was performed at the single-nucleotide polymorphism (SNP), gene and pathway levels. Coverage of genetic variants was increased compared with previous studies by adding exome genotypes to previously available genome-wide data and using the Haplotype Reference Consortium panel for imputation. Standard quality control was applied. Phenotypes were symptom improvement and remission after 12 weeks of antidepressant treatment. Significant findings were investigated in NEWMEDS consortium samples and Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) for replication. A total of 7,062,950 SNPs were analyzed in GENDEP (n=738) and STAR*D (n=1409). rs116692768 (P=1.80e-08, ITGA9 (integrin α9)) and rs76191705 (P=2.59e-08, NRXN3 (neurexin 3)) were significantly associated with symptom improvement during citalopram/escitalopram treatment. At the gene level, no consistent effect was found. At the pathway level, the Gene Ontology (GO) terms GO:0005694 (chromosome) and GO:0044427 (chromosomal part) were associated with improvement (corrected P=0.007 and 0.045, respectively). The association between rs116692768 and symptom improvement was replicated in PGRN-AMPS (P=0.047), whereas rs76191705 was not. The two SNPs did not replicate in NEWMEDS. ITGA9 codes for a membrane receptor for neurotrophins and NRXN3 is a transmembrane neuronal adhesion receptor involved in synaptic differentiation. Despite their meaningful biological rationale for being involved in antidepressant effect, replication was partial. Further studies may help in clarifying
TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

Science.gov (United States)

Allen, Genevera I; Tibshirani, Robert

2010-06-01

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
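For intuition about EM-type imputation under a normal model, the sketch below iterates conditional-mean imputation for a data matrix with NaNs under an ordinary multivariate normal. It is a simplification, not the transposable model, in three ways. First, it regularizes by adding `lam` times the identity to the sample covariance, a ridge-style stand-in for the paper's penalties on the inverse covariances. Second, it treats only the columns as features. Third, it omits the conditional-covariance correction that a full EM M-step would include. NumPy is assumed to be available, and the function name is hypothetical.

```python
import numpy as np


def em_impute_mvn(X, lam=0.1, n_iter=50):
    """Iterative conditional-mean imputation of NaNs under a multivariate normal model."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    # Start from column means
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    p = X.shape[1]
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False, bias=True) + lam * np.eye(p)  # ridge-regularized covariance
        for i in np.where(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            if o.any():
                # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
                coef = S[np.ix_(m, o)] @ np.linalg.inv(S[np.ix_(o, o)])
                X[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            else:
                X[i, m] = mu[m]
    return X


data = [[1, 2], [2, 4], [3, 6], [4, 8], [5, np.nan]]
print(em_impute_mvn(data, lam=0.0)[4, 1])  # approximately 10.0
```

With `lam > 0`, the imputed value shrinks toward the column mean (about 9.5 here for `lam=0.1`). This illustrates the bias-for-stability trade-off that regularization introduces.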
Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

DEFF Research Database (Denmark)

Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

2014-01-01

BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data be imputed. Different imputation techniques may give different results and some may lead to bias...... of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate. METHODS: We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI...
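The two single-imputation rules named above can be illustrated on a hypothetical series of body-weight visits, with `None` marking a missed visit. This is a minimal sketch with three limitations. It applies BOCF to every missing visit, whereas trial protocols usually apply it to outcomes after dropout. It leaves a missing baseline unfilled. It does not attempt the multiple imputation (MI) arm of the comparison.

```python
def locf(values):
    """Last observation carried forward: each None takes the most recent observed value.

    A leading None (missing baseline) stays None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(values):
    """Baseline observation carried forward: each None takes the baseline (first) value."""
    baseline = values[0]
    return [baseline if v is None else v for v in values]


weights = [100.0, 97.5, None, None, 95.0, None]  # kg, hypothetical visits
print(locf(weights))  # [100.0, 97.5, 97.5, 97.5, 95.0, 95.0]
print(bocf(weights))  # [100.0, 97.5, 100.0, 100.0, 95.0, 100.0]
```

In a weight-loss trial, BOCF assumes no benefit after dropout, while LOCF assumes any loss achieved so far is maintained. This is one reason the two can give different results.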
A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

Science.gov (United States)

Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

2017-05-31

Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of biological and life sciences. Several software packages exist to process the raw MS data into quantified protein abundances, including open source and commercial solutions. Each package includes a set of unique algorithms for different tasks of the MS data processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance is missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics software and the capabilities of different data imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing included the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other software decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, we found that the local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of both data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.
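A simplified version of local least squares (lls) imputation for one protein (row) is sketched below. It proceeds in three steps:

1. Choose the k complete rows closest to the target on its observed samples.
2. Fit least-squares weights that reconstruct the target's observed values from those neighbors.
3. Apply the same weights to the neighbors' values at the missing samples.

The sketch assumes Euclidean distance for neighbor selection, a fixed `k`, and an available NumPy. It is not the implementation used in the evaluated workflows, and the function name is hypothetical.

```python
import numpy as np


def lls_impute_row(target, complete_rows, k=3):
    """Local least squares imputation of the NaN entries in one row.

    target: 1-D sequence with NaNs; complete_rows: 2-D array of rows with no NaNs.
    """
    target = np.asarray(target, dtype=float)
    rows = np.asarray(complete_rows, dtype=float)
    obs = ~np.isnan(target)
    # k nearest complete rows by Euclidean distance on the target's observed columns
    dist = np.linalg.norm(rows[:, obs] - target[obs], axis=1)
    neighbors = rows[np.argsort(dist)[:k]]
    # Least-squares weights w such that neighbors[:, obs].T @ w ~= target[obs]
    w, *_ = np.linalg.lstsq(neighbors[:, obs].T, target[obs], rcond=None)
    filled = target.copy()
    filled[~obs] = neighbors[:, ~obs].T @ w  # apply the same weights at missing columns
    return filled


complete = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [0.0, 1.0, 0.0, 1.0]]
print(lls_impute_row([3.0, 6.0, 9.0, np.nan], complete))
```

Here the target is a multiple of the first two complete rows, so the imputed value is close to 12.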
Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

Directory of Open Access Journals (Sweden)

Stanley Xu

2014-05-01

In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset using an incident diabetes population that has no missing values in A1c, fasting and random plasma glucose (FPG and RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We then imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG (1) estimated a continuous A1c within ± 1.88% of the true A1c 68.3% of the time and (2) estimated a categorical A1c within ± one category from the true A1c about 50% of the time. Including RPG in imputation slightly improved the precision but did not improve the accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not categorical A1c. Moreover, imputation of up to 33% of missing A1c values did not change the accuracy and precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When using A1c values as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.
Refining QTL with high-density SNP genotyping and whole genome sequence in three cattle breeds

DEFF Research Database (Denmark)

Sahana, Goutam; Guldbrandtsen, Bernt; Lund, Mogens Sandø

2012-01-01

A genome-wide association study was carried out in Nordic Holsteins, Nordic Red and Jersey breeds for functional traits using the BovineHD Genotyping BeadChip (Illumina, San Diego, CA). The association analyses were carried out using both a linear mixed model approach and a Bayesian variable selection...... method. Principal components were used to account for population structure. The QTL segregating in all three breeds were selected and a few of the most significant ones were followed in further analyses. The polymorphisms in the identified QTL regions were imputed using 90 whole genome sequences...
Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes

Directory of Open Access Journals (Sweden)

Lotz Meredith J

2008-01-01

Background: Gene expression data frequently contain missing values; however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. Results: We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Conclusion: Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA
Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

Science.gov (United States)

Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C

2008-01-10

Gene expression data frequently contain missing values; however, most down-stream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures × time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity
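The idea of an entropy-based complexity measure can be illustrated with one common construction: the normalized entropy of the squared singular values of the expression matrix. This definition is an assumption for illustration, and the authors' exact measure may differ, for example in how the matrix is centered or which rows are used when values are missing. Low entropy means a few components dominate, so the matrix maps well to a lower-dimensional subspace. High entropy means variance is spread across many components. NumPy is assumed.

```python
import numpy as np


def svd_entropy(matrix):
    """Normalized entropy (between 0 and 1) of the squared singular values of a complete matrix.

    Requires at least two singular values, i.e. min(rows, cols) >= 2.
    """
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    p = s**2 / np.sum(s**2)  # fraction of total variance per component
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log(p)) / np.log(len(s)))


rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0, 4.0])
print(svd_entropy(rank_one))   # near 0: one component explains everything
print(svd_entropy(np.eye(4)))  # equals 1 up to rounding: variance spread evenly
```

Under the entropy-based selection idea, a low score would point toward global, low-rank methods, while a high score would favor local methods.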
Missing in space: an evaluation of imputation methods for missing data in spatial analysis of risk factors for type II diabetes.

Science.gov (United States)

Baker, Jannah; White, Nicole; Mengersen, Kerrie

2014-11-20

Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. We present a cross-validation approach to select among three imputation methods for health survey data with correlated lifestyle covariates, using as a case study type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. The best choice of imputation method depends on the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
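The cross-validation idea can be sketched as follows for one covariate measured across areas: hide observed values, impute them, and compare errors across candidate methods. Areas are identified by index, and values are `None` when missing. The neighbor-average imputer is a hypothetical spatial stand-in, not the multivariate normal or conditional autoregressive model used in the study. All function names are illustrative.

```python
import math
import random


def mean_impute(values, i):
    """Impute position i with the mean of all other observed values."""
    observed = [v for j, v in enumerate(values) if j != i and v is not None]
    return sum(observed) / len(observed)


def make_neighbor_impute(neighbors):
    """Hypothetical spatial imputer: average of observed neighboring areas."""
    def impute(values, i):
        observed = [values[j] for j in neighbors[i] if values[j] is not None]
        return sum(observed) / len(observed) if observed else mean_impute(values, i)
    return impute


def cv_rmse(values, impute, n_holdouts=20, seed=0):
    """Mask observed values one at a time, impute each, and return the RMSE."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    errors = []
    for i in rng.sample(observed, min(n_holdouts, len(observed))):
        masked = list(values)
        truth, masked[i] = masked[i], None
        errors.append((impute(masked, i) - truth) ** 2)
    return math.sqrt(sum(errors) / len(errors))


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical area-level covariate along a chain
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in range(len(values))}
scores = {"mean": cv_rmse(values, mean_impute), "neighbor": cv_rmse(values, make_neighbor_impute(chain))}
print(min(scores, key=scores.get))  # neighbor
```

With a strong spatial trend, as in this toy example, the neighbor imputer wins. In the study's data, the same procedure selected plain mean imputation.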
VIGAN: Missing View Imputation with Generative Adversarial Networks.

Science.gov (United States)

Shang, Chao; Palmer, Aaron; Sun, Jiangwen; Chen, Ko-Shin; Lu, Jin; Bi, Jinbo

2017-01-01

In an era when big data are becoming the norm, the concern is less with the quantity and more with the quality and completeness of the data. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of data, the result is the missing view problem. Classic multiple imputation or matrix completion methods are hardly effective here, because the specific view contains no information on which to base imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model enables the knowledge integration for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further demonstrates the effectiveness and usability of this approach in life science.
The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits.

Directory of Open Access Journals (Sweden)

Benjamin F Voight

Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe the Metabochip and its component SNP sets, evaluate its performance in capturing variation across the allele-frequency spectrum, describe solutions to methodological challenges commonly encountered in its analysis, and evaluate its performance as a platform for genotype imputation. The metabochip achieves dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow-up large-scale genotyping and sequencing studies and advance our understanding of the genetic basis of complex human diseases and traits.
iVAR: a program for imputing missing data in multivariate time series using vector autoregressive models.

Science.gov (United States)

Liu, Siwei; Molenaar, Peter C M

2014-12-01

This article introduces iVAR, an R program for imputing missing data in multivariate time series on the basis of vector autoregressive (VAR) models. We conducted a simulation study to compare iVAR with three methods for handling missing data: listwise deletion, imputation with sample means and variances, and multiple imputation ignoring time dependency. The results showed that iVAR produces better estimates for the cross-lagged coefficients than do the other three methods. We demonstrate the use of iVAR with an empirical example of time series electrodermal activity data and discuss the advantages and limitations of the program.
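A bare-bones version of VAR-based imputation is sketched below. It fits a first-order VAR with intercept by least squares on pairs of consecutive, fully observed time points. It then fills each missing entry with its one-step-ahead prediction from the previous (possibly already imputed) time point. This simplification is for illustration and is not iVAR's algorithm. It assumes a VAR(1) order, an available NumPy, and an observed first time point; the function name is hypothetical.

```python
import numpy as np


def var1_impute(series):
    """Fill NaNs in a T x p array using a VAR(1) model with intercept.

    The model y_t = c + A y_{t-1} is fitted by least squares on consecutive
    pairs of fully observed rows; each missing entry is then replaced by its
    one-step-ahead prediction from the previous (possibly imputed) row.
    """
    Y = np.array(series, dtype=float)
    complete = ~np.isnan(Y).any(axis=1)
    pairs = [t for t in range(1, len(Y)) if complete[t] and complete[t - 1]]
    lagged = np.column_stack([np.ones(len(pairs)), Y[[t - 1 for t in pairs]]])
    coef, *_ = np.linalg.lstsq(lagged, Y[pairs], rcond=None)  # rows: intercept, then A^T
    for t in range(1, len(Y)):
        missing = np.isnan(Y[t])
        if missing.any() and not np.isnan(Y[t - 1]).any():
            prediction = np.concatenate(([1.0], Y[t - 1])) @ coef
            Y[t, missing] = prediction[missing]
    return Y


# Noise-free example: y_t = c + A y_{t-1}, with one time point removed
A = np.array([[0.5, 0.1], [0.2, 0.3]])
c = np.array([1.0, 2.0])
rows = [np.array([5.0, -3.0])]
for _ in range(11):
    rows.append(c + A @ rows[-1])
data = np.array(rows)
truth = data[6].copy()
data[6] = np.nan
print(var1_impute(data)[6], truth)
```

Unlike mean imputation or multiple imputation that ignores time dependency, the prediction uses the cross-lagged coefficients in `A`. With real, noisy data, the imputed values would be conditional expectations rather than exact recoveries.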

Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

Science.gov (United States)

Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

2013-01-01

Summary Background and objectives Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict likelihood of missing BCr. Propensity scoring identified 6502 patients with highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a “missing” data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI were compared with that of eGFR 75. Results All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; Pcreatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Conclusions Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980
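The "eGFR 75" comparator imputes a missing baseline creatinine by solving an eGFR equation for serum creatinine with eGFR fixed at 75 ml/min per 1.73 m2. The abstract does not say which equation the study used. The sketch below assumes the four-variable MDRD Study equation with the IDMS-traceable coefficient 175:

eGFR = 175 × Scr^-1.154 × age^-0.203 × 0.742 (if female) × 1.212 (if Black)

Treat these coefficients as an assumption. The function names are hypothetical.

```python
def mdrd_egfr(scr_mg_dl, age, female, black):
    """Four-variable MDRD eGFR (ml/min per 1.73 m2), IDMS-traceable form (assumed)."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def baseline_scr_from_egfr(age, female, black, egfr=75.0):
    """Back-calculate the serum creatinine (mg/dL) that yields the assumed eGFR."""
    # eGFR = K * Scr**-1.154, where K collects the non-creatinine terms (Scr = 1 gives K)
    k = mdrd_egfr(1.0, age, female, black)
    return (egfr / k) ** (-1.0 / 1.154)


scr = baseline_scr_from_egfr(60, female=False, black=False)
print(round(scr, 2), round(mdrd_egfr(scr, 60, False, False), 6))
```

Because this surrogate depends only on demographics, it ignores each patient's actual kidney function. This is consistent with the higher misclassification reported for eGFR 75 relative to multiple imputation.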
Multiple Imputation of Predictor Variables Using Generalized Additive Models

NARCIS (Netherlands)

de Jong, Roel; van Buuren, Stef; Spiess, Martin

2016-01-01

The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The
Analyzing the changing gender wage gap based on multiply imputed right censored wages

OpenAIRE

Gartner, Hermann; Rässler, Susanne

2005-01-01

"In order to analyze the gender wage gap with the German IAB-employment register we have to solve the problem of censored wages at the upper limit of the social security system. We treat this problem as a missing data problem. We regard the missingness mechanism as not missing at random (NMAR, according to Little and Rubin, 1987, 2002) as well as missing by design. The censored wages are multiply imputed by draws of a random variable from a truncated distribution. The multiple imputation is b...
Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis.

Science.gov (United States)

Kmetic, Andrew; Joseph, Lawrence; Berger, Claudie; Tenenhouse, Alan

2002-07-01

Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. We reviewed multiple imputation, a widely applicable and easy to implement Bayesian methodology to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians, designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for those invitees who declined further participation in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status for nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models are considered. Our results suggest that selection bias in the study is of concern, but only slightly, in very elderly (age 80+ years), both women and men. Epidemiologists should consider using multiple imputation more often than is current practice.
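Once each completed data set has produced a prevalence estimate and its variance, the standard way to combine them is Rubin's rules, sketched below. The numbers are hypothetical. The sketch uses the simple binomial variance p(1 - p)/n for each completed data set, which is an assumption rather than the study's variance estimator.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Combine m completed-data estimates with Rubin's rules.

    Returns the pooled estimate and its total variance W + (1 + 1/m) * B.
    """
    m = len(estimates)
    q_bar = mean(estimates)         # pooled point estimate
    w = mean(within_variances)      # average within-imputation variance
    b = variance(estimates)         # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b


n = 1000  # hypothetical completed-data sample size
prevalences = [0.10, 0.12, 0.14]
pooled, total_var = rubin_pool(prevalences, [p * (1 - p) / n for p in prevalences])
print(pooled, math.sqrt(total_var))
```

The between-imputation term B is what carries the extra uncertainty from not knowing the nonparticipants' osteoporosis status. A single imputation would omit it and understate the standard error.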
Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

Science.gov (United States)

Turrado, Concepción Crespo; López, María del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; de Cos Juez, Francisco Javier

2014-01-01

Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, while that for MLR was 28.19% and that for IDW was 31.68%. PMID:25356644
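Inverse Distance Weighting, one of the baseline methods compared with MICE above, estimates a missing reading as a distance-weighted average of the other stations' readings at the same time step. Below is a minimal sketch that assumes planar station coordinates and the common power parameter p = 2. The coordinates, values, and function name are hypothetical.

```python
import math


def idw_impute(target_xy, stations, power=2.0):
    """Inverse Distance Weighting estimate for a station with a missing reading.

    stations: list of ((x, y), value) for stations observed at the same time step.
    """
    weighted_sum = weight_total = 0.0
    for xy, value in stations:
        d = math.dist(target_xy, xy)
        if d == 0.0:
            return value  # co-located station: use its reading directly
        w = 1.0 / d**power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


neighbors = [((1.0, 0.0), 10.0), ((3.0, 0.0), 30.0)]  # hypothetical coordinates (km) and W/m2
print(idw_impute((0.0, 0.0), neighbors))
```

IDW uses only geometry, whereas MICE learns how each sensor relates to the others from the data. This is one plausible reason for the gap in RMSE reported above.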
Missing Data Imputation of Solar Radiation Data under Different Atmospheric Conditions

Directory of Open Access Journals (Sweden)

Concepción Crespo Turrado

2014-10-01

Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the case of the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, the lack of data and/or the presence of wrong values adversely affects any time series study. Consequently, when this occurs, a data imputation process must be performed in order to replace missing data with estimated values. This paper aims to evaluate the multivariate imputation of ten-minute scale data by means of the chained equations method (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, by using either all or just a group of the measurements of the remaining sensors. Very good results have been obtained with the MICE method in comparison with other methods employed in this field such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR). The average RMSE value of the predictions for the MICE algorithm was 13.37%, while that for MLR was 28.19% and that for IDW was 31.68%.
Missing data imputation of solar radiation data under different atmospheric conditions.

Science.gov (United States)

Turrado, Concepción Crespo; López, María Del Carmen Meizoso; Lasheras, Fernando Sánchez; Gómez, Benigno Antonio Rodríguez; Rollé, José Luis Calvo; Juez, Francisco Javier de Cos

2014-10-29

Global solar broadband irradiance on a planar surface is measured at weather stations by pyranometers. In the present research, solar radiation values from nine meteorological stations of the MeteoGalicia real-time observational network, captured and stored every ten minutes, are considered. In this kind of record, missing data and/or wrong values adversely affect any time series study. Consequently, when this occurs, a data imputation process must be performed to replace the missing data with estimated values. This paper evaluates the multivariate imputation of ten-minute-scale data by means of multivariate imputation by chained equations (MICE). This method allows the network itself to impute the missing or wrong data of a solar radiation sensor, using the measurements of either all or only a subset of the remaining sensors. The MICE method gave very good results compared with other methods employed in this field, such as Inverse Distance Weighting (IDW) and Multiple Linear Regression (MLR): the average RMSE of the predictions was 13.37% for MICE, compared with 28.19% for MLR and 31.68% for IDW.
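The chained-equations idea can be illustrated with two sensors: each column with gaps is regressed on the other, in turn, and its missing entries are re-predicted from the current values. This is a deliberately simplified, deterministic sketch, assuming one predictor per sensor and a mean-fill start; proper MICE adds random draws from the predictive distribution and produces several imputed datasets, and the study may use many more sensors as predictors. The readings below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def chained_impute(a, b, iterations=10):
    """Simplified chained-equations imputation for two sensor columns.

    None marks a missing reading. Missing entries start at the column mean,
    then each column is regressed on the other and its gaps are re-predicted.
    """
    cols = [list(a), list(b)]
    missing = [[v is None for v in col] for col in cols]
    for col, miss in zip(cols, missing):
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        for i, is_missing in enumerate(miss):
            if is_missing:
                col[i] = mean
    for _ in range(iterations):
        for target in (0, 1):
            predictor = 1 - target
            rows = [i for i, m in enumerate(missing[target]) if not m]
            icpt, slope = fit_line([cols[predictor][i] for i in rows],
                                   [cols[target][i] for i in rows])
            for i, is_missing in enumerate(missing[target]):
                if is_missing:
                    cols[target][i] = icpt + slope * cols[predictor][i]
    return cols


# Hypothetical ten-minute readings from two neighbouring stations
station_a, station_b = chained_impute([1.0, 2.0, None, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Here station A's gap is predicted from station B through the fitted line A = 0.5·B, giving roughly 3.0.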
Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.

Science.gov (United States)

Rahman, Shah Atiqur; Huang, Yuxiao; Claassen, Jan; Heintzman, Nathaniel; Kleinberg, Samantha

2015-12-01

Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, doing so can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead, the measurement of a variable such as blood glucose may depend on its prior values as well as on those of other variables. These dependencies also exist across time, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time-lagged correlations both within and across variables by combining two imputation methods, one based on an extension to k-NN and one based on the Fourier transform. This enables imputation of missing values even when all data at a time point are missing and when there are different types of missingness both within and across variables. Compared with other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring), the proposed method had the highest imputation accuracy. This held with up to half of the data missing and when consecutive missing values made up a significant fraction of the overall time series length. Copyright © 2015 Elsevier Inc. All rights reserved.
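The "lagged k-NN" half of the idea can be illustrated on a single series: a gap is matched, by its neighbouring values at lags -1 and +1, against fully observed time points, and the values at the closest matches are averaged. This is a simplified sketch, not FLk-NN itself. It assumes isolated gaps, a context of one lag on each side, Euclidean distance and k=2, and it omits the Fourier component and the cross-variable lags entirely.

```python
def lagged_knn_impute(series, k=2):
    """Fill isolated None gaps using lag-1 context matching.

    For a missing x[t] with observed x[t-1] and x[t+1], find the k fully
    observed points s whose (x[s-1], x[s+1]) is closest (Euclidean) to
    (x[t-1], x[t+1]) and average their x[s].
    """
    n = len(series)
    out = list(series)
    candidates = [s for s in range(1, n - 1)
                  if None not in (series[s - 1], series[s], series[s + 1])]
    for t in range(1, n - 1):
        if series[t] is None and series[t - 1] is not None and series[t + 1] is not None:
            before, after = series[t - 1], series[t + 1]
            ranked = sorted(candidates,
                            key=lambda s: (series[s - 1] - before) ** 2
                            + (series[s + 1] - after) ** 2)
            nearest = ranked[:k]
            if nearest:
                out[t] = sum(series[s] for s in nearest) / len(nearest)
    return out


# Hypothetical sawtooth signal with one missing sample at index 10
filled = lagged_knn_impute([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, None, 3])
```

The two earlier points with the same (1, 3) context both have value 2, so the gap is filled with 2.0; runs of consecutive gaps, which the paper handles, are left untouched by this sketch.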
Simple nuclear norm based algorithms for imputing missing data and forecasting in time series

OpenAIRE

Butcher, Holly Louise; Gillard, Jonathan William

2017-01-01

There has been much recent progress on the use of the nuclear norm for the so-called matrix completion problem (the problem of imputing missing values of a matrix). In this paper we investigate the use of the nuclear norm for modelling time series, with particular attention to imputing missing data and forecasting. We introduce a simple alternating projections type algorithm based on the nuclear norm for these tasks, and consider a number of practical examples.
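The alternating-projections pattern for matrix completion can be sketched as follows: repeatedly project onto low-rank matrices, then reset the observed entries to their known values. Note that this sketch swaps in a *hard rank-1* projection (computed by power iteration) instead of the nuclear-norm step the paper uses, so it is a related, simpler variant, not the authors' algorithm. A time series is commonly brought into this setting by arranging it in a Hankel (trajectory) matrix; the small matrix here is hypothetical.

```python
import math


def rank1_approx(m, inner=100):
    """Best rank-1 approximation sigma * u v^T via power iteration."""
    rows, cols = len(m), len(m[0])
    v = [1.0] * cols
    u = [0.0] * rows
    sigma = 0.0
    for _ in range(inner):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]


def complete_rank1(m, outer=500):
    """Alternate between rank-1 projection and re-imposing observed entries."""
    observed = [x for row in m for x in row if x is not None]
    fill = sum(observed) / len(observed)  # start gaps at the observed mean
    x = [[val if val is not None else fill for val in row] for row in m]
    for _ in range(outer):
        low = rank1_approx(x)
        x = [[m[i][j] if m[i][j] is not None else low[i][j]
              for j in range(len(row))] for i, row in enumerate(m)]
    return x


# A rank-1 matrix with its bottom-right entry (truly 9) hidden
completed = complete_rank1([[1, 2, 3], [2, 4, 6], [3, 6, None]])
```

The iteration converges to about 9 for the hidden entry because the observed entries pin down a unique rank-1 matrix; with noisy data, the nuclear-norm formulation trades exact rank for a convex objective.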
Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

Directory of Open Access Journals (Sweden)

Nawar Shara

Kidney and cardiovascular disease are widespread among populations with a high prevalence of diabetes, such as the American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are simply excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods for handling missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing-at-random models and one not-missing-at-random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method performed best for imputing renal function data that were not missing at random. Determining whether or not data are missing at random can help in choosing the imputation method that will provide the most accurate results.
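Three of the simpler techniques compared above can be sketched for one participant's three exam values. This is an illustration under stated assumptions: "adjacent value" is interpreted here as copying the nearest observed exam (the earlier one on ties), which may differ from the study's exact rule, and the renal-function values are hypothetical. Multiple imputation and pattern-mixture models need a fitted model and are not shown.

```python
def listwise_delete(participants):
    """Keep only participants with no missing exam values."""
    return [p for p in participants if None not in p]


def mean_of_serial_measures(values):
    """Replace each gap with the mean of the participant's observed exams."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def adjacent_value(values):
    """Replace each gap with the nearest observed exam (assumed rule;
    the earlier exam wins ties)."""
    out = list(values)
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    if not observed_idx:
        return out
    for i, v in enumerate(values):
        if v is None:
            j = min(observed_idx, key=lambda k: (abs(k - i), k))
            out[i] = values[j]
    return out


# Hypothetical renal-function values at Exams 1, 2 and 3
exams = [80.0, None, 70.0]
serial = mean_of_serial_measures(exams)  # [80.0, 75.0, 70.0]
adjacent = adjacent_value(exams)         # [80.0, 80.0, 70.0]
```

All three are single, deterministic fills, so none of them reflects the uncertainty of the missing value. This is one reason the study also compares multiple imputation and pattern-mixture models, the latter being designed for data that are not missing at random.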
Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

Science.gov (United States)

Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

2005-05-15

Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be undertaken accurately. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least-squares regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested by estimating missing values in three separate non-time-series (ovarian cancer) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which contained 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both types of data (time series and non-time series), at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE
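The evaluation protocol described above, which hides entries at a chosen probability and scores the estimates with NRMS, can be sketched independently of any particular imputer. This is a sketch under an explicit assumption: NRMS is taken here as the root-mean-square error over the masked entries divided by the root mean square of their true values, which is one common normalization; the paper's exact definition may differ. The helper names are hypothetical.

```python
import math
import random


def mask_at_rate(matrix, rate, seed=0):
    """Hide each entry independently with probability `rate`.

    Returns the masked copy (None for hidden entries) and the hidden positions.
    """
    rng = random.Random(seed)
    masked, positions = [], []
    for i, row in enumerate(matrix):
        new_row = []
        for j, value in enumerate(row):
            if rng.random() < rate:
                new_row.append(None)
                positions.append((i, j))
            else:
                new_row.append(value)
        masked.append(new_row)
    return masked, positions


def nrms(truth, estimate, positions):
    """Assumed NRMS: ||estimate - truth|| / ||truth|| over masked entries."""
    err = sum((estimate[i][j] - truth[i][j]) ** 2 for i, j in positions)
    ref = sum(truth[i][j] ** 2 for i, j in positions)
    return math.sqrt(err) / math.sqrt(ref)


# Usage: sweep rate over 0.01 ... 0.2, impute the masked copy with any
# method, then score it with nrms(truth, imputed, positions).
```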
Data Editing and Imputation in Business Surveys Using “R”

Directory of Open Access Journals (Sweden)

Elena Romascanu

2014-06-01

Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison between two statistical software packages, R and SPSS, in order to take full advantage of the existing automated methods for data editing and imputation in business surveys (with a proper design of consistency rules), as a partial alternative to the manual editing of data. Approach – Different methods for editing survey data are compared: in R, the ‘editrules’ and ‘survey’ packages, which contain transformations commonly used in official statistics; visualization of missing-value patterns using the ‘Amelia’ and ‘VIM’ packages; and imputation approaches for longitudinal data using ‘VIMGUI’. The performance of another statistical package, SPSS, is then compared on the same features. Findings – Business statistics data received by the NIS (National Institute of Statistics) are not ready for direct analysis because of in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages make it possible to identify the erroneous fields in edit-violating records and to verify the results after imputing missing values. This gives users a flexible, less time-consuming approach whose automation is easier to perform in R than with SPSS macro syntax, even in situations where macros are very handy.
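Consistency rules ("edits") are the core of the automated editing described above: each rule is a predicate on a record, and a record that fails any rule is flagged for correction or imputation. The Python below is a language-neutral sketch of that idea, not the API of the R ‘editrules’ package. The field names and rules are hypothetical.

```python
# Hypothetical edit rules for a business-survey record; field names are
# illustrative only and are not taken from the paper or from 'editrules'.
RULES = {
    "non_negative_employees": lambda r: r["employees"] >= 0,
    "turnover_balance": lambda r: r["turnover"] == r["costs"] + r["profit"],
    "staff_if_wages": lambda r: r["wages"] == 0 or r["employees"] > 0,
}


def check_edits(record, rules=RULES):
    """Return the names of the rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


clean = {"employees": 3, "turnover": 100, "costs": 60, "profit": 40, "wages": 50}
suspect = dict(clean, employees=0)  # pays wages but reports no staff
violations = check_edits(suspect)   # ['staff_if_wages']
```

Deciding *which* field in a violating record to change, rather than just flagging the record, is the harder error-localization step that dedicated tools automate.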
Effects of forage types on digestibility, methane emissions, and nitrogen utilization efficiency in two genotypes of hill ewes.

Science.gov (United States)

Zhao, Y G; Annett, R; Yan, T

2017-08-01

Thirty-six nonpregnant hill ewes (18 pure Scottish Blackface and 18 Swaledale × Scottish Blackface) aged 18 mo and weighing 48 ± 4.8 kg were allocated to 3 forage treatments balanced for genotype and BW. Each genotype was offered 3 forages (pelleted ryegrass, fresh lowland grass, and fresh hill grass) ad libitum, with 6 ewes in each of the 6 genotype × diet combination treatments. Pelleted ryegrass was sourced from a commercial supplier (Drygrass South Western Ltd, Burrington, UK). Fresh lowland grass was harvested daily in the morning from a third-regrowth perennial ryegrass (Lolium perenne) sward. Fresh hill grass was harvested from a seminatural hill grassland every 2 d and stored in plastic bags at 4 to 5°C until offered. The animals were individually housed in pens and offered the experimental diets for 14 d before being transferred to 6 individual respiration chambers for a further 4 d, during which feed intake, fecal and urine outputs, and CH4 emissions were measured. There was no interaction between genotype and forage type on any variable measured. In a comparison of the effects of the 3 forages, pelleted ryegrass had the greatest (… reduce CH4 emissions per kilogram DMI. These equations add new information for predicting enteric CH4 emissions and N utilization efficiency and can be used to quantify the environmental footprint of hill sheep production systems.
Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

Directory of Open Access Journals (Sweden)

Vinay K Gupta

2016-06-01

Objective: The aim of the study is to assess the trend in mean BMI z-score among private schools’ students from their anthropometric records when the outcome had missing values. Methodology: The anthropometric measurements of students from class 1 to 12 were taken from the records of two private schools in Delhi, India, from 2005 to 2010. These records form unbalanced longitudinal data; that is, not all students had measurements recorded in every year. The trend in mean BMI z-score was estimated with a growth curve model. Beforehand, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis, excluding missing values, was also performed to compare its results with those from the analysis of the multiply imputed data. Results: The mean BMI z-score among school students decreased significantly over time in the imputed data (β= -0.2030, se=0.0889, p=0.0232) after adjusting for age, gender, class and school. The complete case analysis also showed a decrease in mean BMI z-score, though it was not statistically significant (β= -0.2861, se=0.0987, p=0.065). Conclusions: The estimates obtained from the multiple imputation analysis were better than those from the complete case analysis in terms of lower standard errors. We showed that anthropometric measurements from school records can be used to monitor the weight status of children and adolescents, and that multiple imputation using a growth curve model can be useful when analyzing such data.
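BMI z-scores of this kind are usually derived from an age- and sex-specific growth reference via the LMS method, z = ((y/M)^L − 1)/(L·S), or ln(y/M)/S when L = 0. The abstract does not say which reference the study used, so the L, M and S values below are hypothetical placeholders, not values from any published reference table.

```python
import math


def lms_zscore(y, L, M, S):
    """LMS z-score for measurement y given the reference's L, M, S values."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1) / (L * S)


# Hypothetical reference values for one age/sex stratum
z_at_median = lms_zscore(16.0, L=1.0, M=16.0, S=0.1)  # 0.0
z_high = lms_zscore(17.6, L=1.0, M=16.0, S=0.1)       # about 1.0
```

In a workflow like the one described, z-scores would be computed for the observed visits before the growth curve model is used to impute the missing ones.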
Family-based Association Analyses of Imputed Genotypes Reveal Genome-Wide Significant Association of Alzheimer’s disease with OSBPL6, PTPRG and PDCL3

Science.gov (United States)

Herold, Christine; Hooli, Basavaraj V.; Mullin, Kristina; Liu, Tian; Roehr, Johannes T; Mattheisen, Manuel; Parrado, Antonio R.; Bertram, Lars; Lange, Christoph; Tanzi, Rudolph E.

2015-01-01

The genetic basis of Alzheimer's disease (AD) is complex and heterogeneous. Over 200 highly penetrant pathogenic variants in the genes APP, PSEN1 and PSEN2 cause a subset of early-onset familial Alzheimer's disease (EOFAD). On the other hand, susceptibility to late-onset forms of AD (LOAD) is indisputably associated with the ε4 allele of the gene APOE and, more recently, with variants in more than two dozen additional genes identified in large-scale genome-wide association studies (GWAS) and meta-analyses. Taken together, however, although the heritability of AD is estimated to be as high as 80%, a large proportion of the underlying genetic factors remains to be elucidated. In this study we performed a systematic family-based genome-wide association and meta-analysis on close to 15 million imputed variants from three large collections of AD families (~3,500 subjects from 1,070 families). Using a multivariate phenotype combining affection status and onset age, meta-analysis of the association results revealed three single nucleotide polymorphisms (SNPs) that achieved genome-wide significance for association with AD risk: rs7609954 in the gene PTPRG (P-value = 3.98·10−08), rs1347297 in the gene OSBPL6 (P-value = 4.53·10−08), and rs1513625 near PDCL3 (P-value = 4.28·10−08). In addition, rs72953347 in OSBPL6 (P-value = 6.36·10−07) and two SNPs in the gene CDKAL1 showed marginally significant association with LOAD (rs10456232, P-value: 4.76·10−07; rs62400067, P-value: 3.54·10−07). In summary, family-based GWAS meta-analysis of imputed SNPs revealed novel genomic variants in (or near) PTPRG, OSBPL6, and PDCL3 that influence risk for AD with genome-wide significance. PMID:26830138
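The split between "genome-wide significant" and "marginally significant" hits above can be reproduced by checking the reported P-values against the conventional genome-wide threshold of 5 × 10^-8. The abstract itself does not state the threshold, so that value is an assumption here; the P-values are copied from the abstract.

```python
import math

# Conventional genome-wide significance threshold (assumed; not stated in the abstract)
GENOME_WIDE = 5e-8

# P-values as reported in the abstract (gene context in comments)
reported = {
    "rs7609954": 3.98e-8,   # PTPRG
    "rs1347297": 4.53e-8,   # OSBPL6
    "rs1513625": 4.28e-8,   # near PDCL3
    "rs72953347": 6.36e-7,  # OSBPL6
    "rs10456232": 4.76e-7,  # CDKAL1
    "rs62400067": 3.54e-7,  # CDKAL1
}

significant = sorted(snp for snp, p in reported.items() if p < GENOME_WIDE)
for snp, p in sorted(reported.items(), key=lambda kv: kv[1]):
    print(f"{snp}: P = {p:.2e}, -log10(P) = {-math.log10(p):.2f}")
```

All three lead SNPs sit just below the threshold (−log10 P of roughly 7.3 to 7.4), which is why the remaining three, at about 6.2 to 6.5, are described only as marginal.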
BRITS: Bidirectional Recurrent Imputation for Time Series

OpenAIRE

Cao, Wei; Wang, Dong; Li, Jian; Zhou, Hao; Li, Lei; Li, Yitan

2018-01-01

Time series are widely used as signals in many classification/regression tasks, and they commonly contain many missing values. Given multiple correlated time series, how can we fill in the missing values and predict their class labels? Existing imputation methods often impose strong assumptions on the underlying data generating process, such as linear dynamics in the state space. In this paper, we propose BRITS, a novel method based on recurrent neural networks for missing va...
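The "bidirectional" aspect can be illustrated without any neural network: estimate each gap once from the past (last observed value) and once from the future (next observed value), then combine the two. This is only a non-learned stand-in for the intuition. BRITS itself learns the forward and backward estimates with recurrent networks, and the plain averaging used here is an assumption of the sketch.

```python
def bidirectional_fill(series):
    """Fill None gaps by averaging a forward and a backward estimate.

    Forward estimate: last observed value. Backward estimate: next observed
    value. If only one direction is available, that estimate is used.
    """
    n = len(series)
    forward, last = [None] * n, None
    for i, v in enumerate(series):
        if v is not None:
            last = v
        forward[i] = last
    backward, nxt = [None] * n, None
    for i in range(n - 1, -1, -1):
        if series[i] is not None:
            nxt = series[i]
        backward[i] = nxt
    out = []
    for v, f, b in zip(series, forward, backward):
        if v is not None:
            out.append(v)
        elif f is not None and b is not None:
            out.append((f + b) / 2)
        else:
            out.append(f if f is not None else b)
    return out


filled = bidirectional_fill([1.0, None, 3.0, None])  # [1.0, 2.0, 3.0, 3.0]
```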
Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

NARCIS (Netherlands)

Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

2017-01-01

Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
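For a single parameter, Rubin's Rules pool m completed-data analyses as follows: the pooled estimate is the mean of the m estimates, W is the mean within-imputation variance, B is the between-imputation variance of the estimates, and the total variance is T = W + (1 + 1/m)·B. The sketch below covers only this scalar case. Testing a categorical covariate with more than two levels, the subject of the paper, needs multi-parameter pooled tests that are not shown here. The example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations with Rubin's Rules.

    Returns (pooled estimate, within variance W, between variance B,
    total variance T).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    w = statistics.fmean(variances)
    b = statistics.variance(estimates)  # sample variance, divisor m - 1
    t = w + (1 + 1 / m) * b
    return q_bar, w, b, t


# Hypothetical log-odds estimates and squared standard errors from m = 3 imputations
q_bar, w, b, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.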
Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. I: Multivariate Gaussian priors for marker effects and derivation of the joint probability mass function of genotypes.

Science.gov (United States)

Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A

2017-03-21

It is important to consider heterogeneity of marker effects and allelic frequencies in across-population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook the randomness of genotypes. In this study, a family of hierarchical Bayesian models was developed to perform across-population genome-wide prediction, modeling genotypes as random variables and allowing population-specific effects for each marker. The models shared a common structure and differed in the priors used and in the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that made it possible not only to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for prior imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes, was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues, along with possible applications, extensions and refinements, were discussed. Several features of the models developed in this study make them promising for genome-wide prediction; of these, the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies are needed to assess the performance of the models proposed here and to compare them with conventional models used in genome-wide prediction. Copyright © 2017 Elsevier Ltd. All rights reserved.
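The simplest case of a probability mass function of marker genotypes conditional on allele frequencies is shown below. It assumes Hardy-Weinberg proportions at each locus and independence across loci, and it ignores the pedigree information that the paper's derivation uses. Genotypes are coded as counts (0, 1, 2) of the allele with frequency p, and the example frequencies are hypothetical.

```python
def genotype_pmf(p):
    """Hardy-Weinberg genotype probabilities for allele frequency p."""
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def joint_pmf(genotypes, freqs):
    """Joint probability of a multi-locus genotype, assuming the loci are
    independent (linkage equilibrium) and no relatives are conditioned on."""
    prob = 1.0
    for g, p in zip(genotypes, freqs):
        prob *= genotype_pmf(p)[g]
    return prob


single = genotype_pmf(0.5)              # {0: 0.25, 1: 0.5, 2: 0.25}
two_loci = joint_pmf([0, 2], [0.5, 0.5])  # 0.0625
```

Treating an unobserved genotype as a draw from such a distribution, rather than plugging in one imputed value, is what lets individuals with missing genotypes enter the model directly.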
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

KAUST Repository

Kim, Ji-Sung; Gao, Xin; Rzhetsky, Andrey

2018-01-01

are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race
Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

LENUS (Irish Health Repository)

Hardouin, Jean-Benoit

2011-07-14

Background: Nowadays, more and more clinical scales consisting of responses given by patients to a set of items (Patient Reported Outcomes, PRO) are validated with models based on Item Response Theory, and more specifically with a Rasch model. Missing data are frequent in the validation sample. The aim of this paper is to compare sixteen methods for handling missing data (mainly based on simple imputation) in the context of the psychometric validation of PRO by a Rasch model. The main indexes used for validation by a Rasch model are compared. Methods: A simulation study was performed that considered several cases, notably whether or not the missing values were informative, and the rate of missing data. Results: Several imputation methods produce bias in the psychometric indexes (generally, the imputation methods artificially improve the psychometric qualities of the scale). In particular, this is the case with the method based on the Personal Mean Score (PMS), which is the most commonly used imputation method in practice. Conclusions: Several imputation methods should be avoided, in particular PMS imputation. More generally, it is important to use an imputation method that considers both the ability of the patient (measured, for example, by his/her score) and the difficulty of the item (measured, for example, by its rate of favourable responses). Another recommendation is to always consider adding a random process to the imputation method, because such a process reduces the bias. Finally, the analysis performed without imputing the missing data (available case analysis) is an interesting alternative to simple imputation in this context.
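The contrast drawn in the conclusions can be sketched for binary items. PMS imputation fills every gap with the person's mean score. A Rasch-based alternative combines person ability θ and item difficulty δ through P(X = 1) = exp(θ − δ)/(1 + exp(θ − δ)), then draws the response at random, so both recommendations are met. Assumptions: PMS is rounded to the nearer category with ties going up, and θ and δ are supplied rather than estimated (estimating them is omitted). All values are hypothetical.

```python
import math
import random


def pms_impute(responses):
    """Personal Mean Score imputation for binary items (None = missing).

    The person's mean over observed items is rounded to 0/1, ties up (assumed).
    """
    observed = [x for x in responses if x is not None]
    mean = sum(observed) / len(observed)
    score = 1 if mean >= 0.5 else 0
    return [score if x is None else x for x in responses]


def rasch_impute(responses, theta, deltas, rng):
    """Stochastic imputation from Rasch probabilities.

    theta: person ability; deltas: item difficulties (both assumed known).
    """
    out = []
    for x, delta in zip(responses, deltas):
        if x is None:
            p = 1 / (1 + math.exp(-(theta - delta)))
            out.append(1 if rng.random() < p else 0)
        else:
            out.append(x)
    return out


pms = pms_impute([1, 1, 0, None])  # [1, 1, 0, 1]
stochastic = rasch_impute([1, None, 0, None], theta=0.2,
                          deltas=[-1.0, 0.5, 1.5, 0.0], rng=random.Random(1))
```

PMS ignores the item: a hard, rarely endorsed item receives the same fill as an easy one, which is one route to the artificially improved indexes reported above.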

The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods.

Directory of Open Access Journals (Sweden)

Rui Martiniano

2017-07-01

We analyse new genomic data (0.05-2.95x coverage) from 14 ancient individuals from Portugal, distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC), and impute genome-wide diploid genotypes in these individuals together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third-millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.
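The D-statistic mentioned above compares two discordant site patterns for populations (P1, P2, P3, Outgroup): ABBA sites (P2 shares the derived allele with P3) and BABA sites (P1 shares it with P3). The value is D = (n_ABBA − n_BABA)/(n_ABBA + n_BABA). The sketch uses the simple counting form on pre-classified sites. Real analyses typically work from allele frequencies or sampled alleles and use a block jackknife for standard errors, which are omitted here. The site patterns are hypothetical.

```python
def d_statistic(patterns):
    """Counting form of the ABBA-BABA D-statistic.

    patterns: strings over (P1, P2, P3, Outgroup), A = ancestral, B = derived.
    Positive D suggests excess allele sharing between P2 and P3.
    """
    abba = sum(1 for p in patterns if p == "ABBA")
    baba = sum(1 for p in patterns if p == "BABA")
    return (abba - baba) / (abba + baba)


d = d_statistic(["ABBA", "ABBA", "BABA", "BBAA"])  # (2 - 1) / 3
```

Sites with other patterns, such as BBAA, are uninformative for D and are simply ignored.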
Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

Science.gov (United States)

Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

2016-04-01

Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation and have compared them on real or simulated data sets and recommended a list of missing value imputation methods for proteomics application. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: For instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.
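Two simple imputers illustrate why the missingness mechanism matters. If values are missing because they fall below the detection limit (left-censored, not missing at random), a low fill such as the smallest detected intensity is more plausible. If they are missing at random, a row mean is the more natural baseline. These are generic stand-ins chosen for illustration, not the specific methods benchmarked in the paper. The log-intensity matrix is hypothetical.

```python
def impute_min_detected(matrix):
    """Left-censored-style fill: replace gaps with the smallest detected value."""
    observed = [v for row in matrix for v in row if v is not None]
    floor = min(observed)
    return [[floor if v is None else v for v in row] for row in matrix]


def impute_row_mean(matrix):
    """Missing-at-random-style fill: replace gaps with the row (peptide) mean."""
    out = []
    for row in matrix:
        obs = [v for v in row if v is not None]
        mean = sum(obs) / len(obs) if obs else None
        out.append([mean if v is None else v for v in row])
    return out


# Hypothetical log2 intensities: rows = peptides, columns = samples
data = [[20.0, None, 22.0], [15.0, 16.0, None]]
low_fill = impute_min_detected(data)  # gaps -> 15.0
mean_fill = impute_row_mean(data)     # gaps -> 21.0 and 15.5
```

The same gap receives 15.0 under one assumption and 21.0 under the other, which is the kind of divergence that makes a "blindly applied" method risky.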
DTW-APPROACH FOR UNCORRELATED MULTIVARIATE TIME SERIES IMPUTATION

OpenAIRE

Phan, Thi-Thu-Hong; Poisson Caillault, Emilie; Bigand, André; Lefebvre, Alain

2017-01-01

Missing data are inevitable in almost all domains of applied science. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low/un-correlated multivariate time series under an assumption of...
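Dynamic Time Warping (DTW) measures the similarity of two sequences while allowing local stretching in time. It is computed by the dynamic-programming recurrence cost[i][j] = |a_i − b_j| + min(cost[i−1][j], cost[i][j−1], cost[i−1][j−1]). One way such a distance can support imputation is to find the past window whose shape best matches the values just before a gap, and then borrow what followed that window. The search helper below is a hypothetical illustration of that idea, not the paper's procedure.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def best_matching_window(history, query):
    """Hypothetical helper: start index and DTW distance of the window of
    len(query) in history that is most similar to query (first wins ties)."""
    w = len(query)
    best_start, best_d = None, float("inf")
    for s in range(len(history) - w + 1):
        d = dtw_distance(history[s:s + w], query)
        if d < best_d:
            best_start, best_d = s, d
    return best_start, best_d
```

Because DTW compares shapes rather than requiring cross-series correlation, this kind of matching can work even when the other series carry little information.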
Using mi impute chained to fit ANCOVA models in randomized trials with censored dependent and independent variables

DEFF Research Database (Denmark)

Andersen, Andreas; Rieckmann, Andreas

2016-01-01

In this article, we illustrate how to use mi impute chained with intreg to fit an analysis of covariance (ANCOVA) model of censored and nondetectable immunological concentrations measured in a randomized pretest–posttest design.
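Interval regression models each outcome as a pair of bounds: an exact value has equal bounds, a nondetect below the limit of detection (LOD) is known only to lie below the LOD, and a value above an upper quantification limit is known only to lie above it. Stata's intreg encodes an unbounded side as a missing bound. The Python sketch below mirrors that convention with None, as data preparation only. The "<LOD"/">ULOQ" markers and the limits are hypothetical.

```python
def to_interval_bounds(values, lod, uloq=None):
    """Convert measurements to (lower, upper) bounds for interval regression.

    values: numbers, or the hypothetical markers "<LOD" (nondetect) and
    ">ULOQ" (above the upper limit). None marks an unbounded side, as a
    missing bound does in Stata's intreg.
    """
    bounds = []
    for v in values:
        if v == "<LOD":
            bounds.append((None, lod))   # left-censored
        elif v == ">ULOQ":
            bounds.append((uloq, None))  # right-censored
        else:
            bounds.append((v, v))        # exactly observed
    return bounds


pairs = to_interval_bounds([1.5, "<LOD", ">ULOQ"], lod=0.5, uloq=100.0)
```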
Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes

Directory of Open Access Journals (Sweden)

Puett Robin C

2009-10-01

Background: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can focus on individuals or on higher group levels (e.g., spatial distribution). Methods: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). In the true distribution of cases across census tracts, 58.2% of tracts had no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by the random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions were found for the fixed allocation methods (p-value …). Conclusion: Fixed imputation methods seemed to yield the greatest accuracy at the individual level, suggesting their use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies
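The difference between fixed and random allocation can be sketched for one ZIP code split across census tracts with population weights. Fixed allocation always sends a case to the most heavily weighted tract, which explains both its individual-level accuracy and the artificial clusters noted in the conclusion. Random allocation samples a tract in proportion to the weights, spreading cases the way the population is spread. The tract names and weights are hypothetical.

```python
import random


def fixed_allocation(tract_weights):
    """Deterministic: assign the case to the tract with the largest weight."""
    return max(tract_weights, key=tract_weights.get)


def random_allocation(tract_weights, rng):
    """Stochastic: sample a tract with probability proportional to its weight."""
    tracts = list(tract_weights)
    return rng.choices(tracts, weights=[tract_weights[t] for t in tracts], k=1)[0]


# Hypothetical under-20 population of the tracts overlapping one ZIP code
zip_tracts = {"tract_A": 100, "tract_B": 300}
rng = random.Random(42)
fixed = fixed_allocation(zip_tracts)                              # always "tract_B"
sampled = [random_allocation(zip_tracts, rng) for _ in range(1000)]  # ~75% tract_B
```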
Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals.

Science.gov (United States)

Stram, Daniel O; Leigh Pearce, Celeste; Bretsky, Phillip; Freedman, Matthew; Hirschhorn, Joel N; Altshuler, David; Kolonel, Laurence N; Henderson, Brian E; Thomas, Duncan C

2003-01-01

The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risks for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, which may be used for more complex tasks, such as providing confidence intervals, and tests of equivalence of haplotype-specific risks in two or more separate populations. To do so, we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach, to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection, and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study) we compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both
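The E-M algorithm that the paper extends starts from the textbook problem of estimating haplotype frequencies from unphased genotypes. The sketch below covers two biallelic SNPs only, without the odds-ratio or case-control selection extensions described above. Phase is ambiguous only for double heterozygotes: the E-step splits them between the two possible haplotype pairs in proportion to the current frequency products, and the M-step re-counts haplotypes. The genotype data are hypothetical.

```python
# Haplotypes over two SNPs: (allele at SNP1, allele at SNP2), alleles 0/1
HAPS = [(a, b) for a in (0, 1) for b in (0, 1)]


def compatible_pairs(genotype):
    """Ordered haplotype pairs consistent with genotype (g1, g2), where each
    g counts copies of allele 1 (0, 1 or 2)."""
    g1, g2 = genotype
    return [(h1, h2) for h1 in HAPS for h2 in HAPS
            if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]


def em_haplotype_freqs(genotypes, iters=200):
    """Textbook EM for haplotype frequencies under Hardy-Weinberg equilibrium."""
    freqs = {h: 0.25 for h in HAPS}
    n = len(genotypes)
    for _ in range(iters):
        counts = {h: 0.0 for h in HAPS}
        for g in genotypes:
            pairs = compatible_pairs(g)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):  # E-step: expected counts
                counts[h1] += w / total
                counts[h2] += w / total
        freqs = {h: counts[h] / (2 * n) for h in HAPS}  # M-step
    return freqs


# Hypothetical sample: homozygotes reveal haplotypes 00 and 11 only,
# so EM resolves the double heterozygotes as 00/11.
freqs = em_haplotype_freqs([(0, 0), (0, 0), (2, 2), (2, 2), (1, 1), (1, 1)])
```

Here the estimates converge to 0.5 for haplotypes 00 and 11 and to 0 for 01 and 10. The posterior phase weights from the E-step are also what single-imputation approaches condition on when they assign each person their most likely haplotype pair.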
Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

Science.gov (United States)

Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

2018-04-09

The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies use complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The present study aims to evaluate the impact of using multiple imputation in comparison with complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. This is a retrospective review of prospectively collected data. Patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients had missing preoperative albumin and 9.9% had missing preoperative hematocrit. When using complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common body mass index, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the
Nearest neighbor imputation using spatial-temporal correlations in wireless sensor networks.

Science.gov (United States)

Li, YuanYuan; Parker, Lynne E

2014-01-01

Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. There are many reasons for this phenomenon, such as unstable wireless communications, synchronization issues, and unreliable sensors. Unfortunately, missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes retransmit missing data across the network. Data re-transmission may also cause time delays when detecting abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms to classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To improve the search time, we utilize a k-d tree data structure, which is a non-parametric, data-driven binary search tree. Instead of using traditional mean and variance of each dimension for k-d tree construction, and Euclidean distance for k-d tree search, we use weighted variances and weighted Euclidean distances based on measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application. Our experimental
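The core of the approach, a nearest-neighbour lookup under a weighted Euclidean distance, can be sketched with a brute-force search in place of the k-d tree. The k-d tree only speeds up the search and does not change which neighbour is found. The abstract does not give the weighting formula, so the rule used here (weight = 1 − missing fraction of each dimension) is a hypothetical choice. The readings are also hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over dimensions observed in both vectors."""
    total = 0.0
    for x, y, w in zip(a, b, weights):
        if x is not None and y is not None:
            total += w * (x - y) ** 2
    return math.sqrt(total)


def nn_impute(query, dim, reference, weights):
    """Fill query[dim] from the nearest reference vector that observes dim
    (brute-force search instead of a k-d tree)."""
    candidates = [r for r in reference if r[dim] is not None]
    best = min(candidates, key=lambda r: weighted_distance(query, r, weights))
    return best[dim]


# Hypothetical per-dimension missing fractions -> weights (assumed rule)
missing_fraction = [0.1, 0.4, 0.0]
weights = [1.0 - f for f in missing_fraction]

# Each vector: readings of three sensor nodes at one time step
history = [[20.5, 30.0, 21.0], [10.0, 5.0, 11.0]]
estimate = nn_impute([20.0, None, 21.0], 1, history, weights)  # 30.0
```

Down-weighting dimensions that are often missing keeps unreliable nodes from dominating the choice of neighbour.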
FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

Science.gov (United States)

Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

2017-08-01

The application of data mining and machine learning to uncover hidden knowledge in clinical research is becoming increasingly influential in medicine. Heart disease is a major cause of death worldwide, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are often fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can degrade the quality of classification results. Nevertheless, the remaining complete features can still carry information about the incomplete ones. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed for the preprocessing stage to fill in the missing values. The completed dataset is then used to train a Decision Tree classifier. The experiment uses a Heart Disease dataset, and performance is analysed using accuracy, precision, and ROC values. Results show that the performance of the Decision Tree improves after FCMPSO is applied for imputation.
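The fuzzy C-means half of this approach can be sketched as follows. The PSO step, which the authors use to tune the imputation, is omitted here. The sketch clusters the complete rows with standard fuzzy C-means, computes an incomplete row's memberships from its observed features only, and fills each missing feature with the membership-weighted average of the cluster centroids. The farthest-first initialisation and all function names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def memberships(x: Sequence[Optional[float]], centroids: Sequence[Sequence[float]],
                dims: Sequence[int], m: float = 2.0) -> List[float]:
    """Fuzzy membership of x in each cluster, using only the dimensions in dims."""
    dist = [math.sqrt(sum((x[j] - v[j]) ** 2 for j in dims)) for v in centroids]
    zeros = [k for k, d in enumerate(dist) if d == 0.0]
    if zeros:  # x coincides with one or more centroids
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(dist))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dl) ** p for dl in dist) for dk in dist]


def fcm(data: Sequence[Sequence[float]], c: int, m: float = 2.0, iters: int = 100) -> List[List[float]]:
    """Standard fuzzy C-means on complete rows, with farthest-first initialisation."""
    dims = range(len(data[0]))
    centroids = [list(data[0])]
    while len(centroids) < c:
        far = max(data, key=lambda x: min(sum((x[j] - v[j]) ** 2 for j in dims) for v in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        u = [memberships(x, centroids, dims, m) for x in data]
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(data))]
            total = sum(w)
            centroids[k] = [sum(w[i] * data[i][j] for i in range(len(data))) / total for j in dims]
    return centroids


def fcm_impute(row: Sequence[Optional[float]], centroids: Sequence[Sequence[float]],
               m: float = 2.0) -> List[float]:
    """Replace each None with the membership-weighted average of the centroids."""
    observed = [j for j, v in enumerate(row) if v is not None]
    u = memberships(row, centroids, observed, m)
    return [v if v is not None else sum(uk * cen[j] for uk, cen in zip(u, centroids))
            for j, v in enumerate(row)]
```

A PSO stage would search over quantities such as the number of clusters or the fuzzifier `m` to minimise imputation error on held-out values. That search is not shown.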
Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

Science.gov (United States)

Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

2016-02-01

Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P < 5 × 10⁻⁸), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P = 7.68 × 10⁻¹²), oestradiol (P = 1.63 × 10⁻⁸) and FAI (P = 1.50 × 10⁻⁸). A genetic variant near the FSHB gene was identified which influenced both FSH (P = 1.74 × 10⁻⁸) and LH (P = 3.94 × 10⁻⁹) levels. A separate locus on chromosome 7 was associated with both DHEAS (P = 1.82 × 10⁻¹⁴) and progesterone (P = 6.09 × 10⁻¹⁴). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.
Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

Directory of Open Access Journals (Sweden)

Jörgen Wallerman

2013-04-01

Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error [RMSE] 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%), with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

Directory of Open Access Journals (Sweden)

Xiao-Lin Wu

Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of obligatory SNPs on each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus
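The abstract does not spell out how LASE is computed, so the following is only a rough illustration of the "informativeness" ingredient. It assumes per-locus Shannon entropy is taken over genotype frequencies under Hardy-Weinberg proportions and averaged across loci. The per-window picker is a hypothetical, much cruder stand-in for "evenly spaced and highly informative" selection and is not the MOLO algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def genotype_entropy(maf: float) -> float:
    """Shannon entropy (bits) of a biallelic SNP's genotype distribution,
    assuming Hardy-Weinberg proportions p^2, 2pq, q^2 with q = maf."""
    q = maf
    p = 1.0 - q
    freqs = (p * p, 2 * p * q, q * q)
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def locus_averaged_entropy(mafs: Sequence[float]) -> float:
    """Average per-locus entropy over the selected SNPs (assumed LASE-like score)."""
    return sum(genotype_entropy(q) for q in mafs) / len(mafs)


def pick_per_window(positions: Sequence[int], mafs: Sequence[float], window: int) -> List[int]:
    """Hypothetical helper: keep the highest-entropy SNP in each fixed-width window."""
    best: Dict[int, Tuple[int, float]] = {}
    for pos, q in zip(positions, mafs):
        w = pos // window
        h = genotype_entropy(q)
        if w not in best or h > best[w][1]:
            best[w] = (pos, h)
    return [best[w][0] for w in sorted(best)]
```

A SNP with minor allele frequency 0.5 reaches the maximum of 1.5 bits, and a monomorphic SNP contributes nothing. This is why entropy-based objectives favour common variants.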
Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

Directory of Open Access Journals (Sweden)

Hardt Jochen

2012-12-01

Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r = .10) vs. moderate (r = .50) correlations with X's and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

Directory of Open Access Journals (Sweden)

White Frank F

2011-07-01

Abstract Background Eight diverse sorghum (Sorghum bicolor (L.) Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effect of library type and genomic diversity on SNP discovery and imputation are evaluated. Results Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. Conclusions A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
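Local haplotype comparison can be sketched in a few lines. The abstract notes that a genomically similar accession matters more than panel size. The sketch below therefore fills each missing call by copying the call from the accession that agrees best with the target over the flanking SNPs. It assumes inbred (homozygous) accessions with one allele per call and a fixed, hypothetical window size. Ties are resolved by panel order. The function name is illustrative, not the authors' method.

```python
from typing import List, Optional, Sequence

Call = Optional[str]  # e.g. "A" or "G"; None marks a missing genotype


def impute_local(panel: Sequence[Sequence[Call]], half_window: int = 5) -> List[List[Call]]:
    """Fill missing calls from the accession most similar in the flanking window."""
    n_snps = len(panel[0])
    out = [list(acc) for acc in panel]
    for a, acc in enumerate(panel):
        for s in range(n_snps):
            if acc[s] is not None:
                continue
            lo, hi = max(0, s - half_window), min(n_snps, s + half_window + 1)
            best_score, best_call = -1.0, None
            for b, donor in enumerate(panel):
                if b == a or donor[s] is None:
                    continue
                shared = [(x, y) for x, y in zip(acc[lo:hi], donor[lo:hi])
                          if x is not None and y is not None]
                if not shared:
                    continue
                score = sum(x == y for x, y in shared) / len(shared)  # local identity
                if score > best_score:
                    best_score, best_call = score, donor[s]
            out[a][s] = best_call  # stays None if no donor overlaps the window
    return out
```

Because the score is a proportion, a donor that shares only one flanking call can outrank a donor that matches 9 of 10. A practical version would also require a minimum number of shared sites.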
Improved Correction of Misclassification Bias With Bootstrap Imputation.

Science.gov (United States)

van Walraven, Carl

2018-07-01

Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias, requires only code sensitivity and specificity, but may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates was determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can substantially decrease misclassification bias when accurate disease probability estimates are used.
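The contrast between the two corrections can be sketched for a single prevalence estimate. For the QBA side, the sketch uses the standard simple correction from sensitivity and specificity (Rogan-Gladen form). The paper's exact QBA procedure may differ. That correction can fall outside [0, 1], which is one way "invalid results" arise. The BI side resamples patients with replacement and draws each sampled patient's disease status from an estimated probability. In the paper these probabilities come from models of increasing complexity, whereas here they are simply given as input. Function names are illustrative.

```python
import random
from typing import List, Optional, Sequence


def qba_prevalence(observed: float, sensitivity: float, specificity: float) -> Optional[float]:
    """Simple bias-corrected prevalence; returns None when the result is invalid."""
    corrected = (observed + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return corrected if 0.0 <= corrected <= 1.0 else None


def bootstrap_imputation(probs: Sequence[float], n_boot: int = 200, seed: int = 1) -> List[float]:
    """Prevalence in each bootstrap replicate after imputing disease status.

    probs: estimated probability of disease for each patient (assumed given).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(probs) for _ in probs]      # resample patients
        status = [rng.random() < p for p in sample]      # impute status by a Bernoulli draw
        estimates.append(sum(status) / len(status))
    return estimates
```

With the reported sensitivity (71.3%) and specificity (96.2%), a coded prevalence of 8.795% corrects back to 7.4%. A coded prevalence below 1 - specificity (3.8%) gives a negative, invalid estimate.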
Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases

DEFF Research Database (Denmark)

Gusev, Alexander; Lee, S Hong; Trynka, Gosia

2014-01-01

Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common...... diseases to partition the heritability explained by genotyped SNPs (h_g^2) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach...... partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h_g^2 from imputed SNPs (5.1× enrichment...
Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

Science.gov (United States)

Chaurasia, Ashok; Harel, Ofer

2015-02-10

Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on a scalar quantity, the coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
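The scalar route rests on the familiar complete-data identity that expresses a partial F statistic through two R² values. The sketch shows only that identity, which could be evaluated on each imputed dataset. The rule the authors use to pool R² across imputations is not reproduced here, and the function name is illustrative.

```python
def partial_f(r2_full: float, r2_reduced: float, n: int, p_full: int, q: int) -> float:
    """Partial F statistic from two coefficients of determination.

    r2_full:    R^2 of the model with all p_full predictors (plus intercept)
    r2_reduced: R^2 after dropping the q predictors being tested
    For complete data, under H0 the statistic follows F(q, n - p_full - 1).
    """
    df_resid = n - p_full - 1
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)
```

For instance, R² rising from 0.40 to 0.50 when one predictor is added to a three-predictor model with n = 104 gives F = 0.1 / (0.5 / 100) = 20 on (1, 100) degrees of freedom.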
Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

Science.gov (United States)

Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

2011-03-01

Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)⁻¹X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)⁻¹X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β, Bias(β̂) = E[β̂] − β, Bias(θ̂) = E[θ̂] − Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial.
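The two linear imputation rules named here are simple to state in code. The sketch assumes monotone dropout, meaning a subject's series ends in missing values, and an observed baseline. It shows only the mechanics of filling. It does not address the bias these rules induce under MNAR, which is the subject of the paper. Function names are illustrative.

```python
from typing import List, Optional, Sequence


def locf(visits: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last Observation Carried Forward: a missing visit takes the latest observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in visits:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bocf(visits: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Baseline Observation Carried Forward: a missing visit takes the baseline value."""
    baseline = visits[0]
    return [baseline if v is None else v for v in visits]
```

For a pain score series `[5.0, 4.0, None, None]`, LOCF yields `[5.0, 4.0, 4.0, 4.0]` and BOCF yields `[5.0, 4.0, 5.0, 5.0]`. Both imputations are linear functions of the subject's earlier observations.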
Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

Directory of Open Access Journals (Sweden)

Peter K Joshi

The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.
fcGENE: a versatile tool for processing and transforming SNP datasets.

Directory of Open Access Journals (Sweden)

Nab Raj Roshyara

Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods such as pre-processing, analysis of population structure, association analysis and genotype imputation. Software used for these purposes often relies on specific and incompatible input and output data formats, so extensive data management, including multiple format conversions, is necessary during analyses. In order to support fast and efficient management and bio-statistical quality control of high-dimensional SNP data, we developed the publicly available software fcGENE in the object-oriented programming language C++. This software simplifies and automates the use of different existing analysis packages, especially during the workflow of genotype imputations and corresponding analyses. fcGENE transforms SNP data and imputation results into different formats required for a large variety of analysis packages such as PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT, GenABEL and tools used for genotype imputation such as MaCH, IMPUTE, BEAGLE and others. Data management tasks like merging, splitting, and extracting SNP and pedigree information can be performed. fcGENE also supports a number of bio-statistical quality control processes and quality-based filtering processes at the SNP- and sample-wise level. The tool also generates templates of commands required to run specific software packages, especially those required for genotype imputation. We demonstrate the functionality of fcGENE by example workflows of SNP data analyses and provide a comprehensive manual of commands, options and applications. We have developed a user-friendly open-source software fcGENE, which comprehensively supports SNP data management, quality control and analysis workflows. Download statistics and corresponding feedback indicate that the software is widely recognised and extensively applied by the scientific community.

Missing data in clinical trials: control-based mean imputation and sensitivity analysis.

Science.gov (United States)

Mehrotra, Devan V; Liu, Fang; Permutt, Thomas

2017-09-01

In some randomized (drug versus placebo) clinical trials, the estimand of interest is the between-treatment difference in population means of a clinical endpoint that is free from the confounding effects of "rescue" medication (e.g., HbA1c change from baseline at 24 weeks that would be observed without rescue medication regardless of whether or when the assigned treatment was discontinued). In such settings, a missing data problem arises if some patients prematurely discontinue from the trial or initiate rescue medication while in the trial, the latter necessitating the discarding of post-rescue data. We caution that the commonly used mixed-effects model repeated measures analysis with the embedded missing at random assumption can deliver an exaggerated estimate of the aforementioned estimand of interest. This happens, in part, due to implicit imputation of an overly optimistic mean for "dropouts" (i.e., patients with missing endpoint data of interest) in the drug arm. We propose an alternative approach in which the missing mean for the drug arm dropouts is explicitly replaced with either the estimated mean of the entire endpoint distribution under placebo (primary analysis) or a sequence of increasingly more conservative means within a tipping point framework (sensitivity analysis); patient-level imputation is not required. A supplemental "dropout = failure" analysis is considered in which a common poor outcome is imputed for all dropouts followed by a between-treatment comparison using quantile regression. All analyses address the same estimand and can adjust for baseline covariates. Three examples and simulation results are used to support our recommendations. Copyright © 2017 John Wiley & Sons, Ltd.
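The arithmetic of control-based mean imputation fits in a few lines. The drug-arm mean is a mixture of the completers' mean and the mean assigned to dropouts. The primary analysis assigns the placebo mean, and the tipping-point analysis adds increasingly conservative shifts. The sketch is simplified. It locates the shift at which the point estimate of the difference reaches zero, whereas an actual tipping-point analysis would track where significance is lost, using standard errors that are not modelled here. It also ignores covariate adjustment. Function names are illustrative.

```python
def drug_arm_mean(completer_mean: float, dropout_frac: float, dropout_mean: float) -> float:
    """Mean of the drug arm when dropouts are assigned dropout_mean."""
    return (1.0 - dropout_frac) * completer_mean + dropout_frac * dropout_mean


def control_based_difference(completer_mean: float, dropout_frac: float,
                             placebo_mean: float, shift: float = 0.0) -> float:
    """Drug-minus-placebo difference with drug-arm dropouts set to placebo_mean + shift.

    shift = 0 is the primary (control-based) analysis; a positive shift is more
    conservative when higher values are worse (e.g. HbA1c change from baseline).
    """
    return drug_arm_mean(completer_mean, dropout_frac, placebo_mean + shift) - placebo_mean


def tipping_shift(completer_mean: float, dropout_frac: float, placebo_mean: float) -> float:
    """Shift at which the estimated difference reaches zero (solves the linear equation)."""
    return (1.0 - dropout_frac) * (placebo_mean - completer_mean) / dropout_frac
```

With hypothetical values of a completers' mean change of -1.0, 20% dropout and a placebo mean of -0.3, the primary estimate is -0.56. The benefit disappears only once dropouts are assigned a mean 2.8 units worse than placebo.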
Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

Science.gov (United States)

Golino, Hudson F.; Gomes, Cristiano M. A.

2016-01-01

This paper presents random forest, a non-parametric technique from the machine learning field, as an imputation method. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
Age at menopause: imputing age at menopause for women with a hysterectomy with application to risk of postmenopausal breast cancer

Science.gov (United States)

Rosner, Bernard; Colditz, Graham A.

2011-01-01

Purpose Age at menopause, a major marker in the reproductive life, may bias results for evaluation of breast cancer risk after menopause. Methods We followed 38,948 women who were premenopausal in 1980 and identified 2,586 who reported hysterectomy without bilateral oophorectomy, and 31,626 who reported natural menopause during 22 years of follow-up. We evaluated risk factors for natural menopause, imputed age at natural menopause for women reporting hysterectomy without bilateral oophorectomy and estimated the hazard of reaching natural menopause in the next 2 years. We applied this imputed age at menopause both to increase sample size and to evaluate the relation between postmenopausal exposures and risk of breast cancer. Results Age, cigarette smoking, age at menarche, pregnancy history, body mass index, history of benign breast disease, and history of breast cancer were each significantly related to age at natural menopause; duration of oral contraceptive use and family history of breast cancer were not. The imputation increased sample size substantially and, although some risk factors after menopause were weaker in the expanded model (height and alcohol use), the estimate for use of hormone therapy is less biased. Conclusions Imputing age at menopause increases sample size, broadens generalizability making it applicable to women with hysterectomy, and reduces bias. PMID:21441037
Association of HLA Genotype and Fulminant Type 1 Diabetes in Koreans

Directory of Open Access Journals (Sweden)

Soo Heon Kwak

2015-12-01

Fulminant type 1 diabetes (T1DM) is a distinct subtype of T1DM that is characterized by rapid onset hyperglycemia, ketoacidosis, absolute insulin deficiency, and near normal levels of glycated hemoglobin at initial presentation. Although it has been reported that class II human leukocyte antigen (HLA) genotype is associated with fulminant T1DM, the genetic predisposition is not fully understood. In this study we investigated the HLA genotype and haplotype in 11 Korean cases of fulminant T1DM using imputation of whole exome sequencing data and compared their frequencies with 413 participants of the Korean Reference Panel. The HLA-DRB1*04:05–HLA-DQB1*04:01 haplotype was significantly associated with increased risk of fulminant T1DM in Fisher's exact test (odds ratio [OR], 4.11; 95% confidence interval [CI], 1.56 to 10.86; p = 0.009). A histidine residue at HLA-DRβ1 position 13 was marginally associated with increased risk of fulminant T1DM (OR, 2.45; 95% CI, 1.01 to 5.94; p = 0.054). Although we had limited statistical power, we provide evidence that HLA haplotype and amino acid change can be a genetic risk factor for fulminant T1DM in Koreans. Further large-scale research is required to confirm these findings.
Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic.

Science.gov (United States)

Hopke, P K; Liu, C; Rubin, D B

2001-03-01

Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.
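Once several completed datasets exist, the multiple-imputation approach combines the analyses with Rubin's rules. The sketch shows only this pooling step for a scalar quantity. The three imputation models for the Arctic data, which also handle below-detection-limit values, are not reproduced. The function name is illustrative.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Combine m completed-data analyses (m >= 2) with Rubin's rules.

    Returns the pooled estimate, its total variance and the classic degrees of freedom.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    if b == 0:
        return q_bar, t, math.inf
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For three imputations with estimates 1, 2 and 3, each with variance 0.5, the pooled estimate is 2 with total variance 0.5 + (4/3)·1 ≈ 1.83. The extra between-imputation term is what keeps inferences honest about the missing values.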
Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

Energy Technology Data Exchange (ETDEWEB)

Riggi, S., E-mail: sriggi@oact.inaf.it [INAF - Osservatorio Astrofisico di Catania (Italy); Riggi, D. [Keras Strategy - Milano (Italy); Riggi, F. [Dipartimento di Fisica e Astronomia - Università di Catania (Italy); INFN, Sezione di Catania (Italy)

2015-04-21

Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixture models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layer silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.
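Imputation under a normal mixture can be illustrated in two dimensions, for example the energy losses in two layers with one layer's value missing. The imputed value is the conditional mean. It weights each component's regression of the missing coordinate on the observed one by that component's posterior probability given the observed value. The sketch assumes the mixture parameters are already fitted (e.g. by EM) and covers only the Gaussian case, not the skew-normal extension. Names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Component:
    weight: float                                          # mixing proportion
    mean: Tuple[float, float]                              # (mu_1, mu_2)
    cov: Tuple[Tuple[float, float], Tuple[float, float]]   # ((s11, s12), (s12, s22))


def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def impute_missing_layer(x1: float, components: Sequence[Component]) -> float:
    """Conditional mean E[x2 | x1] under a bivariate normal mixture."""
    # posterior weight of each component given the observed coordinate
    resp = [c.weight * normal_pdf(x1, c.mean[0], c.cov[0][0]) for c in components]
    total = sum(resp)
    # within-component linear regression of x2 on x1
    cond = [c.mean[1] + c.cov[0][1] / c.cov[0][0] * (x1 - c.mean[0]) for c in components]
    return sum(r * m for r, m in zip(resp, cond)) / total
```

Unlike plain mean imputation, this estimate changes with the observed value. When the components correspond to different particle species, it borrows from the species the observed energy loss points to.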
Which DTW Method Applied to Marine Univariate Time Series Imputation

OpenAIRE

Phan , Thi-Thu-Hong; Caillault , Émilie; Lefebvre , Alain; Bigand , André

2017-01-01

Missing data are ubiquitous in all domains of applied science. Processing datasets containing missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Therefore, the aim of this paper is to build a framework for filling missing values in univariate time series and to compare different similarity metrics used for the imputation task. This allows us to suggest the most suitable methods for the imp...
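One way to use dynamic time warping (DTW) as the similarity metric for filling a gap works as follows. Take the values just before the gap as a query, find the earlier window with the smallest DTW distance to it, and copy the values that followed that window into the gap. This is a simplified, hypothetical scheme to show where the metric enters. It is not necessarily the procedure compared in the paper, and it assumes a single gap with fully observed history before it.

```python
import math
from typing import List, Optional, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with absolute-difference cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_fill(series: Sequence[Optional[float]], gap_start: int, gap_len: int,
             query_len: int) -> List[Optional[float]]:
    """Fill series[gap_start:gap_start + gap_len] from the best DTW match before the gap."""
    query = series[gap_start - query_len:gap_start]
    best_dist, best_follow = math.inf, None
    # candidate windows whose following values lie entirely before the gap
    for s in range(0, gap_start - query_len - gap_len + 1):
        window = series[s:s + query_len]
        follow = series[s + query_len:s + query_len + gap_len]
        d = dtw(query, window)
        if d < best_dist:
            best_dist, best_follow = d, follow
    if best_follow is None:
        raise ValueError("not enough history before the gap")
    filled = list(series)
    filled[gap_start:gap_start + gap_len] = best_follow
    return filled
```

Swapping `dtw` for another distance (Euclidean, or a DTW variant) leaves the rest of the scheme unchanged. That makes the metric the natural thing to compare.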
Towards a more efficient representation of imputation operators in TPOT

OpenAIRE

Garciarena, Unai; Mendiburu, Alexander; Santana, Roberto

2018-01-01

Automated Machine Learning encompasses a set of meta-algorithms intended to design and apply machine learning techniques (e.g., model selection, hyperparameter tuning, model assessment, etc.). TPOT, a tool for optimizing machine learning pipelines based on genetic programming (GP), is a novel example of this kind of application. Recently we have proposed a way to introduce imputation methods as part of TPOT. While our approach was able to deal with problems with missing data, it can prod...
Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

Science.gov (United States)

Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

2018-03-01

Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced using complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset) and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study in comparison with multiple imputation which included all 6117 patients. The 2 methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions

Directory of Open Access Journals (Sweden)

Balding David J

2008-12-01

Abstract Background The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. Results In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. Conclusion With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
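Switch error rate, one of the two accuracy measures used here, can be sketched for the diploid case. At each heterozygous site, note whether the inferred first haplotype matches the true first haplotype or its complement. A switch is counted whenever that orientation flips between consecutive heterozygous sites. The polyploid generalisation used for triploids and tetraploids is not covered, and the function name is illustrative.

```python
from typing import Sequence


def switch_error_rate(true_h1: Sequence[int], inferred_h1: Sequence[int],
                      genotypes: Sequence[int]) -> float:
    """Diploid switch error rate over heterozygous sites (genotype coded 1)."""
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        return 0.0  # phase is trivial with fewer than two heterozygous sites
    orient = [true_h1[i] == inferred_h1[i] for i in het]
    switches = sum(o1 != o2 for o1, o2 in zip(orient, orient[1:]))
    return switches / (len(het) - 1)
```

If the truth is `0101` and the inferred haplotype is `0110` at four heterozygous sites, one switch occurs among three possible positions, giving a rate of 1/3. Getting the last two sites "wrong" costs only one switch, because phase is relative.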
Decree no. 2004-90 from January 28, 2004 relative to the compensation of electric public utility charges

International Nuclear Information System (INIS)

2004-03-01

This decree defines the charges imputable to the missions of the electric public utility, the procedure for determining their amount, the contribution of end-users to these charges, the recovery and transfer operations, the handling of declaration defects and payment failures, and various other provisions. (J.S.)
High-throughput mouse genotyping using robotics automation.

Science.gov (United States)

Linask, Kaari L; Lo, Cecilia W

2005-02-01

The use of mouse models is rapidly expanding in biomedical research. This has dictated the need for the rapid genotyping of mutant mouse colonies for more efficient utilization of animal holding space. We have established a high-throughput protocol for mouse genotyping using two robotics workstations: a liquid-handling robot to assemble PCR and a microfluidics electrophoresis robot for PCR product analysis. This dual-robotics setup incurs lower start-up costs than a fully automated system while still minimizing human intervention. Essential to this automation scheme is the construction of a database containing customized scripts for programming the robotics workstations. Using these scripts and the robotics systems, multiple combinations of genotyping reactions can be assembled simultaneously, allowing even complex genotyping data to be generated rapidly with consistency and accuracy. A detailed protocol, database, scripts, and additional background information are available at http://dir.nhlbi.nih.gov/labs/ldb-chd/autogene/.
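The published scripts are available at the URL above and are not reproduced here. As a rough, hypothetical sketch of the kind of bookkeeping such scripts automate, the function below assigns every (sample, assay) PCR reaction to a well on standard 96-well plates (rows A-H, columns 1-12), grouping reactions by assay so that each master mix goes to a contiguous block of wells. The function name, the row-major ordering and the assay labels are assumptions for illustration.

```python
import string

ROWS = string.ascii_uppercase[:8]  # A-H on a standard 96-well plate
COLS = 12


def plate_layout(samples, assays):
    """Assign every (sample, assay) reaction to a plate and well, row-major.

    Reactions are grouped by assay; a new plate is started after well H12.
    Returns (plate number, well, sample, assay) tuples.
    """
    layout = []
    index = 0
    for assay in assays:
        for sample in samples:
            plate, pos = divmod(index, len(ROWS) * COLS)
            row, col = divmod(pos, COLS)
            layout.append((plate + 1, f"{ROWS[row]}{col + 1}", sample, assay))
            index += 1
    return layout


mice = [f"mouse{i:02d}" for i in range(1, 51)]
wells = plate_layout(mice, ["allele_wt", "allele_mut"])
print(wells[50])  # first reaction of the second assay
print(wells[-1])  # the 100th reaction spills onto a second plate
```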
Life course variations in the heritability of body size

DEFF Research Database (Denmark)

Zhao, J.; Luan, J.A.; Sharp, S.J.

aim was to use this approach to investigate the life course variations in heritability of body size. Methods: We analysed height, weight and body mass index variables at 11 time-points in 2,452 individuals (1,225 men, 1,227 women) born in 1946 and enrolled in the MRC National Survey of Health...... and Development (NSHD), with genotypes at 147,949 single nucleotide polymorphisms (SNPs) on Metabochips which were subsequently imputed to 506,255 SNPs based on the 1000 Genomes Project. We obtained genome-wide kinship matrices using genotypes at SNPs on Metabochips and genotypes at all SNPs, which were used.......11(0-0.20), 0.10(0-0.22) for height, weight and body mass index, respectively. Variation in estimates was also seen between alternative procedures. Conclusion: This work supports the utility of large-scale genotype data in heritability estimation and highlights the age-related variability in genetic...
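The snippet does not say exactly how the genome-wide kinship matrices were computed. One common form, sketched here as an assumption, is the standardized genomic relationship matrix: each SNP's allele count x is centred by 2p and scaled by sqrt(2p(1-p)), where p is the sample allele frequency, and entry (i, k) is the average product of the standardized values of individuals i and k. Some software (GCTA, for example) uses a different estimator on the diagonal. The function name is hypothetical.

```python
import math


def genomic_relationship_matrix(genotypes):
    """Standardized SNP-based kinship (genomic relationship) matrix.

    genotypes: one list per individual of allele counts (0, 1, 2) at the same SNPs.
    Allele frequencies are estimated from the sample; monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    z_columns = []
    for j in range(len(genotypes[0])):
        col = [g[j] for g in genotypes]
        p = sum(col) / (2 * n)  # frequency of the counted allele
        if p == 0 or p == 1:
            continue  # no variation, so no information about relatedness
        scale = math.sqrt(2 * p * (1 - p))
        z_columns.append([(x - 2 * p) / scale for x in col])
    if not z_columns:
        raise ValueError("no polymorphic SNPs")
    used = len(z_columns)
    return [[sum(z[i] * z[k] for z in z_columns) / used for k in range(n)]
            for i in range(n)]


grm = genomic_relationship_matrix([[0, 2, 0], [1, 1, 0], [2, 0, 0]])
for row in grm:
    print([round(v, 3) for v in row])
```

Because each standardized column sums to zero when p is estimated from the same sample, every row of the resulting matrix also sums to zero; this is a quick sanity check on an implementation.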
Genotyping-by-Sequencing and Its Exploitation for Forage and Cool-Season Grain Legume Breeding

Science.gov (United States)

Annicchiarico, Paolo; Nazzicari, Nelson; Wei, Yanling; Pecetti, Luciano; Brummer, Edward C.

2017-01-01

Genotyping-by-Sequencing (GBS) may drastically reduce genotyping costs compared with single nucleotide polymorphism (SNP) array platforms. However, it may require optimization for specific crops to maximize the number of available markers. Exploiting GBS-generated markers may require optimization, too (e.g., to cope with missing data). This study aimed (i) to compare elements of GBS protocols on legume species that differ in genome size, ploidy, and breeding system, and (ii) to show successful applications and challenges of GBS data on legume species. Preliminary work on alfalfa and Medicago truncatula suggested that ApeKI DNA digestion was of greater interest than PstI:MspI digestion. We compared KAPA and NEB Taq polymerases in combination with primer extensions that were progressively more selective on restriction sites, and found a greater number of polymorphic SNP loci in pea, white lupin and diploid alfalfa when adopting KAPA with a non-selective primer. This protocol also displayed a slight advantage for tetraploid alfalfa (where SNP calling requires higher read depth). KAPA offered the further advantage of more uniform amplification than NEB across fragment sizes and GC contents. The number of GBS-generated polymorphic markers exceeded 6,500 in two tetraploid alfalfa reference populations and a world collection of lupin genotypes, and 2,000 in different sets of pea or lupin recombinant inbred lines. The predictive ability of GBS-based genomic selection was influenced by the genotype missing data threshold and imputation, as well as by the genomic selection model, with the best model depending on traits and data sets. We devised a simple method for comparing phenotypic vs. genomic selection in terms of predicted yield gain per year for the same evaluation costs, whose application to preliminary data for alfalfa and pea in a hypothetical selection scenario for each crop indicated a distinct advantage of genomic selection. PMID:28536584
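The abstract reports that predictive ability depended on the missing-data threshold and on imputation, without stating the imputation method. A minimal sketch of the two steps, assuming the simplest choice (mean imputation) and diploid dosage coding 0/1/2 with None for a missing call, might look like this; the function name and coding are hypothetical.

```python
def filter_and_mean_impute(genotypes, max_missing):
    """Drop markers whose missing rate exceeds max_missing, then mean-impute the rest.

    genotypes: one list per individual of allele dosages (0, 1, 2) or None if missing.
    Returns (indices of kept markers, imputed genotype matrix).
    """
    n = len(genotypes)
    kept, means = [], []
    for j in range(len(genotypes[0])):
        observed = [row[j] for row in genotypes if row[j] is not None]
        missing_rate = 1 - len(observed) / n
        if observed and missing_rate <= max_missing:
            kept.append(j)
            means.append(sum(observed) / len(observed))
    # Replace each missing call by the mean dosage of its marker
    imputed = [[row[j] if row[j] is not None else mean for j, mean in zip(kept, means)]
               for row in genotypes]
    return kept, imputed


calls = [[0, None, 2], [1, None, None], [2, 1, 2], [None, None, 2]]
print(filter_and_mean_impute(calls, max_missing=0.25))
```

Raising max_missing keeps more markers but fills more of them with imputed values, which is the trade-off the threshold controls.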
Genotyping-by-Sequencing and Its Exploitation for Forage and Cool-Season Grain Legume Breeding

Directory of Open Access Journals (Sweden)

Paolo Annicchiarico

2017-05-01

Genotyping-by-Sequencing (GBS) may drastically reduce genotyping costs compared with single nucleotide polymorphism (SNP) array platforms. However, it may require optimization for specific crops to maximize the number of available markers. Exploiting GBS-generated markers may require optimization, too (e.g., to cope with missing data). This study aimed (i) to compare elements of GBS protocols on legume species that differ in genome size, ploidy, and breeding system, and (ii) to show successful applications and challenges of GBS data on legume species. Preliminary work on alfalfa and Medicago truncatula suggested that ApeKI DNA digestion was of greater interest than PstI:MspI digestion. We compared KAPA and NEB Taq polymerases in combination with primer extensions that were progressively more selective on restriction sites, and found a greater number of polymorphic SNP loci in pea, white lupin and diploid alfalfa when adopting KAPA with a non-selective primer. This protocol also displayed a slight advantage for tetraploid alfalfa (where SNP calling requires higher read depth). KAPA offered the further advantage of more uniform amplification than NEB across fragment sizes and GC contents. The number of GBS-generated polymorphic markers exceeded 6,500 in two tetraploid alfalfa reference populations and a world collection of lupin genotypes, and 2,000 in different sets of pea or lupin recombinant inbred lines. The predictive ability of GBS-based genomic selection was influenced by the genotype missing data threshold and imputation, as well as by the genomic selection model, with the best model depending on traits and data sets. We devised a simple method for comparing phenotypic vs. genomic selection in terms of predicted yield gain per year for the same evaluation costs, whose application to preliminary data for alfalfa and pea in a hypothetical selection scenario for each crop indicated a distinct advantage of genomic selection.
Genotypical and environmental variability of fibre productivity and quality of linseed genotypes with a view to oil and short fibre utilization. Final report; Genotypische und umweltbedingte Variabilitaet der Faserleistung und -qualitaet von Oelleingenotypen im Hinblick auf die Nutzung von Oel und Kurzfaser. Abschlussbericht

Energy Technology Data Exchange (ETDEWEB)

Diepenbrock, W.; Rennebaum, H.; Grimm, E.

1999-10-01

Linseed (Linum usitatissimum L.) was analyzed for combined utilization of oil and fibres. Effects of genotypic variability and environmental factors were investigated in field experiments at two sites (Dikopshof, Etzdorf) over three years (1995-1997). A comparison of selected genotypes showed that there is no genotype which combines the characteristics of high oil yield, long stem and good fibre characteristics. Of the eleven genotypes tested, eight were found to be suited, with restrictions, for double use and for further cultivation. [Translated from German] This work focuses on the evaluation of oil linseed (Linum usitatissimum L.) with a view to the combined use of oil and fibres. Economic advantages are expected owing to a growing demand for plant fibres, particularly outside textile processing lines. To examine genotypic variability and the influence of environmental factors on yield, yield components and fibre quality, field trials were established at two sites (Dikopshof, Etzdorf) over three years (1995-1997). The comparison of selected genotypes shows that no distinct dual-purpose type exists: there are no genotypes with the combination of traits high oil yield, long technical stem and favourable fibre properties. Of a group of eleven comprehensively tested genotypes, eight are suitable, with restrictions, for dual use. These genotypes are also candidates for further plant breeding work. (orig.)
Mapping change of older forest with nearest-neighbor imputation and Landsat time-series

Science.gov (United States)

Janet L. Ohmann; Matthew J. Gregory; Heather M. Roberts; Warren B. Cohen; Robert E. Kennedy; Zhiqiang. Yang

2012-01-01

The Northwest Forest Plan (NWFP), which aims to conserve late-successional and old-growth forests (older forests) and associated species, established new policies on federal lands in the Pacific Northwest USA. As part of monitoring for the NWFP, we tested nearest-neighbor imputation for mapping change in older forest, defined by threshold values for forest attributes...
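The record is truncated before the method details, so the following is only a generic sketch of the idea named in the title: each pixel receives the forest attribute of its single nearest reference plot in spectral-feature space (k = 1 nearest-neighbor imputation), is classified as older forest when that attribute meets a threshold, and the classifications at two dates are compared. The feature values, attribute, threshold and function names are all hypothetical, and a single reference set is reused for both dates for simplicity.

```python
import math


def nn_impute(pixel, plot_features, plot_values):
    """k = 1 nearest-neighbor imputation: copy the attribute of the closest plot."""
    nearest = min(range(len(plot_features)),
                  key=lambda i: math.dist(pixel, plot_features[i]))
    return plot_values[nearest]


def older_forest_change(pixels_t1, pixels_t2, plot_features, plot_values, threshold):
    """Label each pixel 'loss', 'gain' or 'no change' in older-forest status."""
    labels = []
    for p1, p2 in zip(pixels_t1, pixels_t2):
        older_1 = nn_impute(p1, plot_features, plot_values) >= threshold
        older_2 = nn_impute(p2, plot_features, plot_values) >= threshold
        if older_1 and not older_2:
            labels.append("loss")
        elif older_2 and not older_1:
            labels.append("gain")
        else:
            labels.append("no change")
    return labels


# Hypothetical plots: two spectral features and one forest attribute each
plots = [(0.1, 0.2), (0.8, 0.9)]
attribute = [5, 40]
t1 = [(0.75, 0.85), (0.15, 0.25), (0.8, 0.8)]
t2 = [(0.1, 0.1), (0.9, 0.9), (0.85, 0.95)]
print(older_forest_change(t1, t2, plots, attribute, threshold=20))
```

Imputing the attribute first and thresholding afterwards keeps the full attribute map available, so the older-forest definition can be changed without re-running the imputation.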
Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

Science.gov (United States)

Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

2017-06-01

The recent popularity of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but these almost always rely on asymptotic distributions and are consequently inadequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based on exact distributions, this procedure may be used even with small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with those obtained from an adaptation of Reiter's combination rule to multiply imputed synthetic datasets, and an application to the 2000 Current Population Survey is discussed.
Multiple imputation to account for measurement error in marginal structural models

Science.gov (United States)

Edwards, Jessie K.; Cole, Stephen R.; Westreich, Daniel; Crane, Heidi; Eron, Joseph J.; Mathews, W. Christopher; Moore, Richard; Boswell, Stephen L.; Lesko, Catherine R.; Mugavero, Michael J.

2015-01-01

Background Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and non-differential measurement error in a marginal structural model. Methods We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. Results In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality [hazard ratio (HR): 1.2 (95% CI: 0.6, 2.3)]. The HR for current smoking and therapy (0.4 (95% CI: 0.2, 0.7)) was similar to the HR for no smoking and therapy (0.4; 95% CI: 0.2, 0.6). Conclusions Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies. PMID:26214338
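The standard analysis above fits a marginal structural model with inverse probability weights. As a simplified sketch (not the authors' code), stabilized weights for a point treatment A and a single discrete confounder L are P(A = a) / P(A = a | L = l), estimated here as empirical frequencies. The study's time-varying weights and the multiple-imputation correction for misclassified smoking are not shown; the variable names and data are hypothetical.

```python
from collections import Counter


def stabilized_weights(treatment, confounder):
    """Stabilized inverse probability of treatment weights, P(A=a) / P(A=a | L=l).

    Both probabilities are empirical frequencies, so the confounder must be
    discrete; this is a point-treatment simplification.
    """
    n = len(treatment)
    marginal = Counter(treatment)
    joint = Counter(zip(confounder, treatment))
    per_stratum = Counter(confounder)
    return [(marginal[a] / n) / (joint[(l, a)] / per_stratum[l])
            for a, l in zip(treatment, confounder)]


therapy = [1, 1, 0, 0]   # hypothetical treatment indicator
stratum = [1, 1, 1, 0]   # hypothetical binary confounder level
print(stabilized_weights(therapy, stratum))
```

Participants whose treatment was unlikely given their confounder level receive larger weights, which balances the confounder across treatment groups in the weighted pseudo-population.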
Multiple Imputation to Account for Measurement Error in Marginal Structural Models.

Science.gov (United States)

Edwards, Jessie K; Cole, Stephen R; Westreich, Daniel; Crane, Heidi; Eron, Joseph J; Mathews, W Christopher; Moore, Richard; Boswell, Stephen L; Lesko, Catherine R; Mugavero, Michael J

2015-09-01

Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and nondifferential measurement error in a marginal structural model. We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3,686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2 [95% confidence interval [CI] = 0.6, 2.3]). The HR for current smoking and therapy [0.4 (95% CI = 0.2, 0.7)] was similar to the HR for no smoking and therapy (0.4; 95% CI = 0.2, 0.6). Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies.

Multiple imputation strategies for zero-inflated cost data in economic evaluations: which method works best?

NARCIS (Netherlands)

MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

2016-01-01

Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing
A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014

Directory of Open Access Journals (Sweden)

He Yulei

2016-03-01

Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior when applied to survey data with complex sample designs, including clustering, remains unclear. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase of the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
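The within- and between-imputation variances discussed above are the ingredients of Rubin's combining rules. For m completed-data estimates q_i with variances u_i, the pooled estimate is their mean, the within-imputation variance Ū is the mean of the u_i, the between-imputation variance B is the sample variance of the q_i, and the total variance is T = Ū + (1 + 1/m)B. The fraction of missing information is approximated below by (1 + 1/m)B / T, ignoring the small-sample degrees-of-freedom correction. This is the standard simple-random-sample version, not the clustered-design expressions derived in the paper; the function name is hypothetical.

```python
def rubin_combine(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    Returns (pooled estimate, within variance, between variance,
    total variance, approximate fraction of missing information).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    u_bar = sum(variances) / m                                 # within-imputation
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation
    total = u_bar + (1 + 1 / m) * b
    fmi = (1 + 1 / m) * b / total  # large-sample approximation
    return q_bar, u_bar, b, total, fmi


print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```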
Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation.

Science.gov (United States)

Silva, Carlos Alberto; Klauberg, Carine; Hudak, Andrew T; Vierling, Lee A; Liesenberg, Veraldo; Bernett, Luiz G; Scheraiber, Clewerson F; Schoeninger, Emerson R

2018-01-01

Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR-derived metrics were calculated at 53 regular sample plots, and we used imputation models to retrieve the forest attributes at plot and landscape levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The adjusted coefficients of determination (adj. R²) for HD, HM and TD were 0.90, 0.94 and 0.38, and the root mean squared differences (RMSD) were 6.99%, 5.70% and 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy of TD and to evaluate and compare the cost of acquiring and processing LiDAR data against conventional inventory procedures.
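A minimal sketch of k-NN imputation with the percentage RMSD used above, assuming Euclidean distance in LiDAR-metric space, an unweighted mean of the k nearest plots, and leave-one-out validation. The study's exact distance measure, neighbor weighting and validation scheme are not stated in the abstract, so these choices and the function names are assumptions. In practice the metrics would usually be standardized first so that no single metric dominates the distance.

```python
import math


def knn_impute(target, ref_metrics, ref_attribute, k):
    """Average the attribute of the k reference plots nearest in metric space."""
    nearest = sorted(range(len(ref_metrics)),
                     key=lambda i: math.dist(target, ref_metrics[i]))[:k]
    return sum(ref_attribute[i] for i in nearest) / k


def leave_one_out(ref_metrics, ref_attribute, k):
    """Predict each plot from all the other plots."""
    preds = []
    for i in range(len(ref_metrics)):
        others = [j for j in range(len(ref_metrics)) if j != i]
        preds.append(knn_impute(ref_metrics[i], [ref_metrics[j] for j in others],
                                [ref_attribute[j] for j in others], k))
    return preds


def rmsd_percent(observed, predicted):
    """Root mean squared difference as a percentage of the observed mean."""
    n = len(observed)
    rmsd = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100 * rmsd / (sum(observed) / n)


# Hypothetical plots described by one LiDAR height metric, with measured stand height (m)
metrics = [(10.0,), (12.0,), (14.0,), (25.0,)]
height = [9.5, 11.0, 13.5, 24.0]
print(knn_impute((12.5,), metrics, height, k=2))
print(rmsd_percent(height, leave_one_out(metrics, height, k=2)))
```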
Baseline predictors of sputum culture conversion in pulmonary tuberculosis: importance of cavities, smoking, time to detection and W-Beijing genotype.

Directory of Open Access Journals (Sweden)

Marianne E Visser

Time to detection (TTD) on automated liquid mycobacterial cultures is an emerging biomarker of tuberculosis outcomes. The M. tuberculosis W-Beijing genotype is spreading globally, indicating a selective advantage. There is a paucity of data on the association between baseline TTD and W-Beijing genotype and tuberculosis outcomes. To assess baseline predictors of failure of sputum culture conversion, within the first 2 months of antitubercular therapy, in participants with pulmonary tuberculosis. Between May 2005 and August 2008 we conducted a prospective cohort study of time to sputum culture conversion in ambulatory participants with first episodes of smear and culture positive pulmonary tuberculosis attending two primary care clinics in Cape Town, South Africa. Rifampicin resistance (diagnosed on phenotypic susceptibility testing) was an exclusion criterion. Sputum was collected weekly for 8 weeks for mycobacterial culture on liquid media (BACTEC MGIT 960). Due to missing data, multiple imputation was performed. Time to sputum culture conversion was analysed using a Cox proportional hazards model. Bayesian model averaging determined the posterior effect probability for each variable. 113 participants were enrolled (30.1% female, 10.5% HIV-infected, 44.2% W-Beijing genotype, and 89% cavities). On Kaplan-Meier analysis 50.4% of participants underwent sputum culture conversion by 8 weeks. The following baseline factors were associated with slower sputum culture conversion: TTD (adjusted hazard ratio (aHR) = 1.11, 95% CI 1.02; 1.2), lung cavities (aHR = 0.13, 95% CI 0.02; 0.95), ever smoking (aHR = 0.32, 95% CI 0.1; 1.02) and the W-Beijing genotype (aHR = 0.51, 95% CI 0.25; 1.07). On Bayesian model averaging, posterior probability effects were strong for TTD, lung cavitation and smoking and moderate for W-Beijing genotype. We found that baseline TTD, smoking, cavities and W-Beijing genotype were associated with delayed 2 month sputum culture
Baseline predictors of sputum culture conversion in pulmonary tuberculosis: importance of cavities, smoking, time to detection and W-Beijing genotype.

Science.gov (United States)

Visser, Marianne E; Stead, Michael C; Walzl, Gerhard; Warren, Rob; Schomaker, Michael; Grewal, Harleen M S; Swart, Elizabeth C; Maartens, Gary

2012-01-01

Time to detection (TTD) on automated liquid mycobacterial cultures is an emerging biomarker of tuberculosis outcomes. The M. tuberculosis W-Beijing genotype is spreading globally, indicating a selective advantage. There is a paucity of data on the association between baseline TTD and W-Beijing genotype and tuberculosis outcomes. To assess baseline predictors of failure of sputum culture conversion, within the first 2 months of antitubercular therapy, in participants with pulmonary tuberculosis. Between May 2005 and August 2008 we conducted a prospective cohort study of time to sputum culture conversion in ambulatory participants with first episodes of smear and culture positive pulmonary tuberculosis attending two primary care clinics in Cape Town, South Africa. Rifampicin resistance (diagnosed on phenotypic susceptibility testing) was an exclusion criterion. Sputum was collected weekly for 8 weeks for mycobacterial culture on liquid media (BACTEC MGIT 960). Due to missing data, multiple imputation was performed. Time to sputum culture conversion was analysed using a Cox proportional hazards model. Bayesian model averaging determined the posterior effect probability for each variable. 113 participants were enrolled (30.1% female, 10.5% HIV-infected, 44.2% W-Beijing genotype, and 89% cavities). On Kaplan-Meier analysis 50.4% of participants underwent sputum culture conversion by 8 weeks. The following baseline factors were associated with slower sputum culture conversion: TTD (adjusted hazard ratio (aHR) = 1.11, 95% CI 1.02; 1.2), lung cavities (aHR = 0.13, 95% CI 0.02; 0.95), ever smoking (aHR = 0.32, 95% CI 0.1; 1.02) and the W-Beijing genotype (aHR = 0.51, 95% CI 0.25; 1.07). On Bayesian model averaging, posterior probability effects were strong for TTD, lung cavitation and smoking and moderate for W-Beijing genotype. We found that baseline TTD, smoking, cavities and W-Beijing genotype were associated with delayed 2 month sputum culture. Larger
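The proportion converting by 8 weeks quoted above comes from a Kaplan-Meier analysis. A minimal product-limit sketch, treating culture conversion as the event and censoring participants without conversion at their last follow-up, is shown below; the function name and example data are hypothetical. At tied times, events are counted before censorings, so censored participants remain in the risk set at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(not yet converted after time t).

    times: follow-up time per participant (e.g. weeks); events: 1 if conversion
    was observed at that time, 0 if censored. Returns (t, S(t)) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censorings both leave the risk set
    return curve


weeks = [2, 3, 3, 5, 8, 8]
converted = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(weeks, converted)
print(curve)
print("converted by week 8:", 1 - curve[-1][1])
```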
Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

DEFF Research Database (Denmark)

Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

2016-01-01

The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographics attributes, and to determine whether imputation is an accurate...
Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

NARCIS (Netherlands)

van Leeuwen, E.M.; Karssen, L.C.; Deelen, J.; Isaacs, A.; Medina-Gomez, C.; Mbarek, H.; Kanterakis, A.; Trompet, S.; Postmus, I.; Verweij, N.; van Enckevort, D.; Huffman, J.E.; White, C.C.; Feitosa, M.F.; Bartz, T.M.; Manichaikul, A.; Joshi, P.K.; Peloso, G.M.; Deelen, P.; Dijk, F.; Willemsen, G.; de Geus, E.J.C.; Milaneschi, Y.; Penninx, B.W.J.H.; Francioli, L.C.; Menelaou, A.; Pulit, S.L.; Rivadeneira, F.; Hofman, A.; Oostra, B.A.; Franco, O.H.; Mateo Leach, I.; Beekman, M.; de Craen, A.J.; Uh, H.W.; Trochet, H.; Hocking, L.J.; Porteous, D.J.; Sattar, N.; Packard, C.J.; Buckley, B.M.; Brody, J.A.; Bis, J.C.; Rotter, J.I.; Mychaleckyj, J.C.; Campbell, H.; Duan, Q.; Lange, L.A.; Wilson, J.F.; Hayward, C.; Polasek, O.; Vitart, V.; Rudan, I.; Wright, A.F.; Rich, S.S.; Psaty, B.M.; Borecki, I.B.; Kearney, P.M.; Stott, D.J.; Cupples, L.A.; Jukema, J.W.; van der Harst, P.; Sijbrands, E.J.; Hottenga, J.J.; Uitterlinden, A.G.; Swertz, M.A.; van Ommen, G.J.B; Bakker, P.I.W.; Slagboom, P.E.; Boomsma, D.I.; Wijmenga, C.; van Duijn, C.M.

2015-01-01

Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (∼35,000 samples) with the population-specific reference panel created
Estimating Stand Height and Tree Density in Pinus taeda plantations using in-situ data, airborne LiDAR and k-Nearest Neighbor Imputation

Directory of Open Access Journals (Sweden)

CARLOS ALBERTO SILVA

Accurate forest inventory is of great economic importance to optimize the entire supply chain management in pulp and paper companies. The aim of this study was to estimate stand dominant and mean heights (HD and HM) and tree density (TD) of Pinus taeda plantations located in South Brazil using in-situ measurements, airborne Light Detection and Ranging (LiDAR) data and non-parametric k-nearest neighbor (k-NN) imputation. Forest inventory attributes and LiDAR-derived metrics were calculated at 53 regular sample plots, and we used imputation models to retrieve the forest attributes at plot and landscape levels. The best LiDAR-derived metrics to predict HD, HM and TD were H99TH, HSD, SKE and HMIN. The imputation model using the selected metrics was more effective for retrieving height than tree density. The adjusted coefficients of determination (adj. R²) for HD, HM and TD were 0.90, 0.94 and 0.38, and the root mean squared differences (RMSD) were 6.99%, 5.70% and 12.92%, respectively. Our results show that LiDAR and k-NN imputation can be used to predict stand heights with high accuracy in Pinus taeda. However, further studies are needed to improve the prediction accuracy of TD and to evaluate and compare the cost of acquiring and processing LiDAR data against conventional inventory procedures.
Combining item response theory with multiple imputation to equate health assessment questionnaires.

Science.gov (United States)

Gu, Chenyang; Gutman, Roee

2017-09-01

The assessment of patients' functional status across the continuum of care requires a common patient assessment tool. However, assessment tools that are used in various health care settings differ and cannot be easily contrasted. For example, the Functional Independence Measure (FIM) is used to evaluate the functional status of patients who stay in inpatient rehabilitation facilities, the Minimum Data Set (MDS) is collected for all patients who stay in skilled nursing facilities, and the Outcome and Assessment Information Set (OASIS) is collected if they choose home health care provided by home health agencies. All three instruments or questionnaires include functional status items, but the specific items, rating scales, and instructions for scoring different activities vary between the different settings. We consider equating different health assessment questionnaires as a missing data problem, and propose a variant of predictive mean matching method that relies on Item Response Theory (IRT) models to impute unmeasured item responses. Using real data sets, we simulated missing measurements and compared our proposed approach to existing methods for missing data imputation. We show that, for all of the estimands considered, and in most of the experimental conditions that were examined, the proposed approach provides valid inferences, and generally has better coverages, relatively smaller biases, and shorter interval estimates. The proposed method is further illustrated using a real data set. © 2016, The International Biometric Society.
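The proposed method is a variant of predictive mean matching (PMM) in which the predicted means come from IRT models. For orientation, a sketch of plain PMM with an ordinary least-squares model for one incomplete variable y given a complete covariate x is shown below; it is not the authors' IRT-based variant. Each missing y is filled with the observed value of a donor drawn at random from the k observed cases whose predicted means are closest. A full multiple-imputation version would also draw the regression parameters from their posterior for each imputation, which this simplified sketch omits. The function name and data are hypothetical.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Predictive mean matching for one incomplete variable y given complete x.

    Fits y ~ x by least squares on the observed cases, then fills each missing y
    with the observed value of a random donor among the k observed cases whose
    predicted means are closest to the missing case's predicted mean.
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    x_mean = sum(x[i] for i in obs) / len(obs)
    y_mean = sum(y[i] for i in obs) / len(obs)
    sxx = sum((x[i] - x_mean) ** 2 for i in obs)
    sxy = sum((x[i] - x_mean) * (y[i] - y_mean) for i in obs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    pred = [intercept + slope * xi for xi in x]
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            filled[i] = y[rng.choice(donors)]  # always an actually observed value
    return filled


print(pmm_impute([1, 2, 3, 4, 5], [10, None, 30, 40, None], k=1))
```

Because imputed values are always copied from observed responses, PMM keeps them on the original scale (for example, the discrete rating levels of a questionnaire item).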
Infestation of the banana root borer among different banana plant genotypes

Directory of Open Access Journals (Sweden)

Fernando Teixeira de Oliveira

In this study, we aimed to investigate Cosmopolites sordidus (Coleoptera: Dryophthoridae) infestation among different banana genotypes in a commercial banana orchard over the course of 30 months. Banana root borer infestation was compared in 20 banana genotypes, including five varieties and 15 hybrids. Overall, we observed that 94.17% of pest infestation cases occurred in the cortex region, and only 5.83% occurred in the central cylinder. Genotypes least sensitive to infestation were the Prata Anã (AAB) and Pacovan (AAB) varieties, where no damage was recorded. Among the hybrid genotypes, PV 9401 and BRS Fhia 18 showed intermediate levels of sensitivity, while the hybrids BRS Tropical (AAAB), PA 9401 (AAAB), BRS Vitoria (AAAB), YB 4203 (AAAB) and Bucaneiro (AAAA) were the most sensitive to attack by the banana root borer. This study demonstrated that banana root borer infestation varies according to banana plant genotype, and that the use of less susceptible genotypes could reduce infestation rates of C. sordidus.
Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation

NARCIS (Netherlands)

M. Ghoussaini (Maya); S.L. Edwards (Stacey); K. Michailidou (Kyriaki); S. Nord (Silje); R. Cowper-Sal-lari (Richard); K. Desai (Kinjal); S. Kar (Siddhartha); K.M. Hillman (Kristine); S. Kaufmann (Susanne); D.M. Glubb (Dylan); J. Beesley (Jonathan); J. Dennis (Joe); M.K. Bolla (Manjeet); Q. Wang (Qing); E. Dicks (Ed); Q. Guo (Qi); M.K. Schmidt (Marjanka); M. Shah (Mitul); R.N. Luben (Robert); J. Brown (Judith); K. Czene (Kamila); H. Darabi (Hatef); M. Eriksson (Mats); D. Klevebring (Daniel); S.E. Bojesen (Stig); B.G. Nordestgaard (Børge); S.F. Nielsen (Sune); H. Flyger (Henrik); D. Lambrechts (Diether); B. Thienpont (Bernard); P. Neven (Patrick); H. Wildiers (Hans); A. Broeks (Annegien); L.J. van 't Veer (Laura); E.J.T. Rutgers (Emiel); F.J. Couch (Fergus); J.E. Olson (Janet); B. Hallberg (Boubou); C. Vachon (Celine); J. Chang-Claude (Jenny); A. Rudolph (Anja); P. Seibold (Petra); D. Flesch-Janys (Dieter); J. Peto (Julian); I. dos Santos Silva (Isabel); L.J. Gibson (Lorna); H. Nevanlinna (Heli); T.A. Muranen (Taru); K. Aittomäki (Kristiina); C. Blomqvist (Carl); P. Hall (Per); J. Li (Jingmei); J. Liu (Jianjun); M.K. Humphreys (Manjeet); D. Kang (Daehee); J.-Y. Choi (J.); S.K. Park (Sue); D-Y. Noh (Dong-Young); K. Matsuo (Keitaro); H. Ito (Hidemi); H. Iwata (Hisato); Y. Yatabe (Yasushi); P. Guénel (Pascal); T. Truong (Thérèse); F. Menegaux (Florence); M. Sanchez (Marie); B. Burwinkel (Barbara); F. Marme (Federick); A. Schneeweiss (Andreas); C. Sohn (Christof); A.H. Wu (Anna H.); C.-C. Tseng (Chiu-Chen); D. Van Den Berg (David); D.O. Stram (Daniel O.); J. Benítez (Javier); M.P. Zamora (Pilar); J.I.A. Perez (Jose Ignacio Arias); P. Menéndez (Primitiva); X.-O. Shu (Xiao-Ou); W. Lu (Wei); Y. Gao; Q. Cai (Qiuyin); A. Cox (Angela); S.S. Cross (Simon); M.W.R. Reed (Malcolm); I.L. Andrulis (Irene); J.A. Knight (Julia); G. Glendon (Gord); S. Tchatchou (Sandrine); E.J. Sawyer (Elinor); I.P. Tomlinson (Ian); M. Kerin (Michael); N. Miller (Nicola); C.A. Haiman (Christopher); B.E. Henderson (Brian); F.R. Schumacher (Fredrick); L. Le Marchand (Loic); A. Lindblom (Annika); S. Margolin (Sara); S.-H. Teo (Soo-Hwang); C.H. Yip (Cheng Har); D.S.C. Lee (Daphne S.C.); T.Y. Wong (Tien Yin); M.J. Hooning (Maartje); J.W.M. Martens (John W. M.); J.M. Collée (Margriet); C.H.M. van Deurzen (Carolien); J.L. Hopper (John); M.C. Southey (Melissa); H. Tsimiklis (Helen); M.K. Kapuscinski (Miroslav K.); C-Y. Shen (Chen-Yang); P.-E. Wu (Pei-Ei); J-C. Yu (Jyh-Cherng); S.-T. Chen; G.G. Alnæs (Grethe); A.-L. Borresen-Dale (Anne-Lise); G.G. Giles (Graham); R.L. Milne (Roger); C.A. McLean (Catriona Ann); K.R. Muir (K.); A. Lophatananon (Artitaya); S. Stewart-Brown (Sarah); P. Siriwanarangsan (Pornthep); M. Hartman (Mikael); X. Miao; S.A.B.S. Buhari (Shaik Ahmad Bin Syed); Y.Y. Teo (Yik Ying); P.A. Fasching (Peter); L. Haeberle (Lothar); A.B. Ekici (Arif); M.W. Beckmann (Matthias); H. Brenner (Hermann); A.K. Dieffenbach (Aida Karina); V. Arndt (Volker); C. Stegmaier (Christa); A.J. Swerdlow (Anthony ); A. Ashworth (Alan); N. Orr (Nick); M. Schoemaker (Minouk); M. García-Closas (Montserrat); J.D. Figueroa (Jonine); S.J. Chanock (Stephen); J. Lissowska (Jolanta); J. Simard (Jacques); M.S. Goldberg (Mark); F. Labrèche (France); M. Dumont (Martine); R. Winqvist (Robert); K. Pykäs (Katri); A. Jukkola-Vuorinen (Arja); H. Brauch (Hiltrud); T. Brüning (Thomas); Y.-D. Koto (Yon-Dschun); P. Radice (Paolo); P. Peterlongo (Paolo); B. Bonnani (Bernardo); S. Volorio (Sara); T. Dörk (Thilo); N.V. Bogdanova (Natalia); S. Helbig (Sonja); A. Mannermaa (Arto); V. Kataja (Vesa); V-M. Kosma (Veli-Matti); J.M. Hartikainen (J.); P. Devilee (Peter); R.A.E.M. Tollenaar (Rob); C.M. Seynaeve (Caroline); C.J. van Asperen (Christi); A. Jakubowska (Anna); J. Lubinski (Jan); K. Jaworska-Bieniek (Katarzyna); K. Durda (Katarzyna); S. Slager (Susan); A.E. Toland (Amanda); C.B. Ambrosone (Christine); D. Yannoukakos (Drakoulis); S. Sangrajrang (Suleeporn); V. Gaborieau (Valerie); P. Brennan (Paul); J.D. McKay (James); U. Hamann (Ute); D. Torres (Diana); W. Zheng (Wei); J. Long (Jirong); H. Anton-Culver (Hoda); S.L. Neuhausen (Susan); C. Luccarini (Craig); C. Baynes (Caroline); S. Ahmed (Shahana); M. Maranian (Melanie); S. Healey (Sue); A. González-Neira (Anna); G. Pita (Guillermo); M.R. Alonso (Rosario); N. Álvarez (Nuria); D. Herrero (Daniel); D.C. Tessier (Daniel C.); D. Vincent (Daniel); F. Bacot (Francois); I. de Santiago (Ines); J. Carroll (Jason); C. Caldas (Carlos); M. Brown (Melissa); M. Lupien (Mathieu); V. Kristensen (Vessela); P.D.P. Pharoah (Paul); G. Chenevix-Trench (Georgia); J.D. French (Juliet); D.F. Easton (Douglas); A.M. Dunning (Alison); P. Webb (Penny); A. De Fazio (Anna)

2014-01-01

GWAS have identified a breast cancer susceptibility locus on 2q35. Here we report the fine mapping of this locus using data from 101,943 subjects from 50 case-control studies. We genotype 276 SNPs using the 'iCOGS' genotyping array and impute genotypes for a further 1,284 using 1000...
Genomic Selection for Predicting Fusarium Head Blight Resistance in a Wheat Breeding Program

Directory of Open Access Journals (Sweden)

Marcio P. Arruda

2015-11-01

Genomic selection (GS) is a breeding method that uses marker–trait models to predict unobserved phenotypes. This study developed GS models for predicting traits associated with resistance to Fusarium head blight (FHB) in wheat (Triticum aestivum L.). We used genotyping-by-sequencing (GBS) to identify 5054 single-nucleotide polymorphisms (SNPs), which were then treated as predictor variables in GS analysis. We compared how the prediction accuracy of the genomic-estimated breeding values (GEBVs) was affected by (i) five genotypic imputation methods (random forest imputation [RFI], expectation maximization imputation [EMI], k-nearest neighbor imputation [kNNI], singular value decomposition imputation [SVDI], and mean imputation [MNI]); (ii) three statistical models (ridge-regression best linear unbiased predictor [RR-BLUP], least absolute shrinkage and selection operator [LASSO], and elastic net); (iii) marker density (500, 1500, 3000, and 4500 SNPs); (iv) training population (TP) size (96, 144, 192, and 218); (v) marker-based and pedigree-based relationship matrices; and (vi) control for relatedness in TPs and validation populations (VPs). No discernible differences in prediction accuracy were observed among imputation methods. RR-BLUP outperformed the other models in nearly all scenarios. Accuracies decreased substantially when the marker number decreased to 3000 or 1500 SNPs, depending on the trait; when the training set had fewer than 192 individuals; when a pedigree-based instead of a marker-based matrix was used; or when no control for relatedness was implemented. Overall, moderate to high prediction accuracies were observed in this study, suggesting that GS is a very promising breeding strategy for FHB resistance in wheat.
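As a rough illustration of the simplest combination compared above, mean imputation (MNI) followed by ridge regression of phenotypes on markers (the marker-effect model behind RR-BLUP), here is a pure-Python sketch. It assumes markers coded 0/1/2 with None for missing calls and a user-supplied shrinkage parameter `lam`; in RR-BLUP this parameter corresponds to the ratio of residual to marker-effect variance, usually estimated by REML. All function names are hypothetical and this is not the authors' pipeline.

```python
# Sketch (assumed setup): MNI on a 0/1/2 SNP matrix, then ridge regression on
# marker effects with a fixed, user-chosen shrinkage `lam` (lam > 0 keeps the
# system solvable even with collinear markers).

def mean_impute(markers):
    """Replace None entries with the mean of the observed values in that SNP column."""
    n_snps = len(markers[0])
    means = []
    for j in range(n_snps):
        observed = [row[j] for row in markers if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else float(row[j]) for j in range(n_snps)]
            for row in markers]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ridge_marker_effects(x, y, lam):
    """Solve (Xc'Xc + lam*I) b = Xc'yc using column-centred markers and centred phenotypes."""
    n, p = len(x), len(x[0])
    col_means = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - col_means[j] for j in range(p)] for row in x]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    lhs = [[sum(xc[i][a] * xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    rhs = [sum(xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return y_mean, col_means, solve(lhs, rhs)


def gebv(row, col_means, effects):
    """Genomic-estimated breeding value: centred genotypes times estimated marker effects."""
    return sum((row[j] - col_means[j]) * effects[j] for j in range(len(row)))
```

For example, `mean_impute([[0, 1, 2], [1, None, 0], [2, 1, None], [0, 2, 1]])` fills each gap with its column mean before `ridge_marker_effects` is fitted. Larger `lam` shrinks all marker effects toward zero, which is the behaviour that lets RR-BLUP cope with more markers than individuals.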
Imputation of Baseline LDL Cholesterol Concentration in Patients with Familial Hypercholesterolemia on Statins or Ezetimibe.

Science.gov (United States)

Ruel, Isabelle; Aljenedil, Sumayah; Sadri, Iman; de Varennes, Émilie; Hegele, Robert A; Couture, Patrick; Bergeron, Jean; Wanneh, Eric; Baass, Alexis; Dufour, Robert; Gaudet, Daniel; Brisson, Diane; Brunham, Liam R; Francis, Gordon A; Cermakova, Lubomira; Brophy, James M; Ryomoto, Arnold; Mancini, G B John; Genest, Jacques

2018-02-01

Familial hypercholesterolemia (FH) is the most frequent genetic disorder seen clinically and is characterized by increased LDL cholesterol (LDL-C) (>95th percentile), family history of increased LDL-C, premature atherosclerotic cardiovascular disease (ASCVD) in the patient or in first-degree relatives, presence of tendinous xanthomas or premature corneal arcus, or presence of a pathogenic mutation in the LDLR, PCSK9, or APOB genes. A diagnosis of FH has important clinical implications with respect to lifelong risk of ASCVD and requirement for intensive pharmacological therapy. The concentration of baseline LDL-C (untreated) is essential for the diagnosis of FH but is often not available because the individual is already on statin therapy. To validate a new algorithm to impute baseline LDL-C, we examined 1297 patients. The baseline LDL-C was compared with the imputed baseline obtained within 18 months of the initiation of therapy. We compared the percent reduction in LDL-C on treatment from baseline with the published percent reductions. After eliminating individuals with missing data, nonstandard doses of statins, or medications other than statins or ezetimibe, we provide data on 951 patients. The mean ± SE baseline LDL-C was 243.0 (2.2) mg/dL [6.28 (0.06) mmol/L], and the mean ± SE imputed baseline LDL-C was 244.2 (2.6) mg/dL [6.31 (0.07) mmol/L] (P = 0.48). There was no difference in response according to the patient's sex or in percent reduction between observed and expected for individual doses or types of statin or ezetimibe. We provide a validated estimation of baseline LDL-C for patients with FH that may help clinicians in making a diagnosis. © 2017 American Association for Clinical Chemistry.
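The core arithmetic of back-calculating an untreated value from an on-treatment one can be sketched as dividing by the fraction of LDL-C expected to remain. This is a minimal sketch under stated assumptions, not the published algorithm: the reduction percentages in `EXPECTED_REDUCTION` are hypothetical placeholders (the study used published dose-specific reductions), and the rule that combined therapies act multiplicatively is an assumption.

```python
# Hypothetical expected fractional LDL-C reductions per therapy. These numbers are
# placeholders for illustration, NOT the dose-specific reductions used in the study.
EXPECTED_REDUCTION = {
    "statin_moderate": 0.30,
    "statin_high": 0.50,
    "ezetimibe": 0.20,
}

MG_DL_PER_MMOL_L = 38.67  # cholesterol: 1 mmol/L corresponds to about 38.67 mg/dL


def impute_baseline_ldl(on_treatment_mg_dl, therapies):
    """Back-calculate untreated LDL-C (mg/dL) from an on-treatment value.

    Assumption: each therapy removes a fixed fraction of the LDL-C that remains,
    so the remaining fractions of combined therapies multiply.
    """
    remaining_fraction = 1.0
    for therapy in therapies:
        remaining_fraction *= 1.0 - EXPECTED_REDUCTION[therapy]
    return on_treatment_mg_dl / remaining_fraction


def mg_dl_to_mmol_l(value_mg_dl):
    """Convert a cholesterol concentration from mg/dL to mmol/L."""
    return value_mg_dl / MG_DL_PER_MMOL_L
```

With these placeholder values, `impute_baseline_ldl(120.0, ["statin_high"])` returns 240.0 mg/dL, and `mg_dl_to_mmol_l(243.0)` rounds to the 6.28 mmol/L reported above.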
Evaluation of Upland Rice Genotypes for Efficient Uptake of Nitrogen and Phosphorus

Energy Technology Data Exchange (ETDEWEB)

Zaharah, A. R.; Hanafi, M. M. [Universiti Putra Malaysia, Serdang, Selangor (Malaysia)]

2013-11-15

Upland rice grown by subsistence farmers in the tropics and subtropics is known to produce very low yields because it is planted on low-fertility soils under drought-prone conditions. Little information is available on differences among upland rice cultivars in their response to N and P fertilization in Asia; screening upland rice genotypes for P use efficiency (PUE) and N use efficiency (NUE) is therefore a necessary first step. The objectives of the study were (i) to identify upland rice genotypes with root characteristics favorable for efficient N and P uptake and utilization, (ii) to evaluate the selected genotypes for grain yield, and (iii) to assess the variability of N and P use efficiency in upland rice genotypes grown under field conditions. Several laboratory, glasshouse and field experiments were carried out from 2007 to 2011 at Universiti Putra Malaysia to achieve these objectives. Fifteen local and 15 WARDA upland rice genotypes were identified as having long roots, and some of the WARDA lines had longer roots than the local landraces. This is a desirable trait, since longer roots are known to enhance the absorption of easily mobile nutrients such as nitrate and potassium. Glasshouse and field evaluation of N use efficiency showed high N utilization by these upland rice genotypes (40-80% of applied N) with good grain yield, whereas P use efficiency was similar to that of other crops (4-8%). (author)
Phosphorus Use Efficiency by Brazilian Upland Rice Genotypes Evaluated by the ³²P Dilution Technique

Energy Technology Data Exchange (ETDEWEB)

Franzini, V. I.; Mendes, F. L. [Brazilian Agricultural Research Corporation, EMBRAPA-Amazonia Oriental, Belem, PA (Brazil)]; Muraoka, T.; Da Silva, E. C. [Center for Nuclear Energy in Agriculture, University of Sao Paulo, Piracicaba, SP (Brazil)]; Adu-Gyamfi, J. J. [Soil and Water Management and Crop Nutrition Laboratory, International Atomic Energy Agency, Vienna (Austria)]

2013-11-15

The objectives of this work were to identify the upland rice genotypes most efficient in phosphorus (P) utilization, and to verify whether P from the seed affects the classification of upland rice genotypes by P uptake efficiency. The experiment was conducted in a greenhouse of the Center for Nuclear Energy in Agriculture (CENA/USP), Piracicaba, Sao Paulo, Brazil, using the ³²P isotope technique, and plants were grown in pots with samples of a dystrophic Typic Haplustox (Oxisol). The experimental design was completely randomized with four replications. The treatments consisted of 47 upland rice genotypes and two standard plant species, efficient or inefficient in P uptake. The results were assessed through correlation and (multivariate) cluster analysis. The Carisma upland rice genotype was the most efficient in P uptake, and Caripuna was the most efficient in P utilization. P derived from the seed does not influence the identification of upland rice genotypes for P uptake efficiency. (author)
Phosphorus Use Efficiency by Brazilian Common Bean Genotypes Assessed by the ³²P Dilution Technique

Energy Technology Data Exchange (ETDEWEB)

Franzini, V. I. [Brazilian Agricultural Research Corporation, EMBRAPA-Amazonia Oriental, Belem, PA (Brazil)]; Muraoka, T. [Center for Nuclear Energy in Agriculture, University of Sao Paulo, Piracicaba, SP (Brazil)]; Adu-Gyamfi, J. J. [International Atomic Energy Agency, Vienna (Austria)]; Lynch, J. P. [Pennsylvania State University, University Park, PA (United States)]

2013-11-15

The objectives of this work were to identify the common bean (Phaseolus vulgaris L.) genotypes most efficient in phosphorus (P) utilization, and to verify whether P from the seed affects the classification of common bean genotypes by P uptake efficiency when the ³²P isotopic dilution technique is used. The experiment was conducted in a greenhouse, and plants were grown in pots with surface samples of a dystrophic Typic Haplustox. The treatments consisted of 50 common bean genotypes and two standard plant species, efficient or inefficient in P uptake. The results were assessed through correlation and (multivariate) cluster analysis. The common bean genotypes Sangue de Boi, Rosinha, Thayu, Grafite, Horizonte, Pioneiro and Jalo Precoce were the most efficient in P uptake, and Carioca 80, CNF 10, Perola, IAPAR 31, Roxao EEP, Apore, Pioneiro, Pontal, Timbo and Ruda were the most efficient in P utilization. P derived from the seed influences the identification of common bean genotypes for P uptake efficiency. (author)
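The question of whether seed P matters in these ³²P experiments rests on a simple mass balance: P taken up from labelled soil carries the soil's specific activity (SA), while P remobilized from the seed is unlabelled and dilutes it. A minimal sketch, assuming the SA of plant-available soil P is known (for example from a reference measurement), both SAs are in the same units and decay-corrected to a common date, and the seed is the only unlabelled source; function names are hypothetical.

```python
def fraction_from_unlabelled_pool(sa_plant, sa_labelled_pool):
    """Mass balance SA_plant = SA_pool * (1 - f), so f = 1 - SA_plant / SA_pool.

    f is the fraction of plant P that came from the unlabelled source (here: seed).
    """
    if sa_labelled_pool <= 0:
        raise ValueError("specific activity of the labelled pool must be positive")
    return 1.0 - sa_plant / sa_labelled_pool


def partition_plant_p(total_p_mg, sa_plant, sa_soil):
    """Split total plant P (mg) into (soil-derived, seed-derived) parts."""
    f_seed = fraction_from_unlabelled_pool(sa_plant, sa_soil)
    p_seed = total_p_mg * f_seed
    return total_p_mg - p_seed, p_seed
```

For instance, a plant containing 10 mg P whose SA is 80% of the soil's would have about 8 mg from the soil and 2 mg from the seed. If seed P is large relative to soil uptake, ranking genotypes by ³²P-based uptake without this correction could misclassify them, which is the effect the two studies test.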
Handling missing data in cluster randomized trials: A demonstration of multiple imputation with PAN through SAS

Directory of Open Access Journals (Sweden)

Jiangxiu Zhou

2014-09-01

The purpose of this study is to demonstrate a way of dealing with missing data in cluster randomized trials by performing multiple imputation (MI) with the PAN package in R through SAS. The procedure for doing MI with PAN through SAS is demonstrated in detail so that researchers can apply it to their own data. An illustration of the technique with empirical data is also included. In this illustration, the PAN results were compared with pairwise deletion and three types of MI: (1) normal-model MI (NM-MI) ignoring the cluster structure; (2) NM-MI with dummy-coded cluster variables (fixed cluster structure); and (3) a hybrid NM-MI that imputes half the time ignoring the cluster structure and the other half including the dummy-coded cluster variables. The empirical analysis showed that PAN and the other strategies produced comparable parameter estimates. However, the dummy-coded MI overestimated the intraclass correlation, whereas MI ignoring the cluster structure and the hybrid MI underestimated it. Compared with PAN, the p-value and standard error for the treatment effect were higher with dummy-coded MI, and lower with MI ignoring the cluster structure, the hybrid MI approach, and pairwise deletion. Previous studies have shown that NM-MI is not appropriate for handling missing data in cluster randomized trials. This approach, like pairwise deletion, leads to a biased intraclass correlation and faulty statistical conclusions. Imputation in cluster randomized trials should be performed with PAN. We have demonstrated an easy way of using PAN through SAS.
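Whichever imputation engine is used (PAN or a normal-model MI), the analyses of the m completed datasets are typically combined with Rubin's rules, and the pooled standard errors and p-values compared above come from quantities like these. The sketch below shows only that pooling step under standard Rubin (1987) formulas; it does not reimplement PAN, and the function name is hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their squared standard errors.

    Returns (pooled estimate, pooled standard error, Rubin's degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                               # pooled point estimate
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w + (1 + 1 / m) * b                                  # total variance
    if b == 0:
        df = math.inf                                        # no imputation uncertainty
    else:
        r = (1 + 1 / m) * b / w if w > 0 else math.inf       # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, math.sqrt(t), df
```

Because the between-imputation term `b` enters the total variance, an imputation model that understates cluster-level variability (for example, one that ignores the clustering) tends to yield smaller pooled standard errors, consistent with the pattern reported above.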
Evidence that breast cancer risk at the 2q35 locus is mediated through IGFBP5 regulation

DEFF Research Database (Denmark)

Ghoussaini, Maya; Edwards, Stacey L; Michailidou, Kyriaki

2014-01-01

GWAS have identified a breast cancer susceptibility locus on 2q35. Here we report the fine mapping of this locus using data from 101,943 subjects from 50 case-control studies. We genotype 276 SNPs using the 'iCOGS' genotyping array and impute genotypes for a further 1,284 using 1000 Genomes Proje...
Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

Science.gov (United States)

Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

2018-02-02

Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.
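A genomic BLUP model replaces pedigree relationships with a genomic relationship matrix computed from markers. Below is a minimal sketch of one widely used construction (VanRaden's first method), assuming a complete individuals × SNPs matrix coded 0/1/2, such as one obtained after imputing missing GBS calls. The abstract does not say which construction the study used, so this is an illustration only; the function name is hypothetical.

```python
def vanraden_g(markers):
    """Genomic relationship matrix G = ZZ' / (2 * sum_j p_j (1 - p_j)).

    markers: complete (already imputed) individuals x SNPs matrix coded 0/1/2.
    Allele frequencies p_j are estimated from the same sample, and Z = M - 2p.
    """
    n, m = len(markers), len(markers[0])
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    return [[sum(z[a][k] * z[b][k] for k in range(m)) / scale for b in range(n)]
            for a in range(n)]
```

Centring by 2p makes G comparable in scale to a pedigree-based relationship matrix, and the imputation step matters here because missing calls cannot enter Z directly.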
Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower

Directory of Open Access Journals (Sweden)

Patrick Thorwarth

2018-02-01

Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.

Imputing historical statistics, soils information, and other land-use data to crop area

Science.gov (United States)

Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

1982-01-01

In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map that represent, at 60° latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Phosphorus use efficiency in pima cotton (Gossypium barbadense L.) genotypes

Directory of Open Access Journals (Sweden)

Elcio Santos

2015-06-01

In the Brazilian Cerrado, P deficiency restricts cotton production, which requires large amounts of phosphate fertilizer. To improve the yield of cotton crops, genotypes with high P use efficiency must be identified and used. The present study evaluated P uptake and use efficiency of different Gossypium barbadense L. genotypes grown in the Cerrado. The experiment was carried out in a greenhouse with a completely randomized design, a 15 × 2 factorial treatment structure (15 genotypes × 2 P levels), and four replicates. The genotypes were MT 69, MT 70, MT 87, MT 91, MT 92, MT 94, MT 101, MT 102, MT 103, MT 105, MT 106, MT 110, MT 112, MT 124, and MT 125; P levels were sufficient (1000 mg pot⁻¹, PS treatment) or deficient (PD treatment). Dry matter (DM) and P levels were determined in cotton plant parts and used to calculate plant P content and use efficiency. In general, DM and P content were higher in the PS than in the PD treatment, with the exception of root DM and total DM in some genotypes. Genotypes also differed in terms of P uptake and use capacity. In the PS treatment, genotypes MT 92 and MT 102 had the highest response to phosphate fertilization. Genotype MT 69 exhibited the most efficient P uptake in the PD treatment. Genotype MT 124 showed the best shoot physiological efficiency, apparent recovery efficiency, and utilization efficiency, whereas MT 110 exhibited the highest root physiological efficiency.
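The abstract names physiological and apparent recovery efficiencies without giving formulas. A minimal sketch using the difference-based definitions common in plant-nutrition studies, comparing a P-sufficient (PS) pot with a P-deficient (PD) pot; that the study used exactly these definitions is an assumption, and utilization efficiency is omitted because its definition varies between authors. The 1000 mg P per pot dose is taken from the abstract; the function name is hypothetical.

```python
def p_response_efficiencies(dm_ps_g, dm_pd_g, p_ps_mg, p_pd_mg, p_applied_mg):
    """Difference-based efficiencies for a PS pot versus a PD pot.

    physiological: extra dry matter (g) per extra mg of P absorbed
    apparent recovery: share (%) of the applied P found as extra plant P
    """
    extra_p = p_ps_mg - p_pd_mg
    if extra_p == 0 or p_applied_mg <= 0:
        raise ValueError("need a non-zero uptake difference and a positive P dose")
    physiological = (dm_ps_g - dm_pd_g) / extra_p
    apparent_recovery_pct = 100.0 * extra_p / p_applied_mg
    return physiological, apparent_recovery_pct
```

With hypothetical values of 30 g versus 10 g DM and 60 mg versus 20 mg plant P, and the 1000 mg dose, this gives 0.5 g DM per mg P and a 4% apparent recovery.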
Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

Science.gov (United States)

Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

2017-01-01

The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…
Analysis of the genetic diversity of four rabbit genotypes using ...

African Journals Online (AJOL)

Dr.Ola

2013-05-15

... consumption and low cost, it has been widely utilized in genetics analysis in ... isozyme variation among the selected individuals within each rabbit genotype. ... with different embryo survival (Bolet and Theau-Clement, 1994).
Traffic speed data imputation method based on tensor completion.

Science.gov (United States)

Ran, Bin; Tan, Huachun; Feng, Jianshuai; Liu, Ying; Wang, Wuhong

2015-01-01

Traffic speed data play a key role in Intelligent Transportation Systems (ITS); however, missing traffic data can degrade the performance of ITS as well as of Advanced Traveler Information Systems (ATIS). In this paper, we address this issue with a novel tensor-based imputation approach. Specifically, a tensor pattern is adopted to model traffic speed data, and High accuracy Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is then employed to estimate the missing traffic speed data. The proposed method can recover missing entries from the observed entries even when these are noisy, which is important given that traffic speed data fluctuate more severely than traffic volume data. The method is evaluated on the Performance Measurement System (PeMS) database, and the experimental results show that it outperforms state-of-the-art baseline approaches.
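HaLRTC itself minimizes a tensor nuclear norm with an iterative splitting scheme, which is too long for a short example. The underlying idea, that speeds arranged by day and time interval are approximately low-rank so missing entries can be predicted from observed ones, can be illustrated with a much simpler substitute: rank-1 alternating least squares on a days × intervals matrix. This is explicitly not HaLRTC, and the function name and data layout are assumptions.

```python
def rank1_complete(matrix, iterations=500):
    """Fill None entries with u[i] * v[j] from a rank-1 alternating least squares fit.

    matrix: days x time-interval speeds; None marks a missing reading.
    Only observed entries enter each least-squares update.
    """
    rows, cols = len(matrix), len(matrix[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iterations):
        for i in range(rows):  # update day factors with interval factors held fixed
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            u[i] = sum(matrix[i][j] * v[j] for j in obs) / den if den else 0.0
        for j in range(cols):  # update interval factors with day factors held fixed
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            v[j] = sum(matrix[i][j] * u[i] for i in obs) / den if den else 0.0
    # keep observed readings; fill gaps from the fitted low-rank model
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]
```

Real speed data are only approximately low-rank and noisy, so practical methods use higher ranks (or a tensor with extra modes such as detector or week) and regularization; the rank-1 case just makes the "borrow strength from the observed pattern" mechanism visible.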
Personal genome testing in medical education: student experiences with genotyping in the classroom.

Science.gov (United States)

Vernez, Simone Lucia; Salari, Keyan; Ormond, Kelly E; Lee, Sandra Soo-Jin

2013-01-01

Direct-to-consumer (DTC) personal genotyping services are beginning to be adopted by educational institutions as pedagogical tools for learning about human genetics. However, little is known about students' reactions to such testing. This study investigated student experiences with and attitudes towards DTC personal genome testing. Individual interviews were conducted with students who chose to undergo personal genotyping in the context of an elective genetics course. Ten medical and graduate students were interviewed before genotyping occurred, and at 2 weeks and 6 months after receiving their genotype results. Qualitative analysis of interview transcripts assessed the expectations and experiences of students who underwent personal genotyping, how they interpreted and applied their results, how the testing affected the quality of their learning during the course, and what their perceived needs for support were. Students stated that personal genotyping enhanced their engagement with the course content. Although students expressed skepticism about the clinical utility of some test results, they expressed significant enthusiasm immediately after receiving their personal genetic analysis, and were particularly interested in results such as drug response and carrier testing. However, few reported making behavioral changes or following up on specific results through a healthcare provider. Students did not report using genetic counseling, despite feeling strongly that the 'general public' would need these services. In follow-up interviews, students exhibited poor recall of details of the consent and biobanking agreements, but expressed little regret over their decision to undergo genotyping. Students reported mining their raw genetic data, and conveyed a need for further consultation support in their exploration of genetic variants. Personal genotyping may improve students' self-reported motivation and engagement with course material. However, consultative support that...
31 CFR 19.630 - May the Department of the Treasury impute conduct of one person to another?

Science.gov (United States)

2010-07-01

... 31 Money and Finance: Treasury 1 2010-07-01 2010-07-01 false May the Department of the Treasury impute conduct of one person to another? 19.630 Section 19.630 Money and Finance: Treasury Office of the Secretary of the Treasury GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles...
Semiautomatic imputation of activity travel diaries : use of global positioning system traces, prompted recall, and context-sensitive learning algorithms

NARCIS (Netherlands)

Moiseeva, A.; Jessurun, A.J.; Timmermans, H.J.P.; Stopher, P.

2016-01-01

Anastasia Moiseeva, Joran Jessurun and Harry Timmermans (2010), ‘Semiautomatic Imputation of Activity Travel Diaries: Use of Global Positioning System Traces, Prompted Recall, and Context-Sensitive Learning Algorithms’, Transportation Research Record: Journal of the Transportation Research Board,
Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

Science.gov (United States)

Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

2009-01-01

Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
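Nearest neighbor imputation predicts a target plot's response from measured reference plots that are most similar in variables available everywhere (cover maps, remotely sensed data). A minimal sketch, assuming plain Euclidean distance on already-scaled auxiliary variables and an unweighted mean over k neighbors; the study's actual distance metric and k are not given in the abstract, and the function name is hypothetical.

```python
import math


def knn_impute(target_aux, reference, k=3):
    """Impute a response for a target plot as the mean over its k nearest reference plots.

    target_aux: auxiliary variables for the target plot (assumed already scaled)
    reference: list of (aux_vector, observed_response) pairs from measured plots
    """
    if not reference:
        raise ValueError("need at least one reference plot")
    # Euclidean distance in auxiliary-variable space (an assumption)
    ranked = sorted(reference, key=lambda pair: math.dist(target_aux, pair[0]))
    nearest = ranked[:k]
    return sum(response for _, response in nearest) / len(nearest)
```

Because the prediction is built from observed neighbor values, zero counts can be imputed directly when neighbors have no cavity trees or snags, which is one reason NN methods are attractive for zero-heavy data alongside zero-inflated and zero-altered regression models.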
Prevalence and clinical utility of human papilloma virus genotyping in patients with cervical lesions.

Science.gov (United States)

Kaur, Parminder; Aggarwal, Aruna; Nagpal, Madhu; Oberoi, Loveena; Sharma, Swati

2014-08-01

Cervical cancer is the most common cancer among Indian women. Detection of high-risk human papilloma virus (HPV) could serve as a tool to identify women at risk of subsequently developing cervical cancer. There is a pressing need to determine the prevalence of asymptomatic cervical HPV infection in the local population. In our study, we explored the prevalence of HPV genotypes and their distribution in women with cervical lesions. Scrape specimens were obtained from 100 women (study group) with cervical abnormalities. HPV was detected with the Amplicor HPV test, and the individual genotypes in these specimens were identified with the Hybribio Genoarray test kit. Fifty specimens were also collected from women with a healthy cervix (control group). The present study also aimed to determine HPV prevalence and its association with different sociodemographic factors. Of the 100 women in the study group, 10 (10%) tested positive for HPV DNA. Among them, HPV 18 was observed in 6, HPV 16 in 2, and HPV 52 and HPV 39 in one each. None of the 50 specimens from women with a healthy cervix was infected with any HPV genotype. Our study provides data on HPV prevalence in patients with cervical lesions visiting a tertiary care institute. These data will be useful for laying down guidelines for mass screening for HPV, treatment, and prophylaxis.
Non-imputability, criminal dangerousness and curative safety measures: myths and realities

Directory of Open Access Journals (Sweden)

Frank Harbottle Quirós

2017-04-01

Curative safety measures are imposed in criminal proceedings on non-imputable persons, provided that a prognosis affirmatively establishes their criminal dangerousness. Although this statement seems elementary, several myths about these legal institutions persist in judicial practice, in versions that may vary to a greater or lesser extent between countries. In this context, the present article formulates ten myths based on the experience of Costa Rica and offers explanations that seek to weaken or refute them, inviting the reader to reflect on them.
Molecular characterization and genetic diversity of different genotypes of Oryza sativa and Oryza glaberrima

Directory of Open Access Journals (Sweden)

Caijin Chen

2017-11-01

Conclusions: Genetic diversity analysis revealed that 50 rice genotypes clustered into different subpopulations, whereas three genotypes were admixtures. Molecular fingerprints and 10 specific markers were obtained to identify the 53 rice genotypes. These results can facilitate the potential utilization of sibling species in rice breeding and the molecular classification of O. sativa and O. glaberrima germplasm.
29 CFR 1471.630 - May the Federal Mediation and Conciliation Service impute conduct of one person to another?

Science.gov (United States)

2010-07-01

... 29 Labor 4 2010-07-01 2010-07-01 false May the Federal Mediation and Conciliation Service impute...) FEDERAL MEDIATION AND CONCILIATION SERVICE GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 1471.630 May the Federal Mediation and...
Traffic Speed Data Imputation Method Based on Tensor Completion

Directory of Open Access Journals (Sweden)

Bin Ran

2015-01-01

Traffic speed data play a key role in Intelligent Transportation Systems (ITS); however, missing traffic data can degrade the performance of ITS as well as of Advanced Traveler Information Systems (ATIS). In this paper, we address this issue with a novel tensor-based imputation approach. Specifically, a tensor pattern is adopted to model traffic speed data, and High accuracy Low Rank Tensor Completion (HaLRTC), an efficient tensor completion method, is then employed to estimate the missing traffic speed data. The proposed method can recover missing entries from the observed entries even when these are noisy, which is important given that traffic speed data fluctuate more severely than traffic volume data. The method is evaluated on the Performance Measurement System (PeMS) database, and the experimental results show that it outperforms state-of-the-art baseline approaches.
Leaf transpiration plays a role in phosphorus acquisition among a large set of chickpea genotypes.

Science.gov (United States)

Pang, Jiayin; Zhao, Hongxia; Bansal, Ruchi; Bohuon, Emilien; Lambers, Hans; Ryan, Megan H; Siddique, Kadambot H M

2018-01-09

Low availability of inorganic phosphorus (P) is considered a major constraint on crop productivity worldwide. A unique set of 266 chickpea (Cicer arietinum L.) genotypes, originating from 29 countries and with diverse genetic backgrounds, was used to study P-use efficiency. Plants were grown in pots containing sterilized river sand supplied with P at a rate of 10 μg P g⁻¹ soil as FePO₄, a poorly soluble form of P. The results showed large genotypic variation in plant growth, shoot P content, physiological P-use efficiency, and P-utilization efficiency in response to low P supply. Further investigation of a subset of 100 chickpea genotypes with contrasting growth performance showed significant differences in photosynthetic rate and photosynthetic P-use efficiency. A positive correlation was found between leaf P concentration and the transpiration rate of young fully expanded leaves. Our study suggests, for the first time, a role of leaf transpiration in P acquisition, consistent with transpiration-driven mass flow in chickpea grown in low-P sandy soils. The identification of 6 genotypes with high plant growth, P-acquisition efficiency, and P-utilization efficiency suggests that the chickpea reference set can be used in breeding programmes to improve both P-acquisition and P-utilization efficiency under low-P conditions. © 2018 John Wiley & Sons Ltd.
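The transpiration finding rests on a correlation between leaf P concentration and transpiration rate across genotypes. A minimal sketch of the Pearson coefficient such a claim would typically use; whether the study used Pearson's r is an assumption, no study data are reproduced, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired samples (e.g., leaf P vs transpiration rate)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A positive r alone does not establish mechanism; the mass-flow interpretation above comes from combining the correlation with physiological reasoning about P moving to roots with transpired water.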
Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

OpenAIRE

Yan, Xiaobo; Xiong, Weiqing; Hu, Liang; Wang, Feng; Zhao, Kuo

2015-01-01

This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics, and healthcare. However, missing values are very common in the IoT for many reasons, leaving experimental data incomplete. As a result, some work that depends on IoT data cannot be carried out normally, and it leads to the red...
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

Directory of Open Access Journals (Sweden)

Assaf Gottlieb

2017-11-01

Background: Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into “gene level” effects. Methods: Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may affect dose requirements. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results: We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions: Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort
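Imputing expression from eQTL knowledge is often done with a linear predictor: each gene-tissue pair gets weights on nearby variants, and predicted expression is the weighted sum of allele dosages. The sketch below assumes that linear form and assumes that imputed expression enters the dose prediction as additional linear terms; the SNP names, weights, and coefficients are hypothetical, and the authors' MATLAB code at the repository above is the reference for what they actually did.

```python
def impute_expression(dosages, weights):
    """Predicted expression for one gene-tissue pair: sum of weight * allele dosage (0-2).

    dosages: {snp_id: dosage}; weights: {snp_id: eQTL weight} (both hypothetical).
    """
    return sum(w * dosages[snp] for snp, w in weights.items())


def predict_dose(base_dose, expression_terms):
    """Baseline dose prediction plus coefficient * imputed expression per gene-tissue pair.

    expression_terms: list of (coefficient, imputed_expression) pairs (assumed linear form).
    """
    return base_dose + sum(coef * expr for coef, expr in expression_terms)
```

A cohort-specific signature in this framing would mean fitting the coefficients separately per population, so that gene-tissue pairs predictive in one ancestry group need not carry the same weight in another.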
Impute DC link (IDCL) cell based power converters and control thereof

Science.gov (United States)

Divan, Deepakraj M.; Prasai, Anish; Hernendez, Jorge; Moghe, Rohit; Iyer, Amrit; Kandula, Rajendra Prasad

2016-04-26

Power flow controllers based on Imputed DC Link (IDCL) cells are provided. The IDCL cell is a self-contained power electronic building block (PEBB). IDCL cells may be stacked in series and in parallel to achieve power flow control at higher voltage and current levels. Each IDCL cell may comprise a gate drive, a voltage sharing module, and a thermal management component to facilitate easy integration of the cell into a variety of applications. By providing direct AC conversion, IDCL cell based AC/AC converters reduce device count, eliminate electrolytic capacitors, which have life and reliability issues, and improve system efficiency compared with similarly rated back-to-back inverter systems.
Characterization and Selection of Phosphorus Deficiency Tolerant Rice Genotypes in Sri Lanka

Directory of Open Access Journals (Sweden)

Y.C. Aluwihare

2016-07-01

Phosphorus (P) deficiency in soil is a major constraint on rice production. An important set of rice genotypes (landraces, and old and new improved varieties) was screened for P deficiency tolerance in the two major cropping seasons of Sri Lanka in 2012. Ultisol soil collected from a plot cultivated with rice without fertilizer application for the past 40 years (P0) at the Rice Research and Development Institute (RRDI), Bathalagoda, Sri Lanka, was used as the potting medium for greenhouse trials. Two field trials were conducted in the same plots at RRDI. Both P0 and P30 (30 mg/kg P2O5) conditions were used in the two greenhouse trials. Plant height and number of tillers per plant were recorded at the early vegetative (three weeks after transplanting), late vegetative (six weeks after transplanting) and flowering stages. At the flowering stage, shoots were harvested and shoot dry weight, shoot P concentration, shoot P uptake and P utilization efficiency were measured. All data were statistically analyzed using analysis of variance, regression and cluster procedures. The measured parameters differed significantly between P0 and P30 conditions (P < 0.05). The rice genotypes H4 and Marss showed higher shoot dry weight under P0 conditions. Regression analysis between shoot dry weight and P utilization efficiency showed that the studied rice genotypes could be categorized into three P deficiency tolerance classes. A total of 13 genotypes could be considered highly tolerant and 4 genotypes sensitive to P deficiency. These results could be used to select parental genotypes for breeding and genetic studies, and also to select interesting varieties or landraces for organic rice production.
A suggested approach for imputation of missing dietary data for young children in daycare

OpenAIRE

Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

2015-01-01

Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

Effect of calcium on the salt tolerance of different wheat (Triticum aestivum L.) genotypes

International Nuclear Information System (INIS)

Arshad, M.; Saqib, M.; Akhtar, J.

2012-01-01

In saline soil conditions, the availability and uptake of Ca2+ are reduced, resulting in loss of membrane integrity and other disorders associated with Ca2+ deficiency in plants. A wheat genotype that is efficient in the uptake and utilization of calcium under saline conditions may be better able to withstand salinity in the field. Very little information is available on the response of wheat to salinity combined with low Ca2+, as wheat genotypes have usually been screened against salinity alone. The present study was designed to evaluate the performance of different wheat genotypes against salinity at low and adequate calcium supply. The experiment was conducted in hydroponics with four treatments: T1, non-saline with adequate Ca2+; T2, non-saline with low Ca2+ (calcium at 1/4 of the adequate level); T3, saline (125 mM NaCl) with adequate Ca2+; and T4, saline with low Ca2+. All the physical growth parameters, including shoot length, root length, and shoot and root fresh weights, decreased significantly under salinity and low calcium, both alone and in combination. The reduction was more pronounced under the combined stress of salinity and low calcium, and genotypes differed significantly in shoot and root fresh weight production across the stress treatments. In the saline treatment (T3), genotypes 25-SAWSN-39 and 25-SAWSN-31 grew better and accumulated less Na+ and more Ca2+, whereas genotypes 25-SAWSN-35 and 25-SAWSN-47 grew less, accumulated less Ca2+ and accumulated more Na+. In the salinity + low calcium treatment (T4), 25-SAWSN-39 behaved as a tolerant genotype, whereas 25-SAWSN-31 behaved like the sensitive genotypes; these differences were due to high Ca2+ accumulation in 25-SAWSN-39 and low accumulation in 25-SAWSN-31. This study shows that the salt tolerance of wheat genotypes differs with the availability and accumulation of calcium
Genotyping of Brucella species using clade specific SNPs

Directory of Open Access Journals (Sweden)

Foster Jeffrey T

2012-06-01

Background: Brucellosis is a worldwide disease of mammals caused by Alphaproteobacteria in the genus Brucella. The genus is genetically monomorphic, requiring extensive genotyping to differentiate isolates. We used two different genotyping strategies to characterize isolates. First, we developed a microarray-based assay based on 1000 single nucleotide polymorphisms (SNPs) that were identified from whole genome comparisons of two B. abortus isolates, one B. melitensis, and one B. suis. We then genotyped a diverse collection of 85 Brucella strains at these SNP loci and generated a phylogenetic tree of relationships. Second, we developed a selective primer-extension assay system using capillary electrophoresis that targeted 17 high-value SNPs across 8 major branches of the phylogeny, and determined their genotypes in a large collection (n = 340) of diverse isolates. Results: Our 1000-SNP microarray readily distinguished B. abortus, B. melitensis, and B. suis, differentiating B. melitensis and B. suis into two clades each. Brucella abortus was divided into four major clades. Our capillary-based SNP genotyping confirmed all major branches from the microarray assay and assigned all samples to defined lineages. Isolates from these lineages and closely related isolates, among the most commonly encountered lineages worldwide, can now be quickly and easily identified and genetically characterized. Conclusions: We have identified clade-specific SNPs in Brucella that can be used for rapid assignment into major groups below the species level in the three main Brucella species. Our assays represent SNP genotyping approaches that can reliably determine the evolutionary relationships of bacterial isolates without the need for whole genome sequencing of all isolates.
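Assigning an isolate to a lineage with clade-specific SNPs amounts to walking down the phylogeny and entering a branch only when the isolate carries that branch's diagnostic alleles. A minimal sketch of that lookup follows; the clade names, SNP IDs and alleles are hypothetical placeholders, not the published Brucella panel, and the helper names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tree: clade -> (parent clade, diagnostic SNPs {SNP ID: derived allele}).
CLADES: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {
    "root": (None, {}),
    "abortus": ("root", {"snp01": "T"}),
    "abortus-clade-1": ("abortus", {"snp07": "A"}),
    "melitensis": ("root", {"snp02": "G"}),
}


def children(parent: str) -> List[str]:
    """Clades whose immediate parent is `parent`."""
    return [name for name, (par, _) in CLADES.items() if par == parent]


def assign_clade(genotype: Dict[str, str]) -> str:
    """Return the deepest clade whose diagnostic alleles the isolate carries."""
    current = "root"
    while True:
        matches = [
            c for c in children(current)
            if all(genotype.get(snp) == allele for snp, allele in CLADES[c][1].items())
        ]
        if len(matches) != 1:  # no matching branch (or an ambiguous call): stop here
            return current
        current = matches[0]


print(assign_clade({"snp01": "T", "snp07": "A"}))
```

Stopping at the last unambiguous node means a missing or failed SNP call leaves an isolate at a higher level of the tree rather than forcing a misassignment.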
GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

NARCIS (Netherlands)

K. Estrada Gil (Karol); A. Abuseiris (Anis); F.G. Grosveld (Frank); A.G. Uitterlinden (André); T.A. Knoch (Tobias); F. Rivadeneira Ramirez (Fernando)

2009-01-01

The current rapid growth of genome-wide association studies (GWAS), combined with now-common, computationally expensive imputation, requires that large user groups have online access to high-performance computing resources capable of rapidly and efficiently analyzing millions of genetic
Cost-utility analysis of ledipasvir/sofosbuvir for the treatment of genotype 1 chronic hepatitis C in Japan.

Science.gov (United States)

Igarashi, Ataru; Tang, Wentao; Guerra, Ines; Marié, Lucile; Cure, Sandrine; Lopresti, Michael

2017-01-01

Hepatitis C is caused by a ribonucleic acid (RNA) virus, the hepatitis C virus (HCV). The Japan Society of Hepatology (JSH) estimated that 1.5-2 million people in Japan carry HCV. Six major HCV genotypes (GT) and a large number of subtypes have been described in the literature. In Japan, around 70% to 80% of people infected with HCV carry genotype 1b. The disease primarily affects the liver and may progress to liver cirrhosis, hepatocellular carcinoma (HCC) and death. Sofosbuvir (SOF) is a nucleotide analogue NS5B inhibitor and ledipasvir (LDV) is an inhibitor of the HCV NS5A protein. They are combined in a single-tablet regimen for the treatment of GT1 patients, which achieved sustained virological response (SVR) rates above 94% in large phase III trials. This analysis assesses the cost-utility of LDV/SOF in GT1 patients in Japan. A cohort of 10,000 patients was followed through a Markov model until they reached 100 years of age. GT1 treatment-naïve and treatment-experienced, non-cirrhotic and cirrhotic patients were studied separately. LDV/SOF was compared with several treatment regimens containing pegylated interferon (PEGIFN), telaprevir (TVR), simeprevir (SMV), daclatasvir (DCV), asunaprevir (ASV) and ribavirin (RBV). Discount rates of 2% were applied to costs and outcomes in accordance with Japanese guidelines. LDV/SOF was cost-effective against most comparators, with incremental cost-effectiveness ratios (ICERs) below JPY 5,000,000. When a societal perspective was applied, LDV/SOF was the dominant treatment strategy in all cases. Moreover, LDV/SOF reduced the number of cases of advanced liver disease. These results were robust to sensitivity analyses. LDV/SOF was cost-effective compared with most of the currently recommended treatments. Furthermore, LDV/SOF extends treatment to HCV-infected patients who are ineligible for interferon- and RBV-based regimens. LDV/SOF thus has the potential to help reduce the burden of HCV in Japan.
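Two quantities this analysis relies on can be sketched directly: discounting yearly costs and outcomes at 2%, and the ICER of one regimen against a comparator. The sketch assumes outcomes are measured in quality-adjusted life years (QALYs), as is usual for cost-utility analyses, and that the first model year is undiscounted; function names and input numbers are hypothetical, not results from the Markov model.

```python
from typing import Sequence, Union


def discounted_total(yearly: Sequence[float], rate: float = 0.02) -> float:
    """Present value of yearly costs or QALYs; year t (t = 0 first) is divided by (1 + rate) ** t."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(yearly))


def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> Union[float, str]:
    """Extra cost per extra QALY of the new strategy versus a comparator."""
    d_cost = cost_new - cost_old
    d_qaly = qaly_new - qaly_old
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"   # no more costly and at least as effective (exact ties counted here for simplicity)
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"  # no cheaper and no more effective
    return d_cost / d_qaly  # JPY per QALY gained


# Hypothetical discounted totals (JPY, QALYs) for two strategies.
result = icer(cost_new=6_200_000, cost_old=5_000_000, qaly_new=10.5, qaly_old=10.2)
threshold = 5_000_000  # JPY, the figure quoted in the abstract (per QALY assumed)
print(result, isinstance(result, float) and result < threshold)
```

Handling the "dominant" and "dominated" quadrants before dividing avoids reporting a misleading ratio (or dividing by zero) when one strategy is both cheaper and at least as effective.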
Utility of the Abbott RealTime HCV Genotype Plus RUO assay used in combination with the Abbott RealTime HCV Genotype II assay.

Science.gov (United States)

He, Chao; Germer, Jeffrey J; Ptacek, Elizabeth R; Bommersbach, Carl E; Mitchell, P Shawn; Yao, Joseph D C

Hepatitis C virus (HCV) genotype (GT) determination and subtype (ST) differentiation (1a versus 1b) remain important for the selection of appropriate direct-acting antiviral (DAA) therapy. This study is a retrospective comparison of HCV GT and ST result distribution when using the Abbott RealTime HCV Genotype II assay (HCVGT II) alone and in combination with the Abbott RealTime HCV Genotype Plus RUO assay (HCVGT Plus) for routine testing of clinical serum specimens at a reference laboratory. HCVGT II results of specimens tested from June 2014 through January 2016 (period 1) were compared with combined results from HCVGT II and HCVGT Plus (HCVGT II/Plus) performed from January 2016 through January 2017 (period 2). A total of 44,127 and 25,361 specimens were tested during periods 1 and 2, respectively. Use of HCVGT II/Plus significantly reduced the frequency of GT 1 results without ST (0.4%) when compared with preliminary HCVGT II results during period 2 (5.3%; p < 0.01) and final HCVGT II results in period 1 (5.5%; p < 0.01). HCVGT II/Plus also detected GT 6 reactivity in 38 specimens with results of "HCV detected" (n = 17) or GT 1 (n = 21) following initial HCVGT II testing during period 2. Compared with the use of HCVGT II alone, HCVGT II/Plus significantly reduced the frequency of GT 1 without ST results observed in a large reference laboratory, while also enabling the identification of HCV GT 6. Copyright © 2018 Elsevier B.V. All rights reserved.
Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

Science.gov (United States)

Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

2009-01-01

Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...
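In GNN imputation, each map unit receives the measured fuel and forest structure attributes of the field plot that lies closest to it in a gradient space, which earlier GNN work builds by ordination (canonical correspondence analysis) of mapped environmental and spectral variables. The sketch below skips the ordination and assumes gradient-space coordinates are already available; the coordinates, attribute names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A reference plot: (coordinates in gradient space, measured attributes).
Plot = Tuple[Sequence[float], Dict[str, float]]


def nn_impute(target: Sequence[float], plots: List[Plot]) -> Dict[str, float]:
    """k = 1 nearest-neighbour imputation: copy every attribute of the closest plot."""
    _, attrs = min(plots, key=lambda plot: math.dist(target, plot[0]))
    return dict(attrs)


plots: List[Plot] = [
    ((0.1, 0.9), {"canopy_cover_pct": 70.0, "fuel_load_t_ha": 35.0}),
    ((0.8, 0.2), {"canopy_cover_pct": 25.0, "fuel_load_t_ha": 12.0}),
]
print(nn_impute((0.7, 0.3), plots))
```

Because all attributes come from one real plot, the imputed values keep the joint structure among variables (canopy cover and fuel load stay mutually consistent). Averaging k > 1 neighbours usually lowers per-variable prediction error at the cost of that consistency.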
Using the Superpopulation Model for Imputations and Variance Computation in Survey Sampling

Directory of Open Access Journals (Sweden)

Petr Novák

2012-03-01

This study addresses variance computation techniques for estimates of population characteristics based on survey sampling and imputation. We use the superpopulation regression model, in which the target variable values for each statistical unit are treated as random realizations of a linear regression model with weighted variance. We focus on regression models with one auxiliary variable and no intercept, which have many applications and a straightforward interpretation in business statistics. Furthermore, we deal with cases where the estimates are not independent and the covariance must therefore be computed. We also consider chained regression models in which the auxiliary variables are treated as random variables rather than constants.
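For the model class the study focuses on (one auxiliary variable x, no intercept, error variance proportional to a known function v(x)), the weighted least squares slope has the closed form Σ(x·y/v) / Σ(x²/v) over responding units, and a missing target value can be imputed as its fitted value. A minimal sketch covering point imputation only, not the variance computation that is the paper's subject; the default v(x) = x is an assumption, under which the slope reduces to the classical ratio estimator Σy/Σx. Function names and data are hypothetical.

```python
from typing import Callable, List, Optional


def wls_slope(x: List[float], y: List[Optional[float]],
              v: Callable[[float], float] = lambda xi: xi) -> float:
    """WLS slope for y = beta * x + e, Var(e) = sigma^2 * v(x), using units with observed y."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    num = sum(xi * yi / v(xi) for xi, yi in pairs)
    den = sum(xi * xi / v(xi) for xi, _ in pairs)
    return num / den


def impute_missing(x: List[float], y: List[Optional[float]],
                   v: Callable[[float], float] = lambda xi: xi) -> List[float]:
    """Replace each missing y by its fitted value beta_hat * x."""
    beta = wls_slope(x, y, v)
    return [yi if yi is not None else beta * xi for xi, yi in zip(x, y)]


# Hypothetical auxiliary variable (e.g. last year's turnover) and partly missing targets.
x = [1.0, 2.0, 4.0]
y = [2.0, 4.0, None]
print(impute_missing(x, y))
```

With v(x) = x the auxiliary values must be positive; passing `v=lambda _: 1.0` gives ordinary least squares through the origin instead.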
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

DEFF Research Database (Denmark)

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha

2017-01-01

variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced...... individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics...... from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D....
Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution.

Science.gov (United States)

Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

2018-06-01

Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
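The beta estimation procedure evaluated here is usually cited in the form r ≈ 0.98β + 0.05λ, where λ = 1 for non-negative β and 0 otherwise, with Peterson and Brown recommending it only for |β| ≤ 0.50. The sketch below implements that cited formula so it is clear what gets imputed; it is not the authors' simulation code, and the function name is illustrative.

```python
def beta_to_r(beta: float) -> float:
    """Impute a missing correlation from a standardized regression weight.

    Uses r = 0.98 * beta + 0.05 * lam, with lam = 1 if beta >= 0 else 0, as the
    Peterson and Brown (2005) formula is commonly cited.
    """
    if abs(beta) > 0.5:
        raise ValueError("beta outside the +/-0.50 range the formula is recommended for")
    lam = 1.0 if beta >= 0 else 0.0
    return 0.98 * beta + 0.05 * lam


print(beta_to_r(0.20), beta_to_r(-0.10))
```

The imputed value depends only on β, whereas β itself depends on which other predictors were in each primary study's regression model; that mismatch is one intuitive source of the bias the simulations report.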
The African Genome Variation Project shapes medical genetics in Africa

Science.gov (United States)

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2015-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
The association of complex liver disorders with HBV genotypes prevalent in Pakistan

Directory of Open Access Journals (Sweden)

Qureshi Huma

2007-11-01

Background: HBV genotyping is generally used, mostly in research studies, to determine the epidemiological relationships between virus strains and the origin of infection. The utility of genotyping for clinical applications is only beginning to gain importance. Whether HBV genotyping becomes part of the clinical evaluation of hepatitis B patients depends largely on the availability of relevant evidence-based information. Because earlier studies from Southeast Asian countries considered the HBV genotype distribution found in Pakistan to be less virulent, a study of the correlation between HBV genotypes and the risk of progression to more complex hepatic disease was much needed. Methods: A total of 295 HBsAg-positive patients were selected from the Pakistan Medical Research Council (PMRC) outpatient clinics. Of these, 226 (77%) were male and 69 (23%) were female (M:F ratio 3.3:1). Results: Of the 295 patients, 156 (53.2%) had acute hepatitis (CAH), 71 (24.2%) were HBV carriers, 54 (18.4%) had chronic liver disease (CLD), and 14 (4.7%) had cirrhosis or HCC. Genotype D was the most prevalent genotype in all categories of HBV patients: acute (108), chronic (39) and carrier (53); 7 cirrhosis/HCC patients were HBV/D positive. Genotype A was the second most prevalent, found in 28 (13%) acute cases, 12 (22.2%) chronic cases, 14 (19.7%) carriers and 5 (41.7%) cirrhosis/HCC patients. Mixed genotype A/D was found in 20 (12.8%) acute patients, 3 (5.6%) chronic patients and 4 (5.6%) carriers, and in none of the patients with severe liver conditions. Conclusion: HBV genotypes A, D and the mixed A/D combination were present in all categories of patients, except that no A/D combination was detected in severe conditions. Genotype D was the dominant genotype. However, genotype A was more strongly associated with severe liver disease. Mixed genotype A/D did not appear to significantly influence the clinical outcome.
Selection and Evaluation of Maize Genotypes Tolerance to Low Phosphorus Soils

Energy Technology Data Exchange (ETDEWEB)

Yang, J. C.; Jiang, H. M.; Zhang, J. F.; Li, L. L.; Li, G. H. [Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing (China)]

2013-11-15

Maize genotypes differ in their ability to take up phosphorus (P) from the soil, and these differences are attributed to the morphology and physiology of plants relative to their germplasm base. An effective way to increase P efficiency in maize is to select and evaluate genotypes that can produce a high yield under P-deficient conditions. In this study, 116 maize inbred lines with various genetic backgrounds, collected from several agricultural universities and institutes in China, were evaluated in a field experiment in 2007 to identify genotypic differences in P efficiency. Overall, 15 maize inbred lines were selected from the 116 inbred lines during the 5-year field experimental period, based on their 100-grain weight at maturity in P-deficient soil compared with that in P-sufficient soil. All of the selected lines were evaluated in field experiments from 2008 to 2010 for their tolerance to low P at the seedling and maturity stages. Inhibition (%), which compares each parameter measured under P limitation with the same parameter measured under P sufficiency, was used to evaluate genotypic variation in tolerance. Inhibition of root length, root surface area, root volume, root:shoot ratio and P uptake efficiency could be used as indices to assess genotypic tolerance to P limitation. Low-P-tolerant genotypes could take up more P and accumulate more dry matter at the seedling stage. A strong relationship between total biomass and root length was observed. To understand how low-P-tolerant genotypes utilize P from sparingly soluble P forms, 5 maize genotypes were selected from the 15 inbred lines according to a four-quadrant distribution and used in a 32P isotope tracer experiment to follow the recovery of 32P in soil P fractions. The 32P tracer results showed a higher rate of water-soluble P transformation to slowly available P in P-deficient soil
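The abstract defines inhibition only as a comparison between P-limited and P-sufficient measurements. One common way to express that as a percentage is the relative reduction, (sufficient - limited) / sufficient × 100; the sketch below assumes that definition and uses hypothetical root-length values and line names to rank lines, treating lower inhibition as greater tolerance.

```python
from typing import Dict, List, Tuple


def inhibition_pct(sufficient: float, limited: float) -> float:
    """Percent reduction of a trait under low P relative to adequate P (assumed definition)."""
    if sufficient == 0:
        raise ValueError("trait value under P sufficiency must be non-zero")
    return (sufficient - limited) / sufficient * 100.0


def rank_by_tolerance(traits: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort lines by inhibition, least inhibited (most tolerant) first.

    traits: line name -> (value under P sufficiency, value under P limitation).
    """
    scores = [(name, inhibition_pct(suf, lim)) for name, (suf, lim) in traits.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical seedling root lengths (cm) under sufficient and limited P.
root_length = {"line_A": (50.0, 40.0), "line_B": (48.0, 24.0), "line_C": (52.0, 46.8)}
print(rank_by_tolerance(root_length))
```

The same function can be applied to each index the abstract lists (root surface area, root:shoot ratio, P uptake efficiency); how the study combined several indices is not stated.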
Genotype X/C recombinant (putative genotype I) of hepatitis B virus is rare in Hanoi, Vietnam--genotypes B4 and C1 predominate.

Science.gov (United States)

Phung, Thi Bich Thuy; Alestig, Erik; Nguyen, Thanh Liem; Hannoun, Charles; Lindh, Magnus

2010-08-01

There are eight known genotypes of hepatitis B virus, A-H, and several subgenotypes, with rather well-defined geographic distributions. HBV genotypes were evaluated in 153 serum samples from Hanoi, Vietnam. Of the 87 samples that could be genotyped, genotype B was found in 67 (77%) and genotype C in 19 (22%). All genotype C strains were of subgenotype C1, and the majority of genotype B strains were B4, while a few were B2. The genotype X/C recombinant strain, identified previously in Swedish patients of indigenous Vietnamese origin, was found in one sample. This variant, proposed to be classified as genotype I, has been found recently also by others in Vietnam and Laos. The current study indicates that the genotype X/C recombinant may represent approximately 1% of the HBV strains circulating in Vietnam. (c) 2010 Wiley-Liss, Inc.
Existence of various human parvovirus B19 genotypes in Chinese plasma pools: genotype 1, genotype 3, putative intergenotypic recombinant variants and new genotypes.

Science.gov (United States)

Jia, Junting; Ma, Yuyuan; Zhao, Xiong; Huangfu, Chaoji; Zhong, Yadi; Fang, Chi; Fan, Rui; Lv, Maomin; Zhang, Jingang

2016-09-17

Human parvovirus B19 (B19V) is a frequent contaminant of blood and plasma-derived medicinal products. Three distinct genotypes of B19V have been identified. The distribution of the three B19V genotypes has been investigated in various regions or countries. However, in China, data on the existence of different B19V genotypes are limited. One hundred and eighteen B19V-DNA positive source plasma pool samples collected from three Chinese blood products manufacturers were analyzed. The subgenomic NS1/VP1u region junction of B19V was amplified by nested PCR. These amplified products were then cloned and subsequently sequenced. For genotyping, phylogenetic trees were constructed from the NS1/VP1-unique region, and putative recombination events were then analyzed and identified. Phylogenetic analysis of 118 B19V sequences attributed 61.86% to genotype 1a, 10.17% to genotype 1b, and 17.80% to genotype 3b. All the genotype 3b sequences obtained in this study grouped as a specific, closely related cluster with B19V strain D91.1. Four 1a/3b recombinants and 5 new atypical B19V variants with no recombination events were identified. There were at least 3 subtypes (1a, 1b and 3b) of B19V circulating in China. Furthermore, putative B19V 1a/3b recombinants and unclassified strains were identified as well. Such recombinant and unclassified strains may contribute to the genetic diversity of B19V and consequently complicate B19V infection diagnosis and NAT screening. Further studies will be required to elucidate the biological significance of the recombinant and unclassified strains.
Identification of polymorphic inversions from genotypes

Directory of Open Access Journals (Sweden)

Cáceres Alejandro

2012-02-01

Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of studying them experimentally, computational methods have been developed to infer their presence in large numbers of individuals using genome-wide data on nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as carrying a normal or inverted allele. Other methods, which measure differences in linkage disequilibrium, attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results: We present a novel method that both identifies polymorphic inversions from genome-wide genotype data and classifies individuals as carrying a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], uses linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding-window scan to identify regions likely to contain an inversion, and evidence accumulated from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, which increases its usability for current genome-wide association studies (GWAS). Conclusions: We demonstrate the accuracy of our method in detecting inversions and classifying individuals on principled simulations of genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scanned the full genomes of the HapMap samples of European origin (CEU) and Yoruba (YRI). We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions
Hepatitis B virus genotypes circulating in Brazil: molecular characterization of genotype F isolates

Directory of Open Access Journals (Sweden)

Virgolino Helaine A

2007-11-01

Background: Hepatitis B virus (HBV) isolates have been classified into eight genotypes, A to H, which exhibit distinct geographical distributions. Genotypes A, D and F are predominant in Brazil, a country with an admixed population in which the proportion of individuals of Caucasian, Amerindian and African origin varies by region. Genotype F, the most divergent, is considered indigenous to the Americas. A systematic molecular characterization of HBV isolates from different parts of the world would be invaluable in establishing HBV evolutionary origins and dispersion patterns. A large-scale study is needed to map the region-by-region distribution of HBV genotypes in Brazil. Results: Genotyping by PCR-RFLP of 303 HBV isolates from HBsAg-positive blood donors showed that at least two of the three genotypes, A, D and F, co-circulate in each of the five geographic regions of Brazil. No other genotypes were identified. Overall, genotype A was the most prevalent (48.5%), and most of these isolates were classified as subgenotype A1 (138/153; 90.2%). Genotype D was the most common genotype in the South (84.2%) and Central (47.6%) regions. The prevalence of genotype F was low (13%) countrywide. Nucleotide sequencing of the S gene and a phylogenetic analysis of 32 HBV genotype F isolates showed that the great majority (28/32; 87.5%) belonged to subgenotype F2, cluster II. The deduced serotype of 31 of the 32 F isolates was adw4. The remaining isolate showed a leucine-to-isoleucine substitution at position 127. Conclusion: The presence of genotypes A, D and F, and the absence of other genotypes, in a large cohort of HBV-infected individuals may reflect the ethnic origins of the Brazilian population. The high prevalence of subgenotype A1 isolates (of African origin) indicates that the African influx during the colonial slavery period had a major impact on the circulation of HBV genotype A currently found in Brazil. Although most genotype F
Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

Science.gov (United States)

Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

2009-02-01

One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.
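The two ingredients named in this abstract, filling in censored values under a Gaussian assumption and estimating the autocovariance nonparametrically, can be illustrated with a short Python sketch. This is a deliberately simplified stand-in, not the authors' method: it assumes left-censoring at a detection limit `c`, replaces every censored point with the truncated-normal mean E[X | X < c] of a normal distribution fitted to the uncensored values (ignoring the serial dependence that a conditional Gaussian imputation would exploit), and then applies the standard divide-by-n sample autocovariance. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def sample_autocovariance(x, max_lag):
    """Biased (divide-by-n) sample autocovariance for lags 0..max_lag."""
    n = len(x)
    xbar = sum(x) / n
    return [
        sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(n - h)) / n
        for h in range(max_lag + 1)
    ]


def lower_truncated_normal_mean(mu, sigma, c):
    """E[X | X < c] for X ~ Normal(mu, sigma^2)."""
    a = (c - mu) / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(a / math.sqrt(2)))
    return mu - sigma * pdf / cdf


def impute_left_censored(x, censored, c):
    """Replace censored entries (true value known only to be below c) with
    the truncated-normal mean. Stored values at censored positions are
    ignored. Needs at least two uncensored values; fitting mu and sigma to
    the uncensored values alone is a crude, biased shortcut."""
    observed = [v for v, is_cens in zip(x, censored) if not is_cens]
    mu, sigma = mean(observed), stdev(observed)
    fill = lower_truncated_normal_mean(mu, sigma, c)
    return [fill if is_cens else v for v, is_cens in zip(x, censored)]


# Hypothetical series with two values below a detection limit of 1.0
series = [0.5, 2.0, 3.0, 4.0, 2.5, 0.5]
flags = [True, False, False, False, False, True]
completed = impute_left_censored(series, flags, c=1.0)
print(sample_autocovariance(completed, max_lag=2))
```

Single-value filling like this understates variability; drawing each censored value from the conditional distribution (and repeating the draw) is the more faithful route when the goal is an unbiased autocovariance.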
Avoid Filling Swiss Cheese with Whipped Cream; Imputation Techniques and Evaluation Procedures for Cross-Country Time Series

OpenAIRE

Michael Weber; Michaela Denk

2011-01-01

International organizations collect data from national authorities to create multivariate cross-sectional time series for their analyses. As data from countries with not yet well-established statistical systems may be incomplete, the bridging of data gaps is a crucial challenge. This paper investigates data structures and missing data patterns in the cross-sectional time series framework, reviews missing value imputation techniques used for micro data in official statistics, and discusses the...
On Matrix Sampling and Imputation of Context Questionnaires with Implications for the Generation of Plausible Values in Large-Scale Assessments

Science.gov (United States)

Kaplan, David; Su, Dan

2016-01-01

This article presents findings on the consequences of matrix sampling of context questionnaires for the generation of plausible values in large-scale assessments. Three studies are conducted. Study 1 uses data from PISA 2012 to examine several different forms of missing data imputation within the chained equations framework: predictive mean…

21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

Science.gov (United States)

2010-04-01

... Title 21, Food and Drugs, Volume 9 (2010-04-01). May the Office of National Drug Control Policy impute conduct of one person to another? Section 1404.630, Food and Drugs; OFFICE OF NATIONAL DRUG CONTROL POLICY; GOVERNMENTWIDE DEBARMENT AND SUSPENSION (NONPROCUREMENT); General Principles Relating to Suspension and Debarment Actions; § 1404.630...
A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

NARCIS (Netherlands)

Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. Palmer (Nicholette); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)

2015-01-01

Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the
Common genotypes of hepatitis B virus

International Nuclear Information System (INIS)

Idrees, M.; Khan, S.; Riazuddin, S.

2004-01-01

Objective: To determine the frequency of common genotypes of hepatitis B virus (HBV). Subjects and Methods: HBV genotypes were determined in 112 HBV DNA-positive sera by a simple and precise molecular genotyping system based on PCR with type-specific primers for HBV genotypes A through H. Results: Four genotypes (A, B, C and D) of the eight reported so far were identified. Genotypes A, B and C were predominant. Genotype C was the most prevalent in this collection, appearing in 46 samples (41.7%). The genotypes of 5 (4.46%) samples could not be determined with the present genotyping system. Mixed genotypes were seen in 8 (7.14%) HBV isolates: five were infected with genotypes A/D, two with genotypes C/D, and one patient with four genotypes (A/B/C/D). Genotype A (68%) was predominant in Sindh, genotype C was predominant in the North West Frontier Province (NWFP) (68.96%), and genotypes C and B were dominant in Punjab (39.65% and 25.86%, respectively). Conclusion: All four of the common HBV genotypes found worldwide (A, B, C and D) were isolated. Genotype C is the predominant genotype overall; genotypes B and C are predominant in Punjab and NWFP, whereas genotype A is predominant in Sindh. (author)
Impact of inter-genotypic recombination and probe cross-reactivity on the performance of the Abbott RealTime HCV Genotype II assay for hepatitis C genotyping.

Science.gov (United States)

Sridhar, Siddharth; Yip, Cyril C Y; Chan, Jasper F W; To, Kelvin K W; Cheng, Vincent C C; Yuen, Kwok-Yung

2018-05-01

The Abbott RealTime HCV Genotype II assay (Abbott-RT-HCV assay) is a real-time PCR-based genotyping method for hepatitis C virus (HCV). This study measured the impact of inter-genotypic recombination and probe cross-reactivity on the performance of the Abbott-RT-HCV assay. Of 517 samples genotyped with the Abbott-RT-HCV assay over a one-year period, 34 (6.6%) were identified as HCV genotype 1 without further subtype designation, raising the possibility of inaccurate genotyping. These samples were subjected to confirmatory sequencing: 27 of the 34 (79%) were genotype 1b, while five (15%) were genotype 6. One HCV isolate was an inter-genotypic 1a/4o recombinant, a novel natural HCV recombinant that has not been reported previously. Inter-genotypic recombination and probe cross-reactivity can both affect the accuracy of the Abbott-RT-HCV assay, with significant implications for the choice of antiviral regimen. Confirmatory sequencing of ambiguous results is crucial for accurate genotyping. Copyright © 2018 Elsevier Inc. All rights reserved.
Evaluation of the Abbott Real Time HCV genotype II assay for Hepatitis C virus genotyping.

Science.gov (United States)

Sariguzel, Fatma Mutlu; Berk, Elife; Gokahmetoglu, Selma; Ercal, Baris Derya; Celik, Ilhami

2015-01-01

The determination of HCV genotypes and subtypes is very important for the selection of antiviral therapy and for epidemiological studies. The aim of this study was to evaluate the performance of the Abbott Real Time HCV Genotype II assay in genotyping HCV-infected patients in Kayseri, Turkey. One hundred patients with chronic hepatitis C admitted to our hospital between June 2012 and December 2012 were evaluated. HCV RNA levels were determined by the COBAS® AmpliPrep/COBAS® TaqMan® 48 HCV test, and HCV genotyping was performed with the Abbott Real Time HCV Genotype II assay. With the exception of genotype 1, HCV subtypes could not be determined by the Abbott assay. Sequencing analysis was used as the reference method. By both methods, genotypes 1, 2, 3 and 4 were observed in 70, 4, 2 and 24 of the 100 patients, respectively, and the concordance between the two systems for major HCV genotypes was 100%. Of the 70 patients with genotype 1, the Abbott Real Time HCV Genotype II assay identified 66 with subtype 1b and 4 with subtype 1a, whereas sequence analysis identified 61 with subtype 1b and 9 with subtype 1a. This difference in genotype 1 subtyping between the two methods was not statistically significant (P>0.05). By sequence analysis, the genotype 4 and genotype 3 samples were subtypes 4d and 3a, respectively. Of the four patients with genotype 2, sequence analysis showed that two had subtype 2a and two had subtype 2b. The Abbott Real Time HCV Genotype II assay yielded results consistent with sequence analysis; however, further optimization of the assay is required for HCV subtype identification.
Comparison of simple and multiple imputation methods using a risk model for surgical mortality as example

Directory of Open Access Journals (Sweden)

Luciana Neves Nunes

2010-12-01

INTRODUCTION: Loss of information is a frequent problem in health research studies; in the literature, this loss is referred to as missing data. By imputing the missing data, artificially complete data sets are created that can be analyzed with traditional statistical techniques. The objective of this article was to compare, in an example based on real data, the use of three different imputation techniques. METHODS: The data come from a study developing a surgical risk model, with a sample of 450 patients. The imputation methods used were two single imputations and one multiple imputation (MI), and the nonresponse mechanism was assumed to be MAR (Missing at Random). RESULTS: The variable with missing data was serum albumin, with 27.1% missing. The models obtained from the two single imputations were similar to each other but differed from those obtained with the MI data with respect to which variables were included in the models. CONCLUSIONS: The results indicate that it matters whether the relationship between albumin and the other observed variables is taken into account, since different models were obtained with single and multiple imputation. Single imputation underestimates variability, producing narrower confidence intervals. The use of imputation methods should be considered when data are missing, especially MI, which accounts for between-imputation variability in the model estimates.
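Multiple imputation results are usually combined with Rubin's rules, which add the between-imputation spread of the estimates to the average within-imputation variance. Analyzing one singly imputed data set captures only the within part, which is why single imputation tends to give narrower confidence intervals. The minimal Python sketch below pools one coefficient across imputations; the abstract does not say which software or pooling procedure was used, and the albumin coefficients in the usage line are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m completed-data analyses of one parameter with Rubin's rules.

    estimates: point estimate from each imputed data set
    within_variances: squared standard error from each analysis
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)               # pooled point estimate
    w_bar = mean(within_variances)        # average within-imputation variance
    b = variance(estimates)               # between-imputation variance (m - 1 denominator)
    total = w_bar + (1 + 1 / m) * b       # Rubin's total variance
    return q_bar, total


# Hypothetical albumin coefficients and variances from five imputed data sets
est, var = pool_rubin([-0.80, -0.72, -0.91, -0.77, -0.85], [0.04] * 5)
print(est, var)
```

The total variance always exceeds the average within-imputation variance whenever the imputed data sets disagree, which is the extra uncertainty a single imputation ignores.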
A cost utility analysis of simeprevir used with peginterferon + ribavirin in the management of genotype 1 hepatitis C virus infection, from the perspective of the UK National Health Service.

Science.gov (United States)

Westerhout, Kirsten; Treur, Maarten; Mehnert, Angelika; Pascoe, Katie; Ladha, Imran; Belsey, Jonathan

2015-01-01

Triple therapy using a protease inhibitor (PI) with peginterferon and ribavirin (PR) is increasingly used in patients with chronic hepatitis C virus (HCV) infection. The most recently introduced PI, simeprevir (SMV), offers high rates of viral eradication combined with a shorter overall duration of therapy. The objective of this study was to compare the cost-effectiveness of SMV + PR with PR alone or PR combined with telaprevir (TVR) or boceprevir (BOC) in patients infected with genotype 1 HCV. Method: A cost-utility model was constructed with two phases: an initial treatment phase capturing the efficacy of therapy, followed by a long-term post-treatment Markov phase capturing lifetime outcomes according to whether a sustained viral response (SVR) had been achieved on treatment. Dosage regimens were based on the EMA-approved label for each treatment. SVR estimates and adverse event rates were derived from a mixed treatment comparison. Baseline characteristics were drawn from an analysis of a UK HCV data set and from clinician opinion. Health state transition probabilities, utilities and health state costs were drawn from previously published economic analyses. The model considered direct health costs only, from the perspective of the UK National Health Service. The model yielded an incremental cost-effectiveness ratio (ICER) for SMV + PR vs PR alone of £9725/QALY in treatment-naïve patients and £7819/QALY in treatment-experienced patients. The benefit was driven by an increased likelihood of achieving SVR, with consequent long-term utility gains. SMV + PR dominated TVR + PR and BOC + PR in both patient groups, principally reflecting the QALY benefit of a higher likelihood of SVR with SMV combined with lower overall drug costs due to shorter mean treatment duration. Compared with other currently licensed treatment options, SMV + PR represents a cost-effective treatment option for patients with chronic genotype 1 HCV infection.
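The two kinds of result reported above, an ICER in £/QALY and "dominance", come from the same incremental comparison of costs and QALYs. The Python sketch below shows that comparison for a pair of treatments; it does not reproduce the authors' Markov model, and the costs and QALYs in the usage line are hypothetical.

```python
def incremental_analysis(cost_new, qaly_new, cost_ref, qaly_ref):
    """Compare a new treatment with a reference on cost and QALYs.

    Returns "equivalent" if costs and QALYs are identical, "dominant" if the
    new option is no more costly and at least as effective, "dominated" if it
    is no cheaper and no more effective, and otherwise the ICER
    (delta cost / delta QALY).
    """
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_cost == 0 and d_qaly == 0:
        return "equivalent"
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"
    return d_cost / d_qaly


# Hypothetical lifetime costs (GBP) and QALYs per patient
print(incremental_analysis(30000.0, 12.0, 20000.0, 11.0))  # cost per QALY gained
print(incremental_analysis(15000.0, 10.5, 20000.0, 10.0))  # cheaper and more effective
```

An ICER is only meaningful when neither option dominates, which is why the abstract reports a ratio against PR alone but simply "dominated" for the TVR and BOC comparisons.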
Temperature Switch PCR (TSP): Robust assay design for reliable amplification and genotyping of SNPs

Directory of Open Access Journals (Sweden)

Mather Diane E

2009-12-01

Background: Many research and diagnostic applications rely on the assay of individual single nucleotide polymorphisms (SNPs). Methods that improve the speed and efficiency of single-marker SNP genotyping are therefore highly desirable. Here, we describe temperature-switch PCR (TSP), a biphasic four-primer PCR system with a universal primer design that amplifies the target locus in the first phase of thermal cycling before switching to detection of the alleles. TSP can simplify assay design for a range of commonly used single-marker SNP genotyping methods and reduce the need for individual assay optimization and operator expertise in deploying SNP assays. Results: We demonstrate the utility of TSP for the rapid construction of robust and convenient endpoint SNP genotyping assays based on allele-specific PCR and high-resolution melt analysis, generating a total of 11,232 data points. The TSP assays were performed under standardised reaction conditions, requiring minimal optimization of individual assays. High genotyping accuracy was verified by 100% concordance between TSP genotypes and an independent genotyping method in a blinded study. Conclusion: In principle, TSP can be incorporated directly into the design of assays for most current single-marker SNP genotyping methods. TSP offers several technological advances for single-marker SNP genotyping, including simplified assay design and development, increased assay specificity and genotyping accuracy, and opportunities for assay automation. By reducing the need for operator expertise, TSP makes it possible to deploy a wider range of single-marker SNP genotyping methods in the laboratory. TSP has broad applications and can be deployed in any animal or plant species.
Soil and applied sulphur utilization by sunflower grown on vertisol under rainfed conditions

International Nuclear Information System (INIS)

Sreemannarayana, B.; Sreenivasa Raju, A.

1993-01-01

In a field experiment with the sunflower genotypes Morden, APSH-11 and EC 68414 grown on a local black clay loam soil, fertilizer sulphur (labelled with 35S) was applied at rates of 0, 20, 40 and 60 kg S/ha as gypsum and ammonium sulphate. Among the genotypes, EC 68414 utilized the most sulphur from the sources at every growth stage (star, bud, flowering and maturity). Sulphur applied as ammonium sulphate resulted in the highest S utilization by all genotypes at all growth stages. Although sulphur uptake increased, S utilization decreased with increasing levels of applied S. S uptake was highest at the 60 kg S/ha level applied through either source. Uptake of soil S was higher than uptake of fertilizer S at every crop stage. Maximum yields were recorded at 40 kg S/ha, indicating that this dose is optimum for sunflower grown on black clay loam soils. (author). 17 refs., 3 tabs
An XML-based interchange format for genotype-phenotype data.

Science.gov (United States)

Whirl-Carrillo, M; Woon, M; Thorn, C F; Klein, T E; Altman, R B

2008-02-01

Recent advances in high-throughput genotyping and phenotyping have accelerated the creation of pharmacogenomic data. Consequently, the community requires standard formats to exchange large amounts of diverse information. To facilitate the transfer of pharmacogenomics data between databases and analysis packages, we have created a standard XML (eXtensible Markup Language) schema that describes both genotype and phenotype data as well as associated metadata. The schema accommodates information regarding genes, drugs, diseases, experimental methods, genomic/RNA/protein sequences, subjects, subject groups, and literature. The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org) has used this XML schema for more than 5 years to accept and process submissions containing more than 1,814,139 SNPs on 20,797 subjects using 8,975 assays. Although developed in the context of pharmacogenomics, the schema is of general utility for the exchange of genotype and phenotype data. We have written syntactic and semantic validators to check documents in this format. The schema and validation code are available to the community at http://www.pharmgkb.org/schema/index.html (last accessed: 8 October 2007). (c) 2007 Wiley-Liss, Inc.
Breed traceability of buffalo meat using microsatellite genotyping technique.

Science.gov (United States)

Kannur, Bheemashankar H; Fairoze, Md Nadeem; Girish, P S; Karabasanavar, Nagappa; Rudresh, B H

2017-02-01

Although buffalo has emerged as a major meat-producing animal in Asia, research on breed traceability has so far focused mainly on cattle (beef). This gap motivated the development and validation of a buffalo breed traceability method using a set of eight microsatellite (STR) markers in seven Indian buffalo breeds (Bhadawari, Jaffarabadi, Murrah, Mehsana, Nagpuri, Pandharpuri and Surti). The probability of two individuals sharing the same profile at a specific locus was computed for different numbers of STRs, with allele pooling by breed and by population. Match probabilities per breed were considered, and the six most polymorphic loci were genotyped. Of the eight microsatellite markers studied, CSSM047, DRB3 and CSSM060 were the most polymorphic. The technique was validated with known and unknown blood and meat samples, and 24 of the 25 samples tested were genetically traced. These results show the potential of the methodology and should encourage other researchers to address buffalo traceability, with a view to creating a worldwide archive of breed-specific genotypes. This work is the first report of breed traceability of buffalo meat using a microsatellite genotyping technique.
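The per-locus probability that two individuals share a profile is commonly computed as the sum of squared genotype frequencies, and a multi-locus profile probability as the product over loci. The Python sketch below uses that textbook form under two stated assumptions, Hardy-Weinberg proportions and independent loci; the abstract does not give the exact formula or the allele-pooling scheme the authors used, and the allele frequencies in the tests are hypothetical.

```python
import math
from itertools import combinations


def locus_match_probability(allele_freqs):
    """Probability that two unrelated individuals share a genotype at one
    locus, assuming Hardy-Weinberg proportions:
    sum over genotypes of (genotype frequency)^2."""
    homozygous = sum(p ** 4 for p in allele_freqs)  # (p_i^2)^2 for each allele
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


def profile_match_probability(loci):
    """Multiply per-locus probabilities, assuming loci are independent."""
    return math.prod(locus_match_probability(freqs) for freqs in loci)
```

More alleles with more even frequencies give a lower match probability, which is why the most polymorphic markers carry the most discriminating power.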
Genotype 3 is the predominant hepatitis C genotype in a multi-ethnic Asian population in Malaysia.

Science.gov (United States)

Ho, Shiaw-Hooi; Ng, Kee-Peng; Kaur, Harvinder; Goh, Khean-Lee

2015-06-01

Genotypes of hepatitis C virus (HCV) are distributed differently across the world, and such data are scarce for multi-ethnic Asian populations such as that of Malaysia. The objectives of this study were to determine the distribution of HCV genotypes among major ethnic groups and to ascertain their association with basic demographic variables such as age and gender. This was a cross-sectional prospective study conducted from September 2007 to September 2013. Consecutive patients at the University of Malaya Medical Centre who tested positive for anti-HCV antibodies were included and tested for HCV RNA using the Roche Cobas Amplicor Analyzer and for HCV genotype using the Roche single Linear Array HCV Genotyping strip. Five hundred and ninety-six subjects were anti-HCV antibody positive during this period; however, only 396 (66.4%) were HCV RNA positive and included in the final analysis. HCV genotype 3 was the predominant genotype, with an overall frequency of 61.9%, followed by genotypes 1 (35.9%), 2 (1.8%) and 6 (0.5%). The prevalence of HCV genotype 3 was slightly higher among Malays than among Chinese (P=0.043). No other statistically significant differences were observed in the distribution of HCV genotypes among the major ethnic groups, and there was no association between the predominant genotypes and basic demographic variables. In this multi-ethnic Asian society in Malaysia, genotype 3 is the predominant genotype among all the major ethnic groups, with genotype 1 the second most common. Genotypes 2 and 6 are uncommon, and neither genotype 4 nor 5 was detected. HCV genotype could not be distinguished by ethnic origin, age or gender.
Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

Directory of Open Access Journals (Sweden)

Busch Michael P

2007-12-01

Background: Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Studies of fibrosis progression often rely on imputing the time of infection, commonly as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and its implications for modeling factors that influence progression rates. Methods: We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk with the reported age of first injection drug use. Results: Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had a considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting a younger age of first injection drug use were more likely to have been infected after that age, and persons reporting an older age were more likely to have been infected before it. Conclusion: In cross-sectional studies of fibrosis progression where the date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using the reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association between younger age at infection and slower progression.
Polygenic analysis of genome-wide SNP data identifies common variants on allergic rhinitis

DEFF Research Database (Denmark)

Mohammadnejad, Afsaneh; Brasch-Andersen, Charlotte; Haagerup, Annette

Background: Allergic rhinitis (AR) is a complex disorder that affects many people around the world. There is a high genetic contribution to the development of AR: twin and family studies have estimated its heritability at more than 33%. Because of the complex nature of the disease, single SNP ... analysis has limited power to identify the genetic variants for AR. We combined genome-wide association analysis (GWAS) with a polygenic risk score (PRS) to explore the genetic basis of the disease. Methods: We collected clinical data on 631 Danish subjects, with AR cases consisting of 434 ... sibling pairs and unrelated individuals, and controls consisting of 197 unrelated individuals. SNP genotyping was performed with the Affymetrix Genome-Wide Human SNP Array 5.0, and SNP imputation was performed using IMPUTE2. Using an additive genetic model, GWAS was conducted in the discovery sample, the genotypes...
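The abstract does not spell out how the PRS was built. Under the usual additive model, a score is a weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes estimated in a discovery sample. The Python sketch below shows only that weighted sum; SNP selection (for example by a p-value threshold) is omitted, and the SNP IDs and weights are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum of per-allele effect size times effect-allele dosage.

    dosages: {snp_id: dosage in [0, 2]}; imputed dosages may be fractional
    weights: {snp_id: per-allele effect, e.g. a log odds ratio from a GWAS}
    SNPs present in only one of the two mappings are skipped.
    """
    return sum(weights[snp] * d for snp, d in dosages.items() if snp in weights)


# Hypothetical SNP IDs and effect sizes, for illustration only
weights = {"snp_a": 0.20, "snp_b": -0.10}
person = {"snp_a": 2, "snp_b": 1.5, "snp_c": 1}
print(polygenic_risk_score(person, weights))
```

Accepting fractional dosages lets imputed genotypes (such as expected allele counts derived from imputation output) be scored directly, without rounding them to hard calls first.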
Genetic diversity and phylogenetic relationship in different genotypes of cotton for future breeding

Directory of Open Access Journals (Sweden)

Jehan

2017-11-01

Background: To make plants better adapted and more resistant to diseases and other environmental stresses, there is a continual need to improve the plant genome, i.e. to increase its genetic diversity. Methods: In the present study, six varieties and six lines of cotton were investigated for their genetic diversity and phylogenetic relationships using 35 different RAPD primers obtained from Gene Link Technologies, USA. Results: Of the 35 RAPD primers, 13 produced reproducible PCR bands, while the rest failed to show any amplification product. A total of 670 reproducible bands was scored, and 442 polymorphic loci were counted, constituting 66% of the total loci. Phylogenetic analysis revealed two major groups, consisting of 7 and 5 genotypes, respectively. Genotypes Lp1 and Tp4 were at the maximum genetic distance, in separate groups, and could be utilized for future cotton breeding. Conclusions: RAPD analysis is an inexpensive and time-saving technique for determining the genetic diversity of cotton genotypes. Cotton genotypes Lp1 and Tp4 could be the best candidates for future breeding programs, as the two are genetically distant from each other.
Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

DEFF Research Database (Denmark)

Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of GEBV accuracy across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole-genome-sequenced key ancestor bulls. ... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We demonstrated that the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...
A prospective study to assess the association between genotype, phenotype and Prakriti in individuals on phenytoin monotherapy

Directory of Open Access Journals (Sweden)

Saket J. Thaker

2017-01-01

Conclusions: We did not find any association between Prakriti and either phenotype or genotypes suggesting that Prakriti assessment would be of limited utility in individualizing phenytoin therapy in epilepsy patients.
Relatedness of Indian flax genotypes (Linum usitatissimum L.): an inter-simple sequence repeat (ISSR) primer assay.

Science.gov (United States)

Rajwade, Ashwini V; Arora, Ritu S; Kadoo, Narendra Y; Harsulkar, Abhay M; Ghorpade, Prakash B; Gupta, Vidya S

2010-06-01

The objective of this study was to analyze the genetic relationships, using PCR-based ISSR markers, among 70 Indian flax (Linum usitatissimum L.) genotypes actively used in flax breeding programs. Twelve ISSR primers were used for the analysis, yielding 136 loci, of which 87 were polymorphic. The average numbers of amplified loci and of polymorphic loci per primer were 11.3 and 7.25, respectively, while the percentage of polymorphic loci ranged from 11.1 to 81.8, with an average of 63.9 across all the genotypes. Polymorphism information content (PIC) scores ranged from 0.03 to 0.49, with an average of 0.18. A dendrogram was generated from the similarity matrix by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), grouping the flax genotypes into five clusters. Jaccard similarity coefficients among the genotypes ranged from 0.60 to 0.97. When the omega-3 alpha-linolenic acid (ALA) contents of the individual genotypes were correlated with the clusters in the dendrogram, the high-ALA genotypes were grouped in two clusters. This study identified SLS 50, Ayogi and Sheetal as the most diverse genotypes and suggested their use in breeding programs and for developing mapping populations.
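The Jaccard coefficients and polymorphism information content (PIC) values in this abstract are both computed from a binary band-scoring matrix (1 = band present, 0 = absent). The Python sketch below assumes the common 2f(1 - f) PIC formula for dominant markers; the abstract does not state which formula was used, although the reported maximum of 0.49 is consistent with that formula's ceiling of 0.5. Genotype names and band profiles are hypothetical, and the UPGMA clustering step that would consume the matrix (as distances 1 - J) is not shown.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient for 0/1 band profiles: bands shared by both
    genotypes divided by bands present in either (shared absences ignored)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 1.0


def pic_dominant(band_frequency):
    """PIC for a dominant marker with band frequency f: 2f(1 - f),
    which peaks at 0.5 when f = 0.5."""
    return 2 * band_frequency * (1 - band_frequency)


def similarity_matrix(profiles):
    """Pairwise Jaccard similarities for {genotype_name: band_profile}."""
    names = list(profiles)
    return {(x, y): jaccard_similarity(profiles[x], profiles[y])
            for x in names for y in names}


# Hypothetical band scores for three genotypes at five loci
bands = {"G1": [1, 1, 0, 1, 0], "G2": [1, 0, 0, 1, 1], "G3": [0, 1, 1, 1, 0]}
print(similarity_matrix(bands)[("G1", "G2")])
```

Ignoring shared absences is the usual reason for preferring Jaccard over simple matching with dominant markers, because the absence of a band in two genotypes is not necessarily evidence of a shared allele.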
Decoding noises in HIV computational genotyping.

Science.gov (United States)

Jia, MingRui; Shaw, Timothy; Zhang, Xing; Liu, Dong; Shen, Ye; Ezeamama, Amara E; Yang, Chunfu; Zhang, Ming

2017-11-01

The lack of a consistent and reliable genotyping system can critically impede HIV genomic research on pathogenesis, fitness, virulence, drug resistance, and genomic-based healthcare and treatment. At present, the extent of mis-genotyping, i.e., background noise in molecular genotyping, and its impact on epidemic surveillance are unknown. Here we present the first comprehensive assessment of HIV genotyping quality. HIV sequence data were retrieved from worldwide published records and subjected to a systematic genotyping assessment pipeline. Mis-genotyped cases occurred at a rate of 4.6% globally, with some heterogeneity across regions and high-risk populations. The results also revealed a consistent mis-genotyping pattern in gp120 in all studied populations except men who have sex with men. Our study also suggests novel viral diversity among the mis-genotyped cases. Finally, this study reemphasizes the importance of implementing a standardized genotyping pipeline to avoid genotyping disparities and to advance our understanding of virus evolution in various epidemiological settings. Copyright © 2017 Elsevier Inc. All rights reserved.
Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

Science.gov (United States)

Horikoshi, Momoko; Mägi, Reedik; van de Bunt, Martijn; Surakka, Ida; Sarin, Antti-Pekka; Mahajan, Anubha; Marullo, Letizia; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Ladenvall, Claes; Ried, Janina S; Winkler, Thomas W; Willems, Sara M; Pervjakova, Natalia; Esko, Tõnu; Beekman, Marian; Nelson, Christopher P; Willenborg, Christina; Wiltshire, Steven; Ferreira, Teresa; Fernandez, Juan; Gaulton, Kyle J; Steinthorsdottir, Valgerdur; Hamsten, Anders; Magnusson, Patrik K E; Willemsen, Gonneke; Milaneschi, Yuri; Robertson, Neil R; Groves, Christopher J; Bennett, Amanda J; Lehtimäki, Terho; Viikari, Jorma S; Rung, Johan; Lyssenko, Valeriya; Perola, Markus; Heid, Iris M; Herder, Christian; Grallert, Harald; Müller-Nurasyid, Martina; Roden, Michael; Hypponen, Elina; Isaacs, Aaron; van Leeuwen, Elisabeth M; Karssen, Lennart C; Mihailov, Evelin; Houwing-Duistermaat, Jeanine J; de Craen, Anton J M; Deelen, Joris; Havulinna, Aki S; Blades, Matthew; Hengstenberg, Christian; Erdmann, Jeanette; Schunkert, Heribert; Kaprio, Jaakko; Tobin, Martin D; Samani, Nilesh J; Lind, Lars; Salomaa, Veikko; Lindgren, Cecilia M; Slagboom, P Eline; Metspalu, Andres; van Duijn, Cornelia M; Eriksson, Johan G; Peters, Annette; Gieger, Christian; Jula, Antti; Groop, Leif; Raitakari, Olli T; Power, Chris; Penninx, Brenda W J H; de Geus, Eco; Smit, Johannes H; Boomsma, Dorret I; Pedersen, Nancy L; Ingelsson, Erik; Thorsteinsdottir, Unnur; Stefansson, Kari; Ripatti, Samuli; Prokopenko, Inga; McCarthy, Mark I; Morris, Andrew P

2015-07-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.
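The coverage claim above is framed as a minor allele frequency threshold (MAF ≥ 0.5%). The following is a minimal sketch of applying that cut-off to imputed variants. It assumes per-individual alternate-allele dosages between 0 and 2. The function names are hypothetical and are not taken from any imputation package.

```python
def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0..2 per diploid individual; may be fractional)."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def passes_maf(dosages, threshold=0.005):
    """True if the variant meets the MAF >= 0.5% cut-off used to describe 1000G coverage."""
    return minor_allele_frequency(dosages) >= threshold


# Hypothetical cohort of 100 individuals, two of whom carry one alternate allele.
print(passes_maf([0] * 98 + [1, 1]))
```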

Genotype x environment interaction for grain yield of wheat genotypes tested under water stress conditions

International Nuclear Information System (INIS)

Sail, M.A.; Dahot, M.U.; Mangrio, S.M.; Memon, S.

2007-01-01

The effect of water stress on grain yield in different wheat genotypes was studied under field conditions at various locations. Grain yield is a complex polygenic trait influenced by genotype, environment and genotype x environment (GxE) interaction. To understand the stability of grain yield among genotypes, twenty-one wheat genotypes developed through hybridization and radiation-induced mutations at the Nuclear Institute of Agriculture (NIA), TandoJam, were evaluated together with four local check varieties (Sarsabz, Thori, Margalla-99 and Chakwal-86) in multi-environment trials (METs). The experiments were conducted in five different water-stress environments in Sindh. Data on grain yield were recorded at each site and statistically analyzed. Combined analysis of variance across all environments indicated that the genotype, environment and GxE interaction effects were highly significant (P < 0.01) for grain yield. Genotypes differed in their response to the various locations. The highest overall site mean yield (4031 kg/ha) was recorded at Moro and the lowest (2326 kg/ha) at Thatta. Six genotypes produced the highest grain yields across all environments (significant at P = 0.01). Stability analysis was applied to estimate the stability parameters, viz., the regression coefficient (b), the standard error of the regression coefficient and the variance due to deviation from regression (S²d). Genotypes 10/8, BWS-78 produced the highest mean yield across all environments, with low regression coefficients (b = 0.68, 0.67 and 0.63, respectively) and higher S²d values, showing specific adaptation to poor (unfavorable) environments. Genotype 8/7 produced a high overall grain yield (3647 kg/ha) and ranked as the third-highest-yielding genotype. It had a regression coefficient close to unity (b = 0.9) and a low S²d value, indicating greater stability and wide adaptation across all environments. The knowledge of the presence and magnitude of genotype x environment (GE) interaction is important to
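The stability parameters listed above are the regression coefficient b and the deviation variance S²d. They correspond to the joint-regression approach usually attributed to Eberhart and Russell. The abstract does not name the model, so that attribution is an assumption. The Python sketch below regresses each genotype's yields on an environmental index, defined as environment mean minus grand mean. As a simplification, it reports the raw residual mean square as S²d and does not subtract a pooled error term. The yield table is hypothetical.

```python
def joint_regression(yields):
    """Per-genotype (mean, b, simplified S^2d) from a genotype x environment yield table.

    yields: dict mapping genotype -> list of yields, one per environment (same order).
    """
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])
    # Environmental index: environment mean minus grand mean, so the index sums to zero.
    env_means = [sum(yields[g][j] for g in genotypes) / len(genotypes) for j in range(n_env)]
    grand_mean = sum(env_means) / n_env
    index = [m - grand_mean for m in env_means]
    ss_index = sum(i * i for i in index)

    results = {}
    for g in genotypes:
        y = yields[g]
        mean_g = sum(y) / n_env
        # Because the index sums to zero, sum(y * I) / sum(I^2) is the least-squares slope.
        b = sum(yj * ij for yj, ij in zip(y, index)) / ss_index
        residuals = [yj - (mean_g + b * ij) for yj, ij in zip(y, index)]
        # Simplified S^2d: residual mean square with n_env - 2 degrees of freedom.
        s2d = sum(r * r for r in residuals) / (n_env - 2)
        results[g] = (mean_g, b, s2d)
    return results


# Hypothetical yields (t/ha) for three genotypes in four environments.
table = {"A": [8, 9, 11, 12], "B": [9, 9.5, 10.5, 11], "C": [7, 8.5, 11.5, 13]}
for name, (mean, b, s2d) in joint_regression(table).items():
    print(name, round(mean, 2), round(b, 2), round(s2d, 4))
```

A slope below 1 with a high S²d matches the abstract's reading of specific adaptation to poor environments. A slope near 1 with a low S²d indicates broad, stable adaptation.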
HBV genotypic variability in Cuba.

Directory of Open Access Journals (Sweden)

Carmen L Loureiro

The genetic diversity of HBV in a human population is often a reflection of its genetic admixture. The aim of this study was to explore the genotypic diversity of HBV in Cuba. The S genomic region of Cuban HBV isolates was sequenced, and for selected isolates the complete genome or precore-core sequence was analyzed. The most frequent genotype was A (167/250, 67%), mainly A2 (149, 60%) but also A1 and one A4. A total of 77 isolates were classified as genotype D (31%), with co-circulation of several subgenotypes (56 D4, 2 D1, 5 D2, 7 D3/6 and 7 D7). Three isolates belonged to genotype E, two to H and one to B3. Complete genome sequence analysis of selected isolates confirmed the phylogenetic analysis performed with the S region. Mutations or polymorphisms in the precore region were more common among genotype D than among genotype A isolates. The HBV genotypic distribution on this Caribbean island correlates with the Y-lineage genetic background of the population, in which European and African origins prevail. The HBV genotype E, B3 and H isolates might represent more recent introductions.
HBV Genotypic Variability in Cuba

Science.gov (United States)

Loureiro, Carmen L.; Aguilar, Julio C.; Aguiar, Jorge; Muzio, Verena; Pentón, Eduardo; Garcia, Daymir; Guillen, Gerardo; Pujol, Flor H.

2015-01-01

The genetic diversity of HBV in a human population is often a reflection of its genetic admixture. The aim of this study was to explore the genotypic diversity of HBV in Cuba. The S genomic region of Cuban HBV isolates was sequenced, and for selected isolates the complete genome or precore-core sequence was analyzed. The most frequent genotype was A (167/250, 67%), mainly A2 (149, 60%) but also A1 and one A4. A total of 77 isolates were classified as genotype D (31%), with co-circulation of several subgenotypes (56 D4, 2 D1, 5 D2, 7 D3/6 and 7 D7). Three isolates belonged to genotype E, two to H and one to B3. Complete genome sequence analysis of selected isolates confirmed the phylogenetic analysis performed with the S region. Mutations or polymorphisms in the precore region were more common among genotype D than among genotype A isolates. The HBV genotypic distribution on this Caribbean island correlates with the Y-lineage genetic background of the population, in which European and African origins prevail. The HBV genotype E, B3 and H isolates might represent more recent introductions. PMID:25742179
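The counts reported above can be checked arithmetically. The following short Python snippet re-derives the rounded percentages from the stated counts and confirms that the D subgenotypes add up to the 77 genotype D isolates. Genotype B is represented by its single B3 isolate. No data beyond the abstract's own figures are used.

```python
GENOTYPE_COUNTS = {"A": 167, "D": 77, "E": 3, "H": 2, "B3": 1}
D_SUBGENOTYPES = {"D4": 56, "D1": 2, "D2": 5, "D3/6": 7, "D7": 7}


def percentages(counts):
    """Whole-number percentage of isolates per genotype."""
    total = sum(counts.values())
    return {g: round(100 * c / total) for g, c in counts.items()}


total = sum(GENOTYPE_COUNTS.values())
print(total, percentages(GENOTYPE_COUNTS))
print(sum(D_SUBGENOTYPES.values()) == GENOTYPE_COUNTS["D"])
```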
Desmanthus GENOTYPES

Directory of Open Access Journals (Sweden)

JOSÉ HENRIQUE DE ALBUQUERQUE RANGEL

2015-01-01

Desmanthus is a genus of forage legumes with the potential to improve pastures and livestock production on clay soils of dry tropical and subtropical regions, such as those in Brazil and Australia. Despite this, patterns of natural or enforced after-ripening of Desmanthus seeds have not been well established. Four-year-old seed banks of nine Desmanthus genotypes at James Cook University were assessed for their patterns of seed softening in response to a range of temperatures. Persistent seed banks were found in all of the studied genotypes. The largest seed banks were found in genotypes CPI 78373 and CPI 78382, and the smallest in genotypes CPI 37143, 67643 and 83563. An increase in the percentage of softened seeds was correlated with higher temperatures, with two patterns of response. In some accessions, seeds were not significantly affected by temperatures below 80 °C. In others, seeds became soft at temperatures as low as 60 °C. At 80 °C, the heat started to depress germination. High seed production in Desmanthus, combined with the dependence of seeds on elevated temperatures for softening, can be a very important survival strategy for plants in dry tropical regions.
Fine scale mapping of the 17q22 breast cancer locus using dense SNPs, genotyped within the Collaborative Oncological Gene-Environment Study (COGs).

Science.gov (United States)

Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W; Benitez, Javier; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M; Couch, Fergus J; Cox, Angela; Cross, Simon S; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F; Fasching, Peter A; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G; Goldberg, Mark S; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L; Muir, Kenneth; Neuhausen, Susan L; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tessier, Daniel C; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M; Vincent, Daniel; Winqvist, Robert; Wu, Anna H; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D P; Hall, Per; Edwards, Stacey L; Simard, Jacques; French, Juliet D; Chenevix-Trench, Georgia; Dunning, Alison M

2016-09-07

Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486: OR = 0.92; CI 0.90-0.94; P = 8.96 × 10⁻¹⁵) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10⁻⁹, r² = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10⁻¹¹, r² = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region, and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus.
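The r² values quoted above measure linkage disequilibrium between each previously reported SNP and the lead SNP. The abstract does not say how r² was estimated. A common shortcut, assumed here, is the squared Pearson correlation between genotype dosage vectors (0, 1 or 2 copies of an allele) across individuals. This approximates haplotype-based r². The function name and toy vectors are hypothetical.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r^2 undefined")
    return sxy * sxy / (sxx * syy)


lead = [0, 1, 2, 1, 0]             # hypothetical dosages at the lead SNP
other = [2 - d for d in lead]      # same SNP counted on the other allele
print(dosage_r2(lead, other))
```

Switching which allele is counted flips the sign of r but leaves r² unchanged, so the choice of coded allele does not matter here.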
IL28B genotype is not useful for predicting treatment outcome in Asian chronic hepatitis B patients treated with pegylated interferon-α.

Science.gov (United States)

Holmes, Jacinta A; Nguyen, Tin; Ratnam, Dilip; Heerasing, Neel M; Tehan, Jane V; Bonanzinga, Sara; Dev, Anouk; Bell, Sally; Pianko, Stephen; Chen, Robert; Visvanathan, Kumar; Hammond, Rachel; Iser, David; Rusli, Ferry; Sievert, William; Desmond, Paul V; Bowden, D Scott; Thompson, Alexander J

2013-05-01

IL28B genotype predicts response to pegylated interferon (peg-IFN)-based therapy in chronic hepatitis C. However, the utility of IL28B genotyping in chronic hepatitis B (CHB) cohorts treated with peg-IFN is unclear. We investigated whether IL28B genotype is associated with peg-IFN treatment outcomes in a predominantly Asian CHB cohort. This was a retrospective analysis of CHB patients treated with 48 weeks of peg-IFN monotherapy. IL28B genotype (rs12979860) was determined using a TaqMan allelic discrimination kit. Baseline hepatitis B virus (HBV)-DNA, alanine aminotransferase, and liver histology data were available. The primary end-points were HBV e antigen (HBeAg) seroconversion with HBV-DNA < 2000 IU/mL 24 weeks post-therapy (HBeAg-positive patients) and HBV-DNA < 2000 IU/mL 24 weeks after peg-IFN (HBeAg-negative patients). The association between IL28B genotype and peg-IFN outcomes was analyzed. IL28B genotype was determined for 96 patients. Eighty-eight percent were Asian, 62% were HBeAg positive, and 13% were METAVIR stage F3-4. Median follow-up time was 39.3 months. The majority of patients carried the CC IL28B genotype (84%). IL28B genotype did not differ according to HBeAg status. The primary end-points were achieved in 27% of HBeAg-positive and 61% of HBeAg-negative patients. There was no association between IL28B genotype and the primary end-point in either group. Furthermore, there was no difference in HBeAg loss alone, HBV surface antigen, alanine aminotransferase normalization, or on-treatment HBV-DNA levels according to IL28B genotype. Given the small possible effect size and the high frequency of the CC genotype in Asian populations, IL28B genotyping is likely to have, at best, limited clinical utility for predicting peg-IFN treatment outcome for CHB patients in the Asia-Pacific region. © 2013 Journal of Gastroenterology and Hepatology Foundation and Wiley Publishing Asia Pty Ltd.
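The abstract reports no association between IL28B genotype and the primary end-point but does not name the statistical test used. As an illustration only, a 2×2 comparison can be evaluated with a two-sided Fisher's exact test. An example would be CC versus non-CC genotype against whether the end-point was achieved. The sketch below uses hypothetical counts, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    # Integer weights C(row1, k) * C(row2, col1 - k) are proportional to table probabilities,
    # so comparing them avoids floating-point ties.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / comb(n, col1)


# Hypothetical table: rows = CC / non-CC, columns = end-point achieved / not achieved.
print(fisher_exact_two_sided(1, 9, 11, 3))
```

For this hypothetical table the p-value is about 0.0028. SciPy's `scipy.stats.fisher_exact` offers the same test for real analyses.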
A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

Science.gov (United States)

Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

2009-07-01

Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C viruses and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. We developed a rapid, high-throughput genotyping system to facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses. The system also covers other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV). The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms, including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/), and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
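The windowed classification described above can be sketched in Python. The phylogenetic step is abstracted away. Each window is assumed to come with precomputed bootstrap support (in percent) for each reference genotype. Only the >70% bootstrap cut-off from the abstract is applied, and the bootscanning criterion is not modeled. Function names and toy scores are hypothetical and are not the Rega tool's API.

```python
def sliding_windows(length, size, step):
    """Half-open (start, end) windows of `size` alignment columns every `step` columns."""
    return [(start, start + size) for start in range(0, length - size + 1, step)]


def assign_window(support, threshold=70.0):
    """Genotype with the highest bootstrap support if it exceeds the threshold, else None."""
    best = max(support, key=support.get)
    return best if support[best] > threshold else None


def merge_segments(windows, labels):
    """Collapse consecutive windows with the same label into (start, end, label) segments."""
    segments = []
    for (start, end), label in zip(windows, labels):
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


windows = sliding_windows(100, 40, 20)
support = [{"B": 95, "C": 3}, {"B": 88, "C": 10}, {"C": 91, "B": 5}, {"C": 60, "B": 30}]
labels = [assign_window(s) for s in support]
print(merge_segments(windows, labels))
```

A query whose windows switch between genotypes, as in this toy case, would be a candidate recombinant. The reported segments overlap because the windows themselves overlap.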
A rare variant in MYH6 is associated with high risk of sick sinus syndrome

DEFF Research Database (Denmark)

Holm, Hilma; Gudbjartsson, Daniel F; Sulem, Patrick

2011-01-01

Through complementary application of SNP genotyping, whole-genome sequencing and imputation in 38,384 Icelanders, we have discovered a previously unidentified sick sinus syndrome susceptibility gene, MYH6, encoding the alpha heavy chain subunit of cardiac myosin. A missense variant in this gene, ...
Similar predictions of etravirine sensitivity regardless of genotypic testing method used: comparison of available scoring systems.

Science.gov (United States)

Vingerhoets, Johan; Nijs, Steven; Tambuyzer, Lotke; Hoogstoel, Annemie; Anderson, David; Picchio, Gaston

2012-01-01

The aims of this study were to compare various genotypic scoring systems commonly used to predict virological outcome to etravirine, and to examine their concordance with etravirine phenotypic susceptibility. Six etravirine genotypic scoring systems were assessed: Tibotec 2010 (based on 20 mutations; TBT 20), Monogram, Stanford HIVdb, ANRS, Rega (based on 37, 30, 27 and 49 mutations, respectively) and virco®TYPE HIV-1 (predicted fold change based on genotype). Samples from treatment-experienced patients who participated in the DUET trials and had both genotypic and phenotypic data (n=403) were assessed using each scoring system. Results were retrospectively correlated with virological response in DUET. κ coefficients were calculated to estimate the degree of agreement between the different scoring systems. Correlation between the five scoring systems and the TBT 20 system was approximately 90%. Virological response by etravirine susceptibility was comparable regardless of which scoring system was utilized. Depending on the scoring system, 70-74% of DUET patients determined as susceptible to etravirine achieved a plasma viral load <50 HIV-1 RNA copies/ml. In samples classed as phenotypically susceptible to etravirine (fold change in 50% effective concentration ≤3), correlations with genotypic score were consistently high across scoring systems (≥70%). In general, the etravirine genotypic scoring systems produced similar results, and genotype-phenotype concordance was high. As such, phenotypic interpretations, and in their absence all genotypic scoring systems investigated, may be used to reliably predict the activity of etravirine.
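Agreement between two scoring systems that classify the same samples can be summarized with Cohen's κ. This statistic corrects raw agreement for the agreement expected by chance. The following is a minimal Python sketch. The susceptibility calls are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters (here, scoring systems) classifying the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


system_1 = ["S", "S", "R", "R"]  # hypothetical calls: S = susceptible, R = resistant
system_2 = ["S", "R", "R", "R"]
print(cohens_kappa(system_1, system_2))
```

Here the two systems agree on 3 of 4 samples (75%), but half of that agreement is expected by chance, so κ is 0.5.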
Blood group genotyping: the power and limitations of the Hemo ID Panel and MassARRAY platform.

Science.gov (United States)

McBean, Rhiannon S; Hyland, Catherine A; Flower, Robert L

2015-01-01

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a sensitive analytical method capable of resolving DNA fragments that differ in mass by a single nucleotide. MALDI-TOF MS is applicable to blood group genotyping because the majority of blood group antigens are encoded by single nucleotide polymorphisms. Blood group genotyping by MALDI-TOF MS can be performed using the Hemo ID Blood Group Genotyping Panel (Agena Bioscience Inc., San Diego, CA). This panel is a set of genotyping assays that predict the phenotype for 101 antigens from 16 blood group systems. These assays involve three fundamental stages: multiplex target-specific polymerase chain reaction amplification, allele-specific single base primer extension, and MALDI-TOF MS analysis using the MassARRAY system. MALDI-TOF MS-based genotyping has many advantages over alternative methods, including high throughput, high multiplex capability, flexibility and adaptability, and a high level of accuracy based on the direct detection method. Currently available platforms for MALDI-TOF MS-based genotyping are not without limitations, including high upfront instrumentation costs and the number of non-automated steps. The Hemo ID Blood Group Genotyping Panel was developed and optimized in a collaboration between the vendor and the Blood Transfusion Service of the Swiss Red Cross in Zurich, Switzerland. It is not yet widely utilized, although several laboratories are currently evaluating the MassARRAY system for blood group genotyping. Given the accuracy and other advantages of MALDI-TOF MS analysis, this method is likely to become widely adopted for blood group genotyping in the future, in particular for population screening.
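The following sketch shows why single-base extension products can be distinguished by mass. It compares the mass added by each possible extension nucleotide. It assumes standard average masses of unmodified DNA nucleotide residues (approximately A 313.21, C 289.18, G 329.21 and T 304.20 Da). Commercial extension chemistries may use mass-modified terminators, so real peak spacings on a given platform may differ. The function names are hypothetical.

```python
from itertools import combinations

# Approximate average masses (Da) of unmodified nucleotide residues in DNA (assumption).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def extension_mass_gap(allele_1, allele_2):
    """Mass difference between primers extended by one base carrying each allele."""
    return abs(RESIDUE_MASS[allele_1] - RESIDUE_MASS[allele_2])


def closest_pair():
    """Allele pair whose single-base extension products are hardest to resolve."""
    return min(combinations(RESIDUE_MASS, 2), key=lambda pair: extension_mass_gap(*pair))


pair = closest_pair()
print(pair, round(extension_mass_gap(*pair), 2))
```

Under these assumptions an A/T polymorphism gives the smallest separation, about 9 Da, which sets the mass resolution the instrument must achieve.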
An efficient genotyping method for genome-modified animals and human cells generated with CRISPR/Cas9 system.

Science.gov (United States)

Zhu, Xiaoxiao; Xu, Yajie; Yu, Shanshan; Lu, Lu; Ding, Mingqin; Cheng, Jing; Song, Guoxu; Gao, Xing; Yao, Liangming; Fan, Dongdong; Meng, Shu; Zhang, Xuewen; Hu, Shengdi; Tian, Yong

2014-09-19

The rapid generation of various species and strains of laboratory animals using CRISPR/Cas9 technology has dramatically accelerated the interrogation of gene function in vivo. So far, the dominant approach for genotyping genome-modified animals has been the T7E1 endonuclease cleavage assay. Here, we present a polyacrylamide gel electrophoresis (PAGE)-based method to genotype mice harboring different types of indel mutations. We developed 6 strains of genome-modified mice using the CRISPR/Cas9 system and used this approach to genotype mice from the F0 to the F2 generation, including single and multiplexed genome-modified mice. We also determined that the PAGE-based assay can detect mosaic DNA at a level as low as 0.5%. We further applied the PAGE-based genotyping approach to detect CRISPR/Cas9-mediated on- and off-target effects in human 293T cells and induced pluripotent stem cells (iPSCs). Thus, the PAGE-based genotyping approach meets the rapidly increasing demand for genotyping of the fast-growing number of genome-modified animals and human cell lines created using the CRISPR/Cas9 system or other nuclease systems such as TALEN or ZFN.
Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.

Science.gov (United States)

Manrubia, Susanna; Cuesta, José A

2017-04-01

An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models that successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The key feature of these models is the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of the letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).
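Consider the simplest reading of the prototype-sequence idea described above, in which each site independently admits a fixed number of letters. The genotype network size is then the product of those counts. Its logarithm is a sum of per-site terms. When those terms vary at random across phenotypes, this sum heuristically tends toward a normal shape, which is one route to lognormal-like sizes. This is a simplified sketch under that independence assumption, not the authors' hierarchy of models. The function names are hypothetical.

```python
import math
import random


def network_size(allowed_letters):
    """Number of sequences matching a prototype whose site i admits allowed_letters[i] letters."""
    return math.prod(allowed_letters)


def log_network_size(allowed_letters):
    """Natural log of the network size: a sum of per-site contributions."""
    return sum(math.log(k) for k in allowed_letters)


def sample_log_sizes(n_phenotypes, length, alphabet=4, seed=0):
    """Log-sizes for random prototypes whose sites admit 1..alphabet letters uniformly."""
    rng = random.Random(seed)
    return [log_network_size([rng.randint(1, alphabet) for _ in range(length)])
            for _ in range(n_phenotypes)]


print(network_size([4, 1, 2, 4]))
```

A histogram of `sample_log_sizes(10000, 50)` can be used to inspect the approximately normal shape of the log-sizes under this toy model.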
Transforming microbial genotyping: a robotic pipeline for genotyping bacterial strains.

Directory of Open Access Journals (Sweden)

Brian O'Farrell

Microbial genotyping increasingly deals with large numbers of samples, and data are commonly evaluated by unstructured approaches, such as spreadsheets. The efficiency, reliability and throughput of genotyping would benefit from the automation of manual manipulations within the context of sophisticated data storage. We developed a medium-throughput genotyping pipeline for MultiLocus Sequence Typing (MLST) of bacterial pathogens. This pipeline was implemented through a combination of four automated liquid handling systems, a Laboratory Information Management System (LIMS) consisting of a variety of dedicated commercial operating systems and programs, including a Sample Management System, plus numerous Python scripts. All tubes and microwell racks were bar-coded, and their locations and status were recorded in the LIMS. We also created a hierarchical set of items that could be used to represent bacterial species, their products and experiments. The LIMS allowed reliable, semi-automated, traceable bacterial genotyping from initial single colony isolation and sub-cultivation through DNA extraction and normalization to PCRs, sequencing and MLST sequence trace evaluation. We also describe robotic sequencing to facilitate cherrypicking of sequence dropouts. This pipeline is user-friendly, with a throughput of 96 strains within 10 working days at a total cost of […] 200,000 items were processed by two to three people. Our sophisticated automated pipeline can be implemented by a small microbiology group without extensive external support, and provides a general framework for semi-automated bacterial genotyping of large numbers of samples at low cost.
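In MLST, each housekeeping-locus sequence is mapped to an allele number, and the resulting allelic profile is mapped to a sequence type (ST). The following is a minimal Python sketch of that lookup. The locus names, sequences and database contents are hypothetical stand-ins for a curated MLST scheme.

```python
def allelic_profile(isolate, allele_db, loci):
    """Allele number per locus; None if the sequence is missing or not in the database."""
    return tuple(allele_db[locus].get(isolate.get(locus)) for locus in loci)


def sequence_type(profile, st_db):
    """Sequence type for a complete profile; None if any allele is new or missing."""
    if None in profile:
        return None
    return st_db.get(profile)


LOCI = ("locus1", "locus2", "locus3")  # hypothetical scheme; real schemes often use seven loci
ALLELES = {"locus1": {"ACGT": 1, "ACGA": 2}, "locus2": {"TTGC": 1}, "locus3": {"GGCA": 4}}
STS = {(1, 1, 4): 11, (2, 1, 4): 12}

isolate = {"locus1": "ACGA", "locus2": "TTGC", "locus3": "GGCA"}
profile = allelic_profile(isolate, ALLELES, LOCI)
print(profile, sequence_type(profile, STS))
```

An isolate with a missing locus, such as a sequence dropout, yields an incomplete profile and no ST. Such isolates are natural candidates for the re-sequencing step mentioned above.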
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

Science.gov (United States)

Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

2014-01-01

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require a working knowledge of the command line interface, massive computational resources and expertise, which can be daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed. It integrates several open source next generation sequencing (NGS) tools with a graphical user interface for SNP discovery and for utilizing the SNPs to develop genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high-quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs, in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community without computational expertise who wish to discover SNPs and use them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone
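The pairwise SNP output described above can be illustrated with a small Python sketch. It assumes each genotype has already been reduced to homozygous consensus base calls at reference positions. This is a simplification: the pipeline's actual callers and file formats are not modeled. For every pair of genotypes, the sketch reports the positions called in both where the bases differ. Genotype names and calls are hypothetical.

```python
from itertools import combinations


def pairwise_snps(calls):
    """Positions where two genotypes' consensus calls differ, for every genotype pair.

    calls: dict genotype -> dict reference position -> base. Positions absent
    from a genotype (no coverage) are skipped for pairs involving it.
    """
    snps = {}
    for g1, g2 in combinations(sorted(calls), 2):
        shared = calls[g1].keys() & calls[g2].keys()
        snps[(g1, g2)] = sorted(p for p in shared if calls[g1][p] != calls[g2][p])
    return snps


calls = {
    "G1": {10: "A", 20: "C", 30: "G"},
    "G2": {10: "A", 20: "T", 30: "G"},
    "G3": {10: "T", 20: "C"},
}
print(pairwise_snps(calls))
```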
Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation.

Directory of Open Access Journals (Sweden)

Momoko Horikoshi

2015-07-01

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.
Rotavirus genotype shifts among Swedish children and adults-Application of a real-time PCR genotyping.

Science.gov (United States)

Andersson, Maria; Lindh, Magnus

2017-11-01

It is well known that human rotavirus group A is the most important cause of severe diarrhoea in infants and young children. Less is known about rotavirus infections in other age groups, and about how rotavirus genotypes change over time in different age groups. The objective was to develop a real-time PCR to easily genotype rotavirus strains in order to monitor the pattern of circulating genotypes. In this study, rotavirus strains in clinical samples from children and adults in Western Sweden during 2010-2014 were retrospectively genotyped by specific amplification of the VP4 and VP7 genes with a newly developed real-time PCR. A genotype was identified in 97% of 775 rotavirus strains. G1P[8] was the most common genotype (34.9%), followed by G2P[4] (28.3%), G9P[8] (11.5%), G3P[8] (8.1%) and G4P[8] (7.9%). The genotype distribution changed over time, from predominance of G1P[8] in 2010-2012 to predominance of G2P[4] in 2013-2014. There were also age-related differences, with G1P[8] being the most common genotype in children under 2 years (47.6%) and G2P[4] the most common in those over 70 years of age (46.1%). The shift to G2P[4] in 2013-2014 was associated with a change in the age distribution, with a greater number of rotavirus-positive cases in elderly people than in children. Using a new real-time PCR method for genotyping, we found that the genotype distribution was age related and changed over time, with a decreasing proportion of G1P[8]. Copyright © 2017. Published by Elsevier B.V.
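Rotavirus genotypes are written in binomial form, combining the VP7 (G) type and the VP4 (P) type, as in G1P[8]. The sketch below combines separate G and P calls and computes each genotype's share. Two aspects are assumptions: strains lacking either call are treated as untyped, and they are excluded from the denominator. The example calls are hypothetical.

```python
from collections import Counter


def binomial_genotype(g_type, p_type):
    """Combine a VP7 G type (e.g. 'G1') and a VP4 P type (e.g. '8') into 'G1P[8]'."""
    return f"{g_type}P[{p_type}]"


def genotype_shares(calls):
    """Percentage of each binomial genotype among strains with both G and P calls."""
    typed = [binomial_genotype(g, p) for g, p in calls if g and p]
    if not typed:
        return {}
    counts = Counter(typed)
    return {genotype: 100 * n / len(typed) for genotype, n in counts.items()}


calls = [("G1", "8"), ("G1", "8"), ("G2", "4"), ("G9", None)]
print(genotype_shares(calls))
```

Grouping the calls by season or age band before calling `genotype_shares` would give the kind of time- and age-stratified comparison reported above.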
A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

Science.gov (United States)

Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

2015-01-01

Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437
Selecting and utilizing Populus and Salix for landfill covers: implications for leachate irrigation.

Science.gov (United States)

Zalesny, Ronald S; Bauer, Edmund O

2007-01-01

The success of using Populus and Salix for phytoremediation has prompted further use of leachate as a combination of irrigation and fertilization for the trees. A common protocol for such efforts has been to utilize a limited number of readily-available genotypes with decades of deployment in other applications, such as fiber or windbreaks. However, it may be possible to increase phytoremediation success with proper genotypic screening and selection, followed by the field establishment of clones that exhibited favorable potential for cleanup of specific contaminants. There is an overwhelming need for testing and subsequent deployment of diverse Populus and Salix genotypes, given current availability of clonal material and the inherent genetic variation among and within these genera. Therefore, we detail phyto-recurrent selection, a method that consists of revising and combining crop and tree improvement protocols to meet the objective of utilizing superior Populus and Salix clones for remediation applications. Although such information is lacking for environmental clean-up technologies, centuries of plant selection success in agronomy, horticulture, and forestry validate the need for similar approaches in phytoremediation. We bridge the gap between these disciplines by describing project development, clone selection, tree establishment, and evaluation of success metrics in the context of their importance to utilizing trees for phytoremediation.
An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency.

Science.gov (United States)

Guo, Wei-Li; Huang, De-Shuang

2017-08-22

Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and the limited availability of biological material, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the data into a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. By assessing the performance of imputation methods against observed ChIP-seq TF binding profiles, we show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches. In addition, motif analysis shows that TFBSImpute performs better than the baselines in capturing binding motifs enriched in observed data, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.
Genotype Reconstruction of Paternity in European Lobsters (Homarus gammarus).

Science.gov (United States)

Ellis, Charlie D; Hodgson, David J; André, Carl; Sørdalen, Tonje K; Knutsen, Halvor; Griffiths, Amber G F

2015-01-01

Decapod crustaceans exhibit considerable variation in fertilisation strategies, ranging from pervasive single paternity to the near-ubiquitous presence of multiple paternity, and knowledge of such mating systems and behaviour is required for the informed management of commercially exploited marine fisheries. We used genetic markers to assess the paternity of individual broods in the European lobster, Homarus gammarus, a species whose paternity structure is unknown. Using 13 multiplexed microsatellite loci, three of which are newly described in this study, we genotyped 10 eggs from each of 34 females collected from an Atlantic peninsula in the south-western United Kingdom. Single reconstructed paternal genotypes explained all observed progeny genotypes in each of the 34 egg clutches, and each clutch was fertilised by a different male. Simulations indicated that the probability of detecting multiple paternity exceeded 95% if secondary sires accounted for at least a quarter of the brood, and exceeded 99% where additional sires had approximately equal success. Our results show that multiple paternal fertilisations are either absent, unusual, or highly skewed in favour of a single male among H. gammarus in this area. Potential mechanisms upholding single paternal fertilisation are discussed, along with the prospective utility of parentage assignments for evaluating hatchery stocking and other fishery conservation approaches in light of this finding.

Genotype Reconstruction of Paternity in European Lobsters (Homarus gammarus).

Directory of Open Access Journals (Sweden)

Charlie D Ellis

Decapod crustaceans exhibit considerable variation in fertilisation strategies, ranging from pervasive single paternity to the near-ubiquitous presence of multiple paternity, and knowledge of such mating systems and behaviour is required for the informed management of commercially exploited marine fisheries. We used genetic markers to assess the paternity of individual broods in the European lobster, Homarus gammarus, a species whose paternity structure is unknown. Using 13 multiplexed microsatellite loci, three of which are newly described in this study, we genotyped 10 eggs from each of 34 females collected from an Atlantic peninsula in the south-western United Kingdom. Single reconstructed paternal genotypes explained all observed progeny genotypes in each of the 34 egg clutches, and each clutch was fertilised by a different male. Simulations indicated that the probability of detecting multiple paternity exceeded 95% if secondary sires accounted for at least a quarter of the brood, and exceeded 99% where additional sires had approximately equal success. Our results show that multiple paternal fertilisations are either absent, unusual, or highly skewed in favour of a single male among H. gammarus in this area. Potential mechanisms upholding single paternal fertilisation are discussed, along with the prospective utility of parentage assignments for evaluating hatchery stocking and other fishery conservation approaches in light of this finding.
Contemplation of wheat genotypes for enhanced antioxidant enzyme activity

International Nuclear Information System (INIS)

Nasim, S.; Shabbir, G.; Ilyas, M.

2017-01-01

Wheat (Triticum aestivum L.) is the leading cereal crop in Pakistan, but its yield is strongly affected by various abiotic factors, especially drought stress, which affects plant metabolism. The present study was conducted at Pir Mehr Ali Shah Arid Agriculture University Rawalpindi in 2011, using thirty-three genotypes to investigate the response of antioxidative enzymes. Seedlings were subjected to a stress condition with 30% PEG 6000 solution, along with a control (irrigated with water), under in vitro conditions. The experiment was conducted in pots in the laboratory following a Completely Randomized Design. Results revealed that under control conditions the maximum values for guaiacol peroxidase were found in Punjab-96 and Auqab-2000 (2.523), for superoxide dismutase in C-273 (0.294), for ascorbate peroxidase in PAK-81 (2.523) and for catalase in Kohsar-95 (0.487). Under moisture stress the maximum values were recorded in Kohsar-95 for guaiacol peroxidase (2.699) and superoxide dismutase (1.259), in Pak-81, SA-75, Mexipak-65 and PARI-73 for ascorbate peroxidase (3.000), and in Mexipak-65 for catalase (0.640). The genotypes that showed higher antioxidant enzyme activity under drought stress have the ability to perform better under adverse soil moisture conditions. Such potential genotypes can be used in future breeding programs and for improving wheat varieties against drought stress. (author)
Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes.

Science.gov (United States)

Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo

2016-04-07

Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features can limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of the genomic coverage obtained through GBS with a single restriction enzyme or a pair of enzymes on a sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed by applying IgCoverage to 22 plant, animal, and fungal species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations, which pair four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination HinfI + HpyCH4IV in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than in monocots. Also, SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be used to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
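The kind of in silico coverage evaluation described above can be illustrated with a simplified double-digest sketch. This is not IgCoverage's implementation: it assumes exact recognition-site matching (IUPAC `N` expanded to any base), places each cut at the start of the site rather than at the enzyme's true cut position, counts only fragments flanked by one site of each enzyme, and uses a hypothetical 100-400 bp size-selection window.

```python
import re
from typing import Dict, List, Tuple

# Recognition sequences (5'->3'). All four are palindromic, so scanning the
# forward strand finds every site. 'N' is the IUPAC code for any base.
SITES: Dict[str, str] = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "HinfI": "GANTC",
    "HpyCH4IV": "ACGT",
}


def site_positions(genome: str, site: str) -> List[int]:
    """Start positions of every (possibly overlapping) match of a site."""
    pattern = site.replace("N", "[ACGT]")
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", genome)]


def double_digest(genome: str, enzyme_a: str, enzyme_b: str,
                  min_len: int = 100, max_len: int = 400
                  ) -> Tuple[List[Tuple[int, int]], int]:
    """Return (fragments, covered_bp) for fragments flanked by one site of
    each enzyme whose length lies within [min_len, max_len].

    Simplification: cuts are placed at the start of each recognition site.
    """
    cuts = sorted(
        [(p, enzyme_a) for p in site_positions(genome, SITES[enzyme_a])]
        + [(p, enzyme_b) for p in site_positions(genome, SITES[enzyme_b])]
    )
    fragments = []
    # Walk consecutive cut sites; keep A-B (or B-A) fragments in the window.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    covered = sum(end - start for start, end in fragments)
    return fragments, covered
```

Comparing `covered` for different enzyme pairs on the same genome sequence mirrors, in a very reduced form, the coverage comparison reported for PstI + MspI versus HinfI + HpyCH4IV.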
Assessment of the Consequences of Replacing the Single Tax on Imputed Income with the Patent Taxation System

Directory of Open Access Journals (Sweden)

Galina A. Manokhina

2012-11-01

The article addresses the main questions concerning the possible consequences of replacing the currently operating single tax on imputed income with the patent taxation system. The main advantages and drawbacks of the new taxation system are presented, including the view that introducing the patent taxation system as an auxiliary system would be more effective than replacing one special tax regime with another.
Cross-species infection of specific-pathogen-free pigs by a genotype 4 strain of human hepatitis E virus

Science.gov (United States)

Feagins, A. R.; Opriessnig, T.; Huang, Y. W.; Halbur, P. G.; Meng, X. J.

2010-01-01

SUMMARY Hepatitis E virus (HEV) is an important pathogen. The animal strain of HEV, swine HEV, is related to human HEV. Genotype 3 swine HEV infects humans, and genotype 3 human HEV infects pigs. The genotype 4 swine and human HEV strains are genetically related, but it is unknown whether genotype 4 human HEV can infect pigs. A swine bioassay was used in this study to determine whether genotype 4 human HEV can infect pigs. Fifteen 4-week-old specific-pathogen-free pigs were divided into 3 groups of 5 each. Group 1 pigs were each inoculated intravenously with PBS buffer as negative controls, group 2 pigs similarly with genotype 3 human HEV (strain US-2), and group 3 pigs similarly with genotype 4 human HEV (strain TW6196E). Serum and fecal samples were collected at 0, 7, 14, 21, 28, 35, 42, 49, and 56 days postinoculation (dpi) and tested for evidence of HEV infection. All pigs were necropsied at 56 dpi. As expected, the negative control pigs remained negative. The positive control pigs inoculated with genotype 3 human HEV all became infected, as evidenced by detection of HEV antibodies, viremia and fecal virus shedding. All five pigs in group 3 inoculated with genotype 4 human HEV also became infected: fecal virus shedding and viremia were detected variably from 7 to 56 dpi, and seroconversion occurred by 28 dpi. The data indicate that genotype 4 human HEV has an expanded host range, and the results have important implications for understanding the natural history and zoonosis of HEV. PMID:18551597
Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias

Directory of Open Access Journals (Sweden)

Didion John P

2012-01-01

Background: High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected clustering pattern are often discarded, contributing to ascertainment bias and resulting in lost information (as much as 50% in a recent genome-wide association study in dogs). Results: We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced genotype calls highly concordant with those of other methods, but it uniquely identified more than 786,000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28,000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. Conclusion: The problems of ascertainment bias and missing...
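The clustering step that converts hybridization intensities into genotype calls can be sketched in a few lines. This is a deliberately simplified stand-in, not MouseDivGeno's algorithm: it assumes two-channel (allele A, allele B) intensities, maps each sample to a polar angle `theta` in [0, 1], runs a one-dimensional k-means with three clusters seeded at AA = 0, AB = 0.5 and BB = 1, and returns no call for samples whose total intensity falls below a hypothetical `min_total` cutoff.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

GENOTYPES = ("AA", "AB", "BB")


def polar_theta(x: float, y: float) -> float:
    """Map allele-A (x) and allele-B (y) intensities to [0, 1]:
    0 means pure A signal, 1 means pure B signal."""
    return 2.0 / math.pi * math.atan2(y, x)


def call_genotypes(intensities: Sequence[Tuple[float, float]],
                   iterations: int = 10,
                   min_total: float = 0.2) -> List[Optional[str]]:
    """Cluster samples at one SNP into AA/AB/BB by 1-D k-means on theta."""
    thetas = [polar_theta(x, y) for x, y in intensities]
    centroids = [0.0, 0.5, 1.0]  # initial AA, AB, BB cluster centres
    # Samples with too little total signal are left uncalled.
    usable = [i for i, (x, y) in enumerate(intensities) if x + y >= min_total]
    labels: Dict[int, int] = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid in theta space.
        labels = {i: min(range(3), key=lambda k: abs(thetas[i] - centroids[k]))
                  for i in usable}
        # Update step: move each centroid to the mean theta of its members.
        for k in range(3):
            members = [thetas[i] for i, lab in labels.items() if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return [GENOTYPES[labels[i]] if i in labels else None
            for i in range(len(intensities))]
```

Probes whose samples do not fall into three well-separated theta clusters are exactly the ones a fixed three-class caller like this handles poorly, which is the situation the abstract describes for VINOs.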
Hepatitis C virus genotypes in Myanmar.

Science.gov (United States)

Win, Nan Nwe; Kanda, Tatsuo; Nakamoto, Shingo; Yokosuka, Osamu; Shirasawa, Hiroshi

2016-07-21

Myanmar is adjacent to India, Bangladesh, Thailand, Laos and China. In Myanmar, the prevalence of hepatitis C virus (HCV) infection is 2%, and HCV infection accounts for 25% of hepatocellular carcinoma cases. In this study, we reviewed the prevalence of HCV genotypes in Myanmar. HCV genotypes 1, 3 and 6 were observed in volunteer blood donors in and around the city of Yangon. Although there are several reports of HCV genotype 6 and its variants in Myanmar, the distribution of HCV genotypes has not been well documented in areas other than Yangon. Previous studies showed that treatment with peginterferon and a weight-based dose of ribavirin for 24 or 48 wk could achieve sustained virological response (SVR) rates of 80%-100% in Myanmar. Current interferon-free treatments could lead to higher SVR rates (90%-95%) in patients infected with almost all HCV genotypes other than genotype 3. In an era of heavy reliance on direct-acting antivirals against HCV, there is an increasing need to determine HCV genotypes, and this need will also increase in Myanmar specifically. Currently available information on HCV genotypes comes mostly from Yangon and from countries other than Myanmar. The prevalence of HCV genotypes throughout Myanmar should be determined.
Selection of lettuce genotypes for phosphorus uptaking efficiency - DOI: 10.4025/actasciagron.v25i1.2348

OpenAIRE

Cock, Wallace Rudeck Sthel (UENF); Tardin, Flávio Dessaune (UENF); Amaral Júnior, Antônio Teixeira do (UENF); Scapim, Carlos Alberto (UEM); Amaral, José Francisco Teixeira do (UFES); Cunha, Gláucio de Mello (UFES); Bressan-Smith, Ricardo Enrique (UENF); Pinto, Ronald José Barth (UEM)

2008-01-01

Nineteen late-flowering lettuce genotypes from the UENF horticultural germplasm bank were evaluated for phosphorus utilization efficiency at a P level of 10 mg dm-3. A biometrical analysis was undertaken of genetic parameters and of the genetic, phenotypic and environmental correlations among shoot and root dry matter production, P content in roots, shoot P uptake, P translocation and P utilization efficiency. Genetic variability, which could be promising to obtain positive response to selection,...
Hepatitis C Virus Genie: A Web 2.0 Interpretation and Analytics Platform for the Versant Hepatitis C Virus Genotype Line Probe Assay Version 2.0.

Science.gov (United States)

Dussaq, Alex M; Soni, Abha; Willey, Christopher; Park, Seung L; Harada, Shuko

2017-01-01

Hepatitis C virus (HCV) genotyping at our institution is performed using the Versant HCV genotype 2.0 Line Probe Assay (LiPA). The last steps of this procedure are a manual, laborious, and error-prone process that involves comparing the banding pattern on a test strip with a physical reference table. We developed a web-based HCV genotype interpretation platform that uses a scanned image to generate the genotypes, thus minimizing interpretation time and reducing error. HCV Genie 2 uses a database of banding patterns in conjunction with image analysis algorithms to determine the genotype for any number of scanned LiPA strips. HCV Genie 2 is built with client-side JavaScript, allowing the program to run in the user's browser rather than on an unknown server, which essentially eliminates data and patient privacy concerns. HCV Genie 2 was tested over 2 months and proved identical to human expert interpretation for 148 samples (>1000 bands identified). Manual intervention was required only for two faint bands and one false-positive band; this was done using the built-in user interface. With the original method, the trained laboratory technician's interpretation time for 16 samples was 13.8 (±0.96) min, compared with 5.0 (±1.09) min with HCV Genie 2, a 63.8% decrease. In addition to the time savings, the new method provides an additional validation step, which decreases the potential for errors. Our institution has moved to using the new techniques and tools described here exclusively. Both experienced technicians and the molecular pathologists at our institution prefer the workflow using HCV Genie. It is easier for the technicians to prepare and document, and the pathologists are able to review and confirm results more rapidly. By decreasing the potential for error, the use of this tool will increase the quality of patient care delivered through this test methodology. The algorithms developed here can be ported to similar band identification...
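As a quick arithmetic check, the reported 63.8% decrease follows directly from the two mean interpretation times for 16 samples:

```latex
\frac{13.8\ \text{min} - 5.0\ \text{min}}{13.8\ \text{min}}
  = \frac{8.8}{13.8} \approx 0.638 = 63.8\%
```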
Hepatitis C virus Genie: A web 2.0 interpretation and analytics platform for the Versant Hepatitis C virus genotype Line Probe Assay version 2.0

Directory of Open Access Journals (Sweden)

Alex M Dussaq

2017-01-01

Context: Hepatitis C virus (HCV) genotyping at our institution is performed using the Versant HCV genotype 2.0 Line Probe Assay (LiPA). The last steps of this procedure are a manual, laborious, and error-prone process that involves comparing the banding pattern on a test strip with a physical reference table. Aim: We developed a web-based HCV genotype interpretation platform that uses a scanned image to generate the genotypes, thus minimizing interpretation time and reducing error. Subjects and Methods: HCV Genie 2 uses a database of banding patterns in conjunction with image analysis algorithms to determine the genotype for any number of scanned LiPA strips. HCV Genie 2 is built with client-side JavaScript, allowing the program to run in the user's browser rather than on an unknown server, which essentially eliminates data and patient privacy concerns. Results: HCV Genie 2 was tested over 2 months and proved identical to human expert interpretation for 148 samples (>1000 bands identified). Manual intervention was required only for two faint bands and one false-positive band; this was done using the built-in user interface. With the original method, the trained laboratory technician's interpretation time for 16 samples was 13.8 (±0.96) min, compared with 5.0 (±1.09) min with HCV Genie 2, a 63.8% decrease. In addition to the time savings, the new method provides an additional validation step, which decreases the potential for errors. Conclusions: Our institution has moved to using the new techniques and tools described here exclusively. Both experienced technicians and the molecular pathologists at our institution prefer the workflow using HCV Genie. It is easier for the technicians to prepare and document, and the pathologists are able to review and confirm results more rapidly. By decreasing the potential for error, the use of this tool will increase the quality of patient care delivered through this test methodology. The...
The Accuracy and Bias of Single-Step Genomic Prediction for Populations Under Selection

Directory of Open Access Journals (Sweden)

Wan-Ling Hsu

2017-08-01

In single-step analyses, missing genotypes are explicitly or implicitly imputed, and this requires centering the observed genotypes using the means of the unselected founders. If genotypes are only available for selected individuals, centering on the unselected founder mean is not straightforward. Here, computer simulation is used to study an alternative analysis that does not require centering genotypes but fits the mean μg of unselected individuals as a fixed effect. Starting with observed diplotypes from 721 cattle, a five-generation population was simulated with sire selection to produce 40,000 individuals with phenotypes, of which the 1000 sires had genotypes. The next generation of 8000 genotyped individuals was used for validation. Evaluations were undertaken with (J) or without (N) μg when marker covariates were not centered, and with (JC) or without (C) μg when all observed and imputed marker covariates were centered. Centering did not influence the accuracy of genomic prediction, but fitting μg did. Accuracies were higher when the panel comprised only quantitative trait loci (QTL): models JC and J had accuracies of 99.4%, whereas models C and N had accuracies of 90.2%. When only markers were in the panel, all four models had accuracies of 80.4%. In panels that included QTL, fitting μg in the model improved accuracy, but it had little impact when the panel contained only markers. In populations undergoing selection, fitting μg in the model is recommended to avoid bias and a reduction in prediction accuracy due to selection.
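The centering step discussed above can be sketched as follows. The sketch assumes genotypes coded as 0/1/2 allele counts and the common practice of subtracting the expected dosage 2p from each marker covariate, with p estimated from founder genotypes; the function names are hypothetical and the code is not the simulation used in the study.

```python
from typing import List, Sequence


def allele_frequencies(founders: Sequence[Sequence[int]]) -> List[float]:
    """Alternate-allele frequency p per marker, from founder genotypes (0/1/2)."""
    n = len(founders)
    return [sum(row[j] for row in founders) / (2 * n)
            for j in range(len(founders[0]))]


def center(genotypes: Sequence[Sequence[int]],
           freqs: Sequence[float]) -> List[List[float]]:
    """Subtract the expected dosage 2p from every marker covariate."""
    return [[g - 2 * p for g, p in zip(row, freqs)] for row in genotypes]


def column_means(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each marker covariate across individuals."""
    return [sum(col) / len(col) for col in zip(*matrix)]


# Founder frequencies, then centering of two (hypothetical) selected individuals.
founder_p = allele_frequencies([[0, 1], [2, 1]])
selected = center([[2, 0], [2, 2]], founder_p)
```

In this toy case the selected individuals, centered with founder frequencies, do not have zero-mean covariates (`column_means(selected)` is `[1.0, 0.0]`), which illustrates why the centering constant matters when only selected animals are genotyped. The alternative studied in the abstract leaves covariates uncentered and fits μg as a fixed effect instead.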
Laboratory Information Management Software for genotyping workflows: applications in high throughput crop genotyping

Directory of Open Access Journals (Sweden)

Prasanth VP

2006-08-01

Background: With the advances in DNA sequencer-based technologies, it has become possible to automate several steps of the genotyping process, leading to increased throughput. To efficiently handle the large amounts of genotypic data generated and help with quality control, there is a strong need for a software system that can track samples and capture and manage data at different steps of the process. Such systems, while serving to manage the workflow precisely, also encourage good laboratory practice by standardizing protocols and by recording and annotating data from every step of the workflow. Results: A laboratory information management system (LIMS) that meets the requirements of a moderately high-throughput molecular genotyping facility has been designed and implemented at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT). The application is designed as modules and is simple to learn and use. It leads the user through each step of the process, from starting an experiment to storing the output data from the genotype detection step with auto-binning of alleles, thus ensuring that every DNA sample is handled in an identical manner and all the necessary data are captured. The application keeps track of DNA samples and generated data. Data are entered into the system through forms for file uploads. The LIMS provides functions to trace any genotypic data back to the electrophoresis gel files or sample source, and for repeating experiments. The LIMS is currently used to capture high-throughput SSR (simple-sequence repeat) genotyping data from legume (chickpea, groundnut and pigeonpea) and cereal (sorghum and millets) crops of importance in the semi-arid tropics. Conclusion: A laboratory information management system is available that has been found useful in the management of microsatellite genotype data in a moderately high throughput genotyping...
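The auto-binning of alleles mentioned above can be illustrated with a minimal sketch: each fragment size reported by electrophoresis is assigned to the nearest predefined allele bin if it lies within a tolerance, and is otherwise left unbinned for manual review. The bin positions, tolerance and function names are hypothetical; the abstract does not describe the ICRISAT LIMS binning rules.

```python
from typing import Dict, List, Optional, Sequence


def bin_allele(size_bp: float, bins: Dict[str, float],
               tolerance: float = 0.5) -> Optional[str]:
    """Return the name of the nearest allele bin within `tolerance` bp, else None."""
    if not bins:
        return None
    # Nearest bin centre to the observed fragment size.
    name, centre = min(bins.items(), key=lambda item: abs(item[1] - size_bp))
    return name if abs(centre - size_bp) <= tolerance else None


def bin_sample(sizes: Sequence[float], bins: Dict[str, float],
               tolerance: float = 0.5) -> List[Optional[str]]:
    """Bin every fragment size observed for one DNA sample at one SSR locus."""
    return [bin_allele(s, bins, tolerance) for s in sizes]


# Hypothetical dinucleotide-repeat bins at 150, 152 and 154 bp.
bins = {"150": 150.0, "152": 152.0, "154": 154.0}
```

With these bins, a peak sized at 151.8 bp is called as allele `"152"`, while one at 151.0 bp falls between bins and is returned as `None` for review.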
A germline variant in the TP53 polyadenylation signal confers cancer susceptibility

DEFF Research Database (Denmark)

Stacey, Simon N; Sulem, Patrick; Jonasdottir, Aslaug

2011-01-01

To identify new risk variants for cutaneous basal cell carcinoma, we performed a genome-wide association study of 16 million SNPs identified through whole-genome sequencing of 457 Icelanders. We imputed genotypes for 41,675 Illumina SNP chip-typed Icelanders and their relatives. In the discovery...
Immunochip Analysis Identifies Multiple Susceptibility Loci for Systemic Sclerosis

NARCIS (Netherlands)

Mayes, Maureen D.; Bossini-Castillo, Lara; Gorlova, Olga; Martin, Jose Ezequiel; Zhou, Xiaodong; Chen, Wei V.; Assassi, Shervin; Ying, Jun; Tan, Filemon K.; Arnett, Frank C.; Reveille, John D.; Guerra, Sandra; Teruel, Maria; Carmona, Francisco David; Gregersen, Peter K.; Lee, Annette T.; Lopez-Isac, Elena; Ochoa, Eguzkine; Carreira, Patricia; Simeon, Carmen Pilar; Castellvi, Ivan; Gonzalez-Gay, Miguel Angel; Zhernakova, Alexandra; Padyukov, Leonid; Alarcon-Riquelme, Marta; Wijmenga, Cisca; Beretta, Lorenzo; Riemekasten, Gabriela; Witte, Torsten; Hunzelmann, Nicolas; Kreuter, Alexander; Distler, Jorg H. W.; Voskuyl, Alexandre E.; Schuerwegh, Annemie J.; Hesselstrand, Roger; Nordin, Annika; Airo, Paolo; Lunardi, Claudio; Shiels, Paul; van Laar, Jacob M.; Herrick, Ariane; Worthington, Jane; Denton, Christopher; Wigley, Fredrick M.; Hummers, Laura K.; Varga, John; Hinchcliff, Monique E.; Baron, Murray; Hudson, Marie; Pope, Janet E.

2014-01-01

In this study, 1,833 systemic sclerosis (SSc) cases and 3,466 controls were genotyped with the Immunochip array. Classical alleles, amino acid residues, and SNPs across the human leukocyte antigen (HLA) region were imputed and tested. These analyses resulted in a model composed of six polymorphic
ANGSD

DEFF Research Database (Denmark)

Korneliussen, Thorfinn Sand; Albrechtsen, Anders; Nielsen, Rasmus

2014-01-01

is available at http://www.popgen.dk/angsd. The program has been tested and validated on GNU/Linux systems. It supports multiple input formats, including BAM and imputed Beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform...
The African Genome Variation Project shapes medical genetics in Africa

Science.gov (United States)

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2014-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP comprises dense genotypes from 1,481 individuals and whole-genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy, supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054
The Oxytocin Receptor Gene (OXTR) and Face Recognition.

Science.gov (United States)

Verhallen, Roeland J; Bosten, Jenny M; Goodbourn, Patrick T; Lawrance-Owen, Adam J; Bargary, Gary; Mollon, J D

2017-01-01

A recent study has linked individual differences in face recognition to rs237887, a single-nucleotide polymorphism (SNP) of the oxytocin receptor gene (OXTR; Skuse et al., 2014). In that study, participants were assessed using the Warrington Recognition Memory Test for Faces, but performance on Warrington's test has been shown not to rely purely on face recognition processes. We administered the widely used Cambridge Face Memory Test, a purer test of face recognition, to 370 participants. Performance was not significantly associated with rs237887, with 16 other OXTR SNPs that we genotyped, or with a further 75 imputed SNPs. We also administered three other tests of face processing (the Mooney Face Test, the Glasgow Face Matching Test, and the Composite Face Test), but after corrections for multiple testing, performance was never significantly associated with rs237887 or with any of the other genotyped or imputed SNPs. In addition, we found no associations between OXTR and Autism-Spectrum Quotient scores.
Criteria of GenCall score to edit marker data and methods to handle missing markers have an influence on accuracy of genomic predictions

DEFF Research Database (Denmark)

Edriss, Vahid; Guldbrandtsen, Bernt; Lund, Mogens Sandø

2013-01-01

The aim of this study was to investigate the effect of different strategies for handling low-quality or missing data on prediction accuracy for direct genomic values of protein yield, mastitis and fertility using a Bayesian variable model and a GBLUP model in the Danish Jersey population. The data...... contained 1071 Jersey bulls that were genotyped with the Illumina Bovine 50K chip. After preliminary editing, 39227 SNP remained in the dataset. Four methods to handle missing genotypes were: 1) BEAGLE: missing markers were imputed using Beagle 3.3 software, 2) COMMON: missing genotypes at a locus were...
Root phosphatase activity, plant growth and phosphorus accumulation of maize genotypes

Directory of Open Access Journals (Sweden)

Machado Cynthia Torres de Toledo

2004-01-01

Full Text Available The activity of the enzyme phosphatase (P-ase) is a physiological characteristic related to plant efficiency in P acquisition and utilization, and it is genetically variable. As part of a study characterizing maize genotypes for phosphorus (P) uptake and utilization efficiency, two experiments were set up to measure phosphatase (P-ase) activity in intact roots of six local and improved maize varieties and two sub-populations. Plants were grown at one P level in nutrient solution (4 mg L⁻¹), and the P-ase activity assay was run using 17-day-old plants for varieties and 24-day-old plants for subpopulations. Shoot and root dry matter yields and P concentrations and contents in plant parts were determined, as well as P-efficiency indexes. Root P-ase activity differed among varieties. The highest enzymatic activities were observed in two local varieties ('Catetão' and 'Caiano') and three improved varieties ('Sol da Manhã', 'Nitrodente' and 'BR 106'). 'Carioca', a local variety, had the lowest activity. Between subpopulations, 'ND2', with low-yielding and poorly P-efficient plants, had higher root P-ase activity than 'ND10', with high-yielding and highly P-efficient plants. In general, subpopulations had lower P-ase activities than varieties. Correlations between P-ase activity and P-efficiency characteristics were positive or negative depending on the genotype. This does not allow inference of a general, clear association between root-secreted phosphatase and dry matter production or P acquisition. Genotypic variability must be known and considered before P-ase activity is used as an indicator of P nutritional status, or of P tolerance, adaptation and efficiency under low-P conditions.
RAPD markers for screening shoot gall maker (Betousa stylophora Swinhoe) tolerant genotypes of amla (Phyllanthus emblica L.)

Directory of Open Access Journals (Sweden)

Sethuraman Thilaga

2017-12-01

Full Text Available Phyllanthus emblica Linn. is the most important medicinally useful tree crop in the Asian subcontinent. It is severely infested by Betousa stylophora Swinhoe, known as the shoot gall maker (SGM). This pest tunnels into the shoots of seedlings and actively growing branches of trees and forms galls, leading to stunted growth, unusual branching and death of actively growing shoots. Our study revealed that trees with smooth bark suffered less attack from this pest than those with a rough bark surface. Unfortunately, this character is not detectable either at the seedling stage or during early growth of trees in the orchard. RAPD genetic fingerprinting of trees with smooth and rough bark revealed distinguishable and highly reproducible DNA banding patterns between the two genotypes. Of the 20 RAPD primers tested, five produced distinguishable RAPD bands between rough- and smooth-barked genotypes of P. emblica. Trees with smooth bark produced five unique RAPD bands with molecular weights ranging from 350 bp to 1500 bp, and those with rough bark produced six RAPD bands (350 bp–650 bp). These DNA bands could potentially be utilized as DNA markers for screening genotypes of this crop tolerant to SGM. The utility of this finding in the genetic improvement of this tree crop against SGM is discussed.

Linking Bacillus cereus Genotypes and Carbohydrate Utilization Capacity.

Directory of Open Access Journals (Sweden)

Alicja K Warda

Full Text Available We characterised carbohydrate utilisation of 20 newly sequenced Bacillus cereus strains isolated from food products and food processing environments and two laboratory strains, B. cereus ATCC 10987 and B. cereus ATCC 14579. Subsequently, genome sequences of these strains were analysed together with 11 additional B. cereus reference genomes to provide an overview of the different types of carbohydrate transporters and utilization systems found in B. cereus strains. The combined application of API tests, defined growth media experiments and comparative genomics enabled us to link the carbohydrate utilisation capacity of 22 B. cereus strains with their genome content and in some cases to the panC phylogenetic grouping. A core set of carbohydrates including glucose, fructose, maltose, trehalose, N-acetyl-glucosamine, and ribose could be used by all strains, whereas utilisation of other carbohydrates like xylose, galactose, and lactose, and typical host-derived carbohydrates such as fucose, mannose, N-acetyl-galactosamine and inositol is limited to a subset of strains. Finally, the roles of selected carbohydrate transporters and utilisation systems in specific niches such as soil, foods and the human host are discussed.
Genotyping Applications for Transplantation and Transfusion Management: The Emory Experience.

Science.gov (United States)

Fasano, Ross M; Sullivan, Harold Cliff; Bray, Robert A; Gebel, Howard M; Meyer, Erin K; Winkler, Annie M; Josephson, Cassandra D; Stowell, Sean R; Sandy Duncan, Alexander; Roback, John D

2017-03-01

Current genotyping methodologies for transplantation and transfusion management employ multiplex systems that allow for simultaneous detection of multiple HLA antigens, human platelet antigens, and red blood cell (RBC) antigens. The development of high-resolution, molecular HLA typing has led to improved outcomes in unrelated hematopoietic stem cell transplants by better identifying compatible alleles of the HLA-A, B, C, DRB1, and DQB1 antigens. In solid organ transplantation, the combination of high-resolution HLA typing with solid-phase antibody identification has proven of value for highly sensitized patients and has significantly reduced incompatible crossmatches at the time of organ allocation. This database-driven, combined HLA antigen/antibody testing has enabled routine implementation of "virtual crossmatching" and may even obviate the need for physical crossmatching. In addition, DNA-based testing for RBC antigens provides an alternative typing method that mitigates many of the limitations of hemagglutination-based phenotyping. Although RBC genotyping has utility in various transfusion settings, it has arguably been most useful for minimizing alloimmunization in the management of transfusion-dependent patients with sickle cell disease or thalassemia. The availability of high-throughput RBC genotyping for both individuals and large populations of donors, along with coordinated informatics systems to compare patients' antigen profiles with available antigen-negative and/or rare blood-typed donors, holds promise for improving the efficiency, reliability, and extent of RBC matching for this population.
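Both the "virtual crossmatch" and the search for antigen-negative donors described above reduce to comparing sets of antigens. The following is a deliberately simplified, hypothetical illustration. It ignores antibody strength, allele-level typing and antigen equivalences, all of which real laboratory systems take into account. The function names, antigen lists and donor IDs are made up for the example.

```python
from typing import Dict, List, Set


def virtual_crossmatch_positive(patient_antibodies: Set[str], donor_hla: Set[str]) -> bool:
    """Predict an incompatible (positive) crossmatch when the donor carries any HLA
    antigen against which the patient has a detected antibody (a donor-specific antibody)."""
    return bool(patient_antibodies & donor_hla)


def antigen_negative_donors(avoid: Set[str], donors: Dict[str, Set[str]]) -> List[str]:
    """IDs of donors whose RBC antigen profile contains none of the antigens to avoid."""
    return [donor_id for donor_id, antigens in donors.items() if not (avoid & antigens)]


# Hypothetical HLA example: patient antibodies vs. two donors' typings.
patient_abs = {"A2", "B44"}
print(virtual_crossmatch_positive(patient_abs, {"A1", "A2", "B8", "B35"}))  # shares A2
print(virtual_crossmatch_positive(patient_abs, {"A1", "A3", "B7", "B8"}))   # nothing shared

# Hypothetical RBC example: find donors negative for C, E and K.
donor_pool = {
    "D001": {"c", "e", "k"},
    "D002": {"C", "c", "e", "k"},
    "D003": {"c", "E", "e", "k"},
}
print(antigen_negative_donors({"C", "E", "K"}, donor_pool))
```

A production system would also need to handle ambiguous or incomplete typings, which this sketch omits.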
Multiple imputation of rainfall missing data in the Iberian Mediterranean context

Science.gov (United States)

Miró, Juan Javier; Caselles, Vicente; Estrela, María José

2017-11-01

Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years. These methods are progressively more advanced than traditional approaches to the problem. The present study validated 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., simultaneously filling missing data in multiple incomplete series across a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (eastern Iberian Peninsula). This area is characterized by high spatial irregularity of rainfall, which makes rainfall difficult to estimate. A classification of precipitation according to its genetic origin was applied as pre-processing, and quantile-mapping adjustment as a post-processing technique. In general, the results showed better performance for the non-linear and hybrid methods. Among the non-linear approaches, the non-linear PCA (NLPCA) method considerably outperformed the Self Organizing Maps (SOM) method. Among the linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far from NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.
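The abstract does not detail its quantile-mapping step. The following is a generic empirical version, sketched under one assumption: a calibration period exists in which both the imputed (reconstructed) series and the observed series are available. Each imputed value is converted to its position in the calibration distribution of imputed values. That position is then read off the distribution of observed values. The sketch ignores dry-day handling and seasonal stratification, which rainfall applications usually need. All function names and values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def fractional_rank(sorted_vals: Sequence[float], x: float) -> float:
    """Position of x in sorted_vals as a fractional index in [0, n - 1],
    linearly interpolated between neighbours and clamped at both ends."""
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return float(len(sorted_vals) - 1)
    i = bisect_right(sorted_vals, x) - 1  # sorted_vals[i] <= x < sorted_vals[i + 1]
    lo, hi = sorted_vals[i], sorted_vals[i + 1]
    return i + (x - lo) / (hi - lo)


def quantile_at(sorted_vals: Sequence[float], p: float) -> float:
    """Empirical quantile for probability p in [0, 1], by linear interpolation."""
    pos = p * (len(sorted_vals) - 1)
    i = min(int(pos), len(sorted_vals) - 2)
    frac = pos - i
    return sorted_vals[i] + frac * (sorted_vals[i + 1] - sorted_vals[i])


def quantile_map(values: Sequence[float], calib_imputed: Sequence[float],
                 calib_observed: Sequence[float]) -> List[float]:
    """Map imputed values onto the observed distribution (empirical quantile mapping)."""
    imp = sorted(calib_imputed)
    obs = sorted(calib_observed)
    if len(imp) < 2 or len(obs) < 2:
        raise ValueError("calibration series need at least two values each")
    out = []
    for x in values:
        p = fractional_rank(imp, x) / (len(imp) - 1)  # non-exceedance probability
        out.append(quantile_at(obs, p))
    return out


# Hypothetical calibration data (daily totals in mm).
calib_imp = [0.0, 0.5, 1.2, 3.0, 7.5, 12.0]
calib_obs = [0.0, 0.8, 2.0, 4.1, 9.0, 15.5]
print(quantile_map([1.0, 6.0], calib_imp, calib_obs))
```

Values outside the calibration range are clamped to the observed extremes. This is one of several possible choices; extrapolating the tails is another.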
Heterogeneous recombination among Hepatitis B virus genotypes.

Science.gov (United States)

Castelhano, Nadine; Araujo, Natalia M; Arenas, Miguel

2017-10-01

Hepatitis B virus (HBV) evolves rapidly through two evolutionary forces, mutation and recombination. This allows the virus to generate a large variety of adapted variants at both intra- and inter-host levels. It can, for instance, generate drug resistance or the diverse viral genotypes that currently exist in the HBV epidemics. Concerning the latter, recombination is known to have played a major role in the emergence and genetic diversification of novel genotypes. In this regard, quantifying viral recombination in each genotype can provide relevant information for anticipating the evolutionary trends of the epidemic. Here we measured the extent of this evolutionary force by estimating global and local recombination rates in >4700 HBV complete genome sequences corresponding to nine (A to I) HBV genotypes. Counterintuitively, we found that genotype E presents extremely high levels of recombination, followed by genotypes B and C. On the other hand, genotype G presents the lowest level, where recombination is almost negligible. We discuss these findings in the light of known characteristics of these genotypes. Additionally, we present a phylogenetic network to depict the evolutionary history of the studied HBV genotypes. This network clearly classified all genotypes into specific groups. It also indicated that diverse pairs of genotypes are derived from a common ancestor (i.e., C-I, D-E and F-H), although the origin of this virus remains highly uncertain. Altogether, we conclude that the amount of observed recombination is heterogeneous among HBV genotypes and that this heterogeneity can influence the future expansion of the epidemic. Copyright © 2017 Elsevier B.V. All rights reserved.
Evaluation of High Resolution Melting for MTHFR C677T Genotyping in Congenital Heart Disease.

Directory of Open Access Journals (Sweden)

Ying Wang

Full Text Available High resolution melting (HRM) is a simple, flexible and low-cost mutation screening technique. The methylenetetrahydrofolate reductase (MTHFR) gene, which encodes a critical enzyme, potentially affects susceptibility to some congenital defects such as congenital heart disease (CHD). We evaluated the performance of HRM for genotyping the MTHFR C677T locus in CHD cases and healthy controls from the Chinese Han population. A total of 315 blood samples from 147 CHD patients (72 male, 75 female) and 168 healthy controls (92 male, 76 female) were enrolled in the study. HRM was used to genotype the MTHFR C677T locus in all samples, and the results were compared with those of PCR-RFLP and Sanger sequencing. The association between MTHFR C677T genotypes and the risk of CHD was analyzed using odds ratios with 95% confidence intervals (CIs) from unconditional logistic regression. All samples were successfully genotyped by HRM within 1 hour and 30 minutes, whereas at least 6 hours were needed for PCR-RFLP and sequencing. The frequencies of the MTHFR C677T CC, CT and TT genotypes were 9.52%, 49.66% and 40.82% in the CHD group and 29.17%, 50% and 20.83% in the control group. Genotype calls were identical between HRM and PCR-RFLP, indicating that the sensitivity and specificity of HRM were both 100%. MTHFR C677T is a potential risk factor for CHD among local residents of Shandong Province, China. HRM is a fast, sensitive, specific and reliable method for clinical genotyping.
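The genotype comparison can be checked with a worked example. The counts below are back-calculated from the reported percentages of 147 cases and 168 controls (for example, 40.82% of 147 ≈ 60). They are reconstructions, not figures quoted in the abstract. The sketch computes crude (unadjusted) odds ratios against the CC genotype, with a Woolf (log-based) 95% interval. The authors' logistic-regression estimates may differ if their model included other terms. The function name is ours.

```python
import math

# Genotype counts back-calculated from the reported percentages (assumption).
cases = {"CC": 14, "CT": 73, "TT": 60}      # 147 CHD patients
controls = {"CC": 49, "CT": 84, "TT": 35}   # 168 healthy controls


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed cases, b: exposed controls, c: reference cases, d: reference controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


for genotype in ("CT", "TT"):
    or_, lo, hi = odds_ratio_woolf(cases[genotype], controls[genotype],
                                   cases["CC"], controls["CC"])
    print(f"{genotype} vs CC: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these reconstructed counts, the crude TT-versus-CC odds ratio is (60 × 49)/(35 × 14) = 6.0.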
Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk

NARCIS (Netherlands)

Day, Felix R; Thompson, Deborah J; Helgason, Hannes; Chasman, Daniel I; Finucane, Hilary; Sulem, Patrick; Ruth, Katherine S; Whalen, Sean; Sarkar, Abhishek K; Albrecht, Eva; Altmaier, Elisabeth; Amini, Marzyeh; Barbieri, Caterina M; Boutin, Thibaud; Campbell, Archie; Demerath, Ellen; Giri, Ayush; He, Chunyan; Hottenga, Jouke J; Karlsson, Robert; Kolcic, Ivana; Loh, Po-Ru; Lunetta, Kathryn L; Mangino, Massimo; Marco, Brumat; McMahon, George; Medland, Sarah E; Nolte, Ilja M; Noordam, Raymond; Nutile, Teresa; Paternoster, Lavinia; Perjakova, Natalia; Porcu, Eleonora; Rose, Lynda M; Schraut, Katharina E; Segrè, Ayellet V; Smith, Albert V; Stolk, Lisette; Teumer, Alexander; Andrulis, Irene L; Bandinelli, Stefania; Beckmann, Matthias W; Benitez, Javier; Bergmann, Sven; Bochud, Murielle; de Geus, Eco J C N; Mbarek, Hamdi; Willemsen, Gonneke; Boomsma, Dorret I; Visser, Jenny A

2017-01-01

The timing of puberty is a highly polygenic childhood trait that is epidemiologically associated with various adult diseases. Using 1000 Genomes Project-imputed genotype data in up to ∼370,000 women, we identify 389 independent signals (P < 5 × 10⁻⁸) for age at menarche, a milestone in female
[Evaluation of hepatitis B virus genotyping EIA kit].

Science.gov (United States)

Tanaka, Yasuhito; Sugauchi, Fuminaka; Matsuuraa, Kentaro; Naganuma, Hatsue; Tatematsu, Kanako; Takagi, Kazumi; Hiramatsu, Kumiko; Kani, Satomi; Gotoh, Takaaki; Wakimoto, Yukio; Mizokami, Masashi

2009-01-01

The clinical significance of hepatitis B virus (HBV) genotyping is increasingly recognized. The aim of this study was to evaluate the reproducibility, accuracy and sensitivity of an enzyme immunoassay (EIA)-based HBV genotyping kit. The kit is designed to discriminate among genotypes A, B, C and D by detecting genotype-specific epitopes in the PreS2 region. Using panels of the four genotypes, the EIA demonstrated complete inter- and intra-assay genotyping reproducibility. Results for serum specimens were stable after 8 days at 4 degrees C or after 10 freeze-thaw cycles. Of 91 samples previously genotyped by DNA sequencing, 87 (95.6%) were in complete accordance with EIA genotyping. Of 344 HBsAg-positive serum specimens examined, genotypes A, B, C and D were determined in 26 (7.6%), 62 (18.0%), 228 (66.3%) and 9 (2.6%) cases, respectively. Of the 19 (5.5%) specimens unclassified by the EIA, 13 had a low HBsAg titer (< 3 IU/ml) and 5 had amino acid mutations or deletions within the targeted PreS2 epitopes. The EIA allowed genotyping even in HBV DNA-negative samples (96.2%). In conclusion, the HBV genotyping EIA is a reliable, sensitive and easy assay for HBV genotyping, and it would be useful in clinical practice.
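The reported percentages can be reproduced from the counts given in the abstract. Below is a short arithmetic check. The helper name `percent` is ours.

```python
def percent(n, total, digits=1):
    """Percentage of total, rounded to the precision used in the abstract."""
    return round(100 * n / total, digits)


total = 344  # HBsAg-positive specimens examined
counts = {"A": 26, "B": 62, "C": 228, "D": 9, "unclassified": 19}
assert sum(counts.values()) == total  # the categories cover every specimen

for label, n in counts.items():
    print(f"{label}: {n}/{total} = {percent(n, total)}%")
print(f"agreement with sequencing: 87/91 = {percent(87, 91)}%")
```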
The Comparison of Growth, Slaughter and Carcass Traits of Meat Chicken Genotype Produced by Back-Crossing with A Commercial Broiler Genotype

Directory of Open Access Journals (Sweden)

Musa Sarıca

2014-01-01

Full Text Available This study was conducted to compare growth and some slaughter traits between commercial fast-growing chickens and three-way cross M2 genotypes. For each genotype, 260 mixed-sex chickens were reared in 10 replicates in the same house. Commercial chickens were slaughtered at two different ages, 6 and 7 weeks, for comparison with the cross genotypes. F chickens reached slaughter age at 42 days, whereas the cross groups reached it at 49 days. The genotypes consumed the same amount of feed up to their slaughter ages, but the F genotype had a better feed conversion ratio. Differences among genotypes in dressing percentage and carcass part ratios were significant, and the F genotype had a higher dressing percentage. Carcass parts of all genotypes were within acceptable limits.
Genetic composition of social groups influences male aggressive behaviour and fitness in natural genotypes of Drosophila melanogaster.

Science.gov (United States)

Saltz, Julia B

2013-11-22

Indirect genetic effects (IGEs) describe how an individual's behaviour, which is influenced by his or her genotype, can affect the behaviours of interacting individuals. IGE research has focused on dyads. However, insights from social networks research, and other studies of group behaviour, suggest that dyadic interactions are affected by the behaviour of other individuals in the group. To extend IGE inferences to groups of three or more, IGEs must be considered from a group perspective. Here, I introduce the 'focal interaction' approach to study IGEs in groups. I illustrate the utility of this approach by studying aggression among natural genotypes of Drosophila melanogaster. I chose two natural genotypes as 'focal interactants': the behavioural interaction between them was the 'focal interaction'. One male from each focal interactant genotype was present in every group, and I varied the genotype of the third male (the 'treatment male'). Genetic variation in the treatment male's aggressive behaviour influenced the focal interaction, demonstrating that IGEs in groups are not a straightforward extension of IGEs measured in dyads. Further, the focal interaction influenced male mating success, illustrating the role of IGEs in behavioural evolution. These results represent the first manipulative evidence for IGEs at the group level.
Identification of unusual Chlamydia pecorum genotypes in Victorian koalas (Phascolarctos cinereus) and clinical variables associated with infection.

Science.gov (United States)

Legione, Alistair R; Patterson, Jade L S; Whiteley, Pam L; Amery-Gale, Jemima; Lynch, Michael; Haynes, Leesa; Gilkerson, James R; Polkinghorne, Adam; Devlin, Joanne M; Sansom, Fiona M

2016-05-01

Chlamydia pecorum infection is a threat to the health of free-ranging koalas (Phascolarctos cinereus) in Australia. Using an extensive sample archive, we determined the prevalence of C. pecorum in koalas within six regions of Victoria, Australia. The ompA genotypes of the detected C. pecorum were characterized to better understand the epidemiology of this pathogen in Victorian koalas. Despite many studies in northern Australia (i.e. Queensland and New South Wales), prior Chlamydia studies in Victorian koalas are limited. We detected C. pecorum in 125/820 (15%) urogenital swabs, but in only one ocular swab. Nucleotide sequencing of the molecular marker C. pecorum ompA revealed that the majority (90/114) of C. pecorum samples typed were genotype B. This genotype has not been reported in northern koalas. In general, Chlamydia infection in Victorian koalas is associated with milder clinical signs compared with infection in koalas in northern populations. Although disease pathogenesis is likely to be multifactorial, the high prevalence of genotype B in Victoria may suggest it is less pathogenic. All but three koalas had C. pecorum genotypes unique to southern koala populations (i.e. Victoria and South Australia). These included a novel C. pecorum ompA genotype and two genotypes associated with livestock. Regression analysis determined that significant factors for the presence of C. pecorum infection were sex and geographical location. The presence of 'wet bottom' in males and the presence of reproductive tract pathology in females were significantly associated with C. pecorum infection, suggesting variation in clinical disease manifestations between sexes.
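The headline prevalence (125 of 820 urogenital swabs) is a binomial proportion. The abstract gives no interval for it. As an illustration only, a Wilson score 95% interval can be attached to it. This method is our choice, not the authors'.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


positives, swabs = 125, 820
lo, hi = wilson_interval(positives, swabs)
print(f"prevalence {positives / swabs:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval stays within [0, 1] and behaves better than the simple Wald interval for small counts. This matters more for subsets such as the single positive ocular swab than for the 125/820 total.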
A window into the transcriptomic basis of genotype-by-genotype interactions in the legume-rhizobia mutualism.

Science.gov (United States)

Wood, Corlett W; Stinchcombe, John R

2017-11-01

The maintenance of genetic variation in the benefits provided by mutualists is an evolutionary puzzle (Heath & Stinchcombe, ). Over time, natural selection should favour the benefit strategy that confers the highest fitness, eroding genetic variation in partner quality. Yet abundant genetic variation in partner quality exists in many systems (Heath & Stinchcombe, ). One possible resolution to this puzzle is that the genetic identity of both a host and its partner affects the benefits each mutualist provides to the other, a pattern known as a genotype-by-genotype interaction (Figure ). Mounting evidence suggests that genotype-by-genotype interactions between partners are pervasive at the phenotypic level (Barrett, Zee, Bever, Miller, & Thrall, ; Heath, ; Hoeksema & Thompson, ). Ultimately, however, to link these phenotypic patterns to the maintenance of genetic variation in mutualisms we need to answer two questions: How much variation in mutualism phenotypes is attributable to genotype-by-genotype interactions, and what mutualistic functions are influenced by each partner and by the interaction between their genomes? In this issue of Molecular Ecology, Burghardt et al. (2017) use transcriptomics to address both questions in the legume-rhizobia mutualism. © 2017 John Wiley & Sons Ltd.
Stress Sensitivity Is Associated with Differential Accumulation of Reactive Oxygen and Nitrogen Species in Maize Genotypes with Contrasting Levels of Drought Tolerance

Science.gov (United States)

Yang, Liming; Fountain, Jake C.; Wang, Hui; Ni, Xinzhi; Ji, Pingsheng; Lee, Robert D.; Kemerait, Robert C.; Scully, Brian T.; Guo, Baozhu

2015-01-01

Drought stress decreases crop growth and yield, and can further exacerbate pre-harvest aflatoxin contamination. Tolerance and adaptation to drought stress are important traits of agricultural crops like maize. However, maize genotypes with contrasting drought tolerances have been shown to possess both common and genotype-specific adaptations to cope with drought stress. In this research, the physiological and metabolic response patterns in the leaves of maize seedlings subjected to drought stress were investigated using six maize genotypes: A638, B73, Grace-E5, Lo964, Lo1016, and Va35. During drought treatments, drought-sensitive maize seedlings displayed more severe symptoms such as chlorosis and wilting, and they exhibited significant decreases in photosynthetic parameters. They also accumulated significantly more reactive oxygen species (ROS) and reactive nitrogen species (RNS) than tolerant genotypes. Sensitive genotypes also showed rapid increases in the activities of enzymes involved in ROS and RNS metabolism. However, the measured antioxidant enzyme activities were higher in the tolerant genotypes than in the sensitive genotypes, in which they increased rapidly following drought stress. The results suggest that drought stress causes differential responses to oxidative and nitrosative stress in maize genotypes, with tolerant genotypes reacting more slowly and producing less ROS and RNS than sensitive ones. These differential patterns may be utilized as potential biological markers for use in marker-assisted breeding. PMID:26492235
Stress Sensitivity Is Associated with Differential Accumulation of Reactive Oxygen and Nitrogen Species in Maize Genotypes with Contrasting Levels of Drought Tolerance

Directory of Open Access Journals (Sweden)

Liming Yang

2015-10-01

Full Text Available Drought stress decreases crop growth and yield, and can further exacerbate pre-harvest aflatoxin contamination. Tolerance and adaptation to drought stress are important traits of agricultural crops like maize. However, maize genotypes with contrasting drought tolerances have been shown to possess both common and genotype-specific adaptations to cope with drought stress. In this research, the physiological and metabolic response patterns in the leaves of maize seedlings subjected to drought stress were investigated using six maize genotypes: A638, B73, Grace-E5, Lo964, Lo1016, and Va35. During drought treatments, drought-sensitive maize seedlings displayed more severe symptoms such as chlorosis and wilting, and they exhibited significant decreases in photosynthetic parameters. They also accumulated significantly more reactive oxygen species (ROS) and reactive nitrogen species (RNS) than tolerant genotypes. Sensitive genotypes also showed rapid increases in the activities of enzymes involved in ROS and RNS metabolism. However, the measured antioxidant enzyme activities were higher in the tolerant genotypes than in the sensitive genotypes, in which they increased rapidly following drought stress. The results suggest that drought stress causes differential responses to oxidative and nitrosative stress in maize genotypes, with tolerant genotypes reacting more slowly and producing less ROS and RNS than sensitive ones. These differential patterns may be utilized as potential biological markers for use in marker-assisted breeding.
Population-based V3 genotypic tropism assay: a retrospective analysis using screening samples from the A4001029 and MOTIVATE studies.

Science.gov (United States)

McGovern, Rachel A; Thielen, Alexander; Mo, Theresa; Dong, Winnie; Woods, Conan K; Chapman, Douglass; Lewis, Marilyn; James, Ian; Heera, Jayvant; Valdez, Hernan; Harrigan, P Richard

2010-10-23

The MOTIVATE-1 and 2 studies compared maraviroc (MVC) along with optimized background therapy (OBT) vs. placebo along with OBT in treatment-experienced patients screened as having R5-HIV (original Monogram Trofile). A subset of patients screened as having non-R5 HIV were treated with MVC or placebo along with OBT in a sister safety trial, A4001029. This analysis retrospectively examined the performance of population-based sequence analysis of the HIV-1 env V3 loop to predict coreceptor tropism. Triplicate V3-loop sequences were generated using stored screening plasma samples, and data were processed using custom software ('ReCall'), blinded to clinical response. Tropism was inferred using geno2pheno ('g2p'; 5% false positive rate). Primary outcomes were viral load changes after starting maraviroc and concordance with prior screening Trofile results. Genotype and Trofile results were available for 1164 individuals with virological outcome data (N = 169 non-R5 by Trofile). Compared with Trofile, V3 genotyping had a specificity of 92.6% and a sensitivity of 67.4% for detecting non-R5 virus. However, when compared with clinical outcome, virological responses at weeks 8 and 24 after initiation of therapy were consistently similar for patients categorized as R5 by Trofile and by V3 genotype. Despite differences in sensitivity for predicting non-R5 HIV, week-8 and week-24 virological responses were similar in this treatment-experienced population. These findings suggest the potential utility of V3 genotyping as an accessible assay to select patients who may benefit from maraviroc treatment. Optimization of the predictive tropism algorithm may lead to further improvement in the clinical utility of HIV genotypic tropism assays.
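The sensitivity and specificity quoted here treat Trofile as the reference assay and non-R5 virus as the positive class. Below is a minimal sketch of that calculation. The paired calls are hypothetical toy data, not study results, and the function names are ours.

```python
from typing import Sequence, Tuple


def confusion_counts(reference: Sequence[str], test: Sequence[str],
                     positive: str = "non-R5") -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for a test assay judged against a reference assay."""
    tp = fn = tn = fp = 0
    for ref, call in zip(reference, test):
        if ref == positive:
            if call == positive:
                tp += 1
            else:
                fn += 1
        elif call == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired calls: reference (Trofile-like) vs. V3 genotype.
reference = ["non-R5", "non-R5", "R5", "R5", "R5", "R5"]
genotype = ["non-R5", "R5", "R5", "R5", "R5", "non-R5"]
sens, spec = sensitivity_specificity(*confusion_counts(reference, genotype))
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

A lower sensitivity for non-R5 means that some patients with non-R5 virus are classified as R5. The abstract's point is that, despite this, virological responses in the R5-classified groups were similar.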
Popcorn genotypes resistance to fall armyworm

Directory of Open Access Journals (Sweden)

Nádia Cristina de Oliveira

2018-02-01

Full Text Available ABSTRACT: The aim of this study was to evaluate popcorn genotypes for resistance to the fall armyworm, Spodoptera frugiperda. The experiment used a completely randomized design with 30 replicates. The popcorn genotypes Aelton, Arzm 05 083, Beija-Flor, Colombiana, Composto Chico, Composto Gaúcha, Márcia, Mateus, Ufvm Barão Viçosa, Vanin, and Viviane were evaluated, along with the common maize variety Zapalote Chico. Newly hatched fall armyworm larvae were individually assessed with regard to biological development and food consumption. The data were subjected to multivariate analysis of variance. Genetic divergence among genotypes was evaluated by Tocher's clustering method based on generalized Mahalanobis distances and by canonical variable analysis. Seven popcorn genotypes, namely Aelton, Arzm 05 083, Composto Chico, Composto Gaúcha, Márcia, Mateus, and Viviane, formed a cluster (cluster I) that had antibiosis as the mechanism of resistance to the pest. Cluster I genotypes and the Zapalote Chico genotype could be used for stacking genes for antibiosis and non-preference resistance.
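Generalized Mahalanobis distances between genotypes are typically computed from the genotype means and the residual (within-genotype) covariance matrix of the multivariate analysis, as D²ᵢⱼ = (x̄ᵢ − x̄ⱼ)ᵀ S⁻¹ (x̄ᵢ − x̄ⱼ). The sketch below assumes that S is the pooled within-genotype covariance of replicate measurements. It covers only the distance matrix, not Tocher's clustering step. The trait values and function names are hypothetical.

```python
import numpy as np


def pooled_within_covariance(groups):
    """Pooled within-group (residual) covariance from a list of
    (n_i x p) arrays, one array of replicate observations per genotype."""
    p = groups[0].shape[1]
    scatter = np.zeros((p, p))
    dof = 0
    for g in groups:
        centred = g - g.mean(axis=0)
        scatter += centred.T @ centred
        dof += g.shape[0] - 1
    return scatter / dof


def mahalanobis_d2(groups):
    """Symmetric matrix of generalized Mahalanobis distances D^2 between group means."""
    means = np.array([g.mean(axis=0) for g in groups])
    s_inv = np.linalg.inv(pooled_within_covariance(groups))
    k = len(groups)
    d2 = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            diff = means[i] - means[j]
            d2[i, j] = d2[j, i] = diff @ s_inv @ diff
    return d2


# Hypothetical data: 3 genotypes x 30 larvae x 2 traits
# (e.g., larval weight in mg and development time in days).
rng = np.random.default_rng(0)
centres = [(300.0, 14.0), (260.0, 16.0), (305.0, 13.5)]
groups = [rng.normal(loc=c, scale=(20.0, 1.0), size=(30, 2)) for c in centres]
print(np.round(mahalanobis_d2(groups), 2))
```

Unlike a Euclidean distance, dividing by the residual covariance keeps traits measured on large scales (such as weight) from dominating traits on small scales (such as days). It also discounts differences along directions in which the traits are correlated.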
The potential of plant viruses to promote genotypic diversity via genotype × environment interactions

DEFF Research Database (Denmark)

van Mölken, Tamara; Stuefer, Josef F.

2011-01-01

Background and Aims: Genotype by environment (G × E) interactions are important for the long-term persistence of plant species in heterogeneous environments. It has often been suggested that disease is a key factor for the maintenance of genotypic diversity in plant populations. However, empirical evidence for this contention is scarce. Here virus infection is proposed as a possible candidate for maintaining genotypic diversity in their host plants. Methods: The effects of White clover mosaic virus (WClMV) on the performance and development of different Trifolium repens genotypes were analysed ... and the G × E interactions were examined with respect to genotype-specific plant responses to WClMV infection. Thus, the environment is defined as the presence or absence of the virus. Key Results: WClMV had a negative effect on plant performance as shown by a decrease in biomass and number of ramets...
Identification of QTL on chromosome 18 associated with non-coagulating milk in Swedish Red cows

Directory of Open Access Journals (Sweden)

Sandrine I. Duchemin

2016-04-01

Full Text Available Non-coagulating (NC milk, defined as milk not coagulating within 40 min after rennet-addition, can have a negative influence on cheese production. Its prevalence is estimated at 18% in the Swedish Red (SR cow population. Our study aimed at identifying genomic regions and causal variants associated with NC milk in SR cows, by doing a GWAS using 777k SNP genotypes and using imputed sequences to fine map the most promising genomic region. Phenotypes were available from 382 SR cows belonging to 21 herds in the south of Sweden, from which individual morning milk was sampled. NC milk was treated as a binary trait, receiving a score of one in case of non-coagulation within 40 minutes. For all 382 SR cows, 777k SNP genotypes were available as well as the combined genotypes of the genetic variants of αs1-β-κ-caseins. In addition, whole–genome sequences from the 1000Bull Genome Consortium (Run 3 were available for 429 animals of 15 different breeds. From these sequences, 33 sequences belonged to SR and Finish Ayrshire bulls with a large impact in the SR cow population. Single-marker analyses were run in ASReml using an animal model. After fitting the casein loci, 14 associations at –Log10(Pvalue > 6 identified a promising region located on BTA18. We imputed sequences to the 382 genotyped SR cows using Beagle 4 for half of BTA18, and ran a region-wide association study with imputed sequences. In a 7 mega base-pairs region on BTA18, our strongest association with NC milk explained almost 34% of the genetic variation in NC milk. Since it is possible that multiple QTL are in strong LD in this region, 59 haplotypes were built, genetically differentiated by means of a phylogenetic tree, and tested in phenotype-genotype association studies. Haplotype analyses support the existence of one QTL underlying NC milk in SR cows. A candidate gene of interest is the VPS35 gene, for which one of our strongest association is an intron SNP in this gene. The VPS35
Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

Science.gov (United States)

Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

2012-01-01

The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single-tree information, as in field measurements. Here, ITD was used to measure training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out, and in the second, the results of ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used to train the ABA, and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with those of the ABA trained with traditional field measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied to acquire training data, the mean volume, basal area, and basal-area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
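The accuracy figures above are relative RMSE and bias values. A minimal sketch follows, assuming the common forestry convention of expressing both as a percentage of the observed mean and defining bias as predicted minus observed (so underestimation is negative); the abstract does not spell out its exact formulas, and the k-MSN imputation itself is not reproduced here.

```python
import math
from statistics import mean
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")


def relative_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE expressed as a percentage of the observed mean."""
    _check(observed, predicted)
    mse = mean((p - o) ** 2 for o, p in zip(observed, predicted))
    return 100.0 * math.sqrt(mse) / mean(observed)


def relative_bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean error (predicted - observed) as a percentage of the observed mean.

    Negative values indicate underestimation (an assumed sign convention).
    """
    _check(observed, predicted)
    return 100.0 * mean(p - o for o, p in zip(observed, predicted)) / mean(observed)
```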
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

Science.gov (United States)

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

2017-12-19

To investigate the genetic basis of type 2 diabetes (T2D) at high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. More than 27 million SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome-sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome-sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were also tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82,000 Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44,000 Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
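The low-frequency band quoted above (MAF 0.1-5%) can be applied to genotype counts as sketched below. The MAF formula for biallelic diploid genotypes is standard; treating the band edges as inclusive and labelling the bins outside it "rare" and "common" are assumptions for illustration, not definitions taken from the study.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF from diploid genotype counts (0/0, 0/1, 1/1) at a biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n)  # each person carries 2 alleles
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float) -> str:
    """Bin a MAF using the 0.1-5% low-frequency band from the abstract.

    The "rare" and "common" labels outside that band are assumed here.
    """
    if maf < 0.001:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"
```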
Effect of UBE2L3 genotype on regulation of the linear ubiquitin chain assembly complex in systemic lupus erythematosus.

Science.gov (United States)

Lewis, Myles; Vyse, Simon; Shields, Adrian; Boeltz, Sebastian; Gordon, Patrick; Spector, Timothy; Lehner, Paul; Walczak, Henning; Vyse, Timothy

2015-02-26

A single risk haplotype across UBE2L3 is strongly associated with systemic lupus erythematosus (SLE) and many other autoimmune diseases. UBE2L3 is an E2 ubiquitin-conjugating enzyme with specificity for RING-in-between-RING E3 ligases, including HOIL-1 and HOIP, components of the linear ubiquitin chain assembly complex (LUBAC), which has a pivotal role in inflammation through its crucial regulation of NF-κB. We aimed to determine whether UBE2L3 regulates LUBAC-mediated activation of NF-κB, and to determine the effect of UBE2L3 genotype on NF-κB activation and B-cell differentiation. UBE2L3 genotype data from SLE genome-wide association studies were imputed using 1000 Genomes data. UBE2L3 function was studied in a HEK293-NF-κB reporter cell line with standard molecular biology techniques. p65 NF-κB translocation in ex-vivo B cells and monocytes from genotyped healthy individuals was quantified by imaging flow cytometry. B-cell subsets from healthy individuals and patients with SLE, stratified by UBE2L3 genotype, were determined by multicolour flow cytometry. rs140490, located at -270 base pairs of the UBE2L3 promoter, was identified as the most strongly associated single nucleotide polymorphism (p = 8.6 × 10^-14, odds ratio 1.30, 95% CI 1.21-1.39). The rs140490 risk allele increased UBE2L3 expression in B cells and monocytes. Marked upregulation of NF-κB was observed with combined overexpression of UBE2L3 and LUBAC, and this upregulation was abolished by a dominant-negative UBE2L3 mutant (C86S) or by UBE2L3 silencing. The rs140490 genotype correlated with basal NF-κB activation in ex-vivo human B cells and monocytes, as well as with NF-κB sensitivity to CD40 or tumour necrosis factor (TNF) stimulation. UBE2L3 expression was 3-4 times higher in circulating plasmablasts and plasma cells than in other B-cell subsets, with higher levels in patients with SLE than in controls. The rs140490 genotype correlated with increasing plasmablast and plasma cell differentiation in patients with SLE
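Assuming the reported 95% CI for rs140490 was computed symmetrically on the log-odds scale (a common but here unconfirmed convention), the standard error of ln(OR) and an approximate Wald p-value can be recovered from the rounded summary statistics. This sketch is only a back-of-envelope check: with the OR and CI rounded to two decimals it lands in the same order of magnitude as the reported p-value rather than reproducing it exactly.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_or_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of ln(OR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value; erfc stays accurate far into the tail."""
    return math.erfc(abs(z) / math.sqrt(2))


or_, lo, hi = 1.30, 1.21, 1.39  # rs140490 summary statistics from the abstract
se = log_or_se_from_ci(lo, hi)
z = math.log(or_) / se          # Wald statistic for ln(OR) = 0
p = two_sided_p(z)
print(f"SE(lnOR) = {se:.4f}, z = {z:.2f}, p = {p:.1e}")
```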

Hepatitis C Virus: Virology and Genotypes

KAUST Repository

Abdelaziz, Ahmed

2017-12-01

Hepatitis C virus (HCV) is a major causative agent of chronic liver disease worldwide. HCV is characterized by genetic heterogeneity, with at least six genotypes identified. The geographic distribution of genotypes has changed in different parts of the world over the past decade because of changes in population structure, immigration, and routes of transmission. Genotype differences are of epidemiologic interest and, by informing the study of viral transmission dynamics, help trace the source of HCV infection in a given population. HCV genotypes are also of considerable clinical importance because they affect response to antiviral therapy and represent a challenging obstacle to vaccine development.
Use of Sequenom sample ID Plus® SNP genotyping in identification of FFPE tumor samples.

Directory of Open Access Journals (Sweden)

Jessica K Miller

Short tandem repeat (STR) analysis, such as the AmpFlSTR® Identifiler® Plus kit, is a standard PCR-based human genotyping method used in forensics. Misidentification of cell line and tissue DNA can be costly if not detected early; therefore, quality control measures such as STR profiling must be in place. A major issue in large-scale research studies involving archival formalin-fixed paraffin-embedded (FFPE) tissues is that varying levels of DNA degradation can cause STR genotyping to fail to identify samples correctly. PCR amplification of STRs of several hundred base pairs is not always possible when DNA is degraded. The Sample ID Plus® panel from Sequenom allows human DNA identification and authentication using SNP genotyping. In contrast to lengthy STR amplicons, this multiplexed PCR assay requires amplification of only 76-139 base pairs and uses 47 SNPs to discriminate between individual samples. In this study, we evaluated both STR and SNP genotyping methods of sample identification, with a focus on paired FFPE tumor/normal DNA samples intended for next-generation sequencing (NGS). The ability to successfully validate the identity of FFPE samples can enable cost savings by reducing rework.
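The identity check described above amounts to comparing SNP calls between two samples. The sketch below counts concordant genotypes over SNPs called in both samples; the `same_individual` thresholds are hypothetical and are not Sequenom's published matching criteria. A threshold below 100% leaves room for occasional miscalls in degraded FFPE DNA or for loss of heterozygosity in tumor samples.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[str]  # e.g. "AG"; None marks a failed call


def genotype_concordance(sample_a: Dict[str, Genotype],
                         sample_b: Dict[str, Genotype]) -> Tuple[int, int]:
    """Count matching calls over SNPs successfully genotyped in both samples.

    Returns (matches, compared). Allele order is ignored ("AG" == "GA").
    """
    matches = compared = 0
    for snp, call_a in sample_a.items():
        call_b = sample_b.get(snp)
        if call_a is None or call_b is None:
            continue  # skip failed calls, which are common with degraded DNA
        compared += 1
        if sorted(call_a) == sorted(call_b):
            matches += 1
    return matches, compared


def same_individual(sample_a: Dict[str, Genotype], sample_b: Dict[str, Genotype],
                    min_compared: int = 30, min_fraction: float = 0.95) -> bool:
    """Hypothetical decision rule: enough comparable SNPs and high concordance."""
    matches, compared = genotype_concordance(sample_a, sample_b)
    return compared >= min_compared and matches / compared >= min_fraction
```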
Genetic potentiality of indigenous rice genotypes from Eastern India with reference to submergence tolerance and deepwater traits

Directory of Open Access Journals (Sweden)

Sayani Goswami

2017-09-01

Submergence tolerance in rice varieties is crucial for maintaining stable yields in lowland areas, where recurrent flooding is a constant phenomenon during the monsoon. We conducted detailed physiological and genotyping studies of 27 rice genotypes and one wild rice relative, popularly grown in lowland areas of the two major rice-growing states of eastern India, West Bengal and Odisha, with a focus on submergence tolerance traits and Sub1 loci. We found that these genotypes show varying survival rates (50–100%) during the post-submergence recovery period, and a high degree of polymorphism at the Sub1-linked rice microsatellite loci RM219 and RM7175. A detailed allelic diversity study of the Sub1A locus suggests that the rice varieties IR42, Panibhasha, Khoda and Kalaputia share a common allele that is different from that of FR13A, Keralasundari, Bhashakalmi and Kumrogore. Two other genotypes, Meghi and Khoda, share both alleles of the Sub1A locus (present in the IR42 and FR13A groups) in addition to a new variant. Detailed sequence analysis of the amplified Sub1A products from these genotypes showed several single-nucleotide changes with respect to the reference Oryza sativa Sub1A locus (DQ011598). Three rice genotypes (Meghi, Bhashakalmi and Keralasundari) showed beneficial properties in relation to induced submergence stress and can be considered valuable genetic sources for the utilization of natural rice genetic resources in breeding programs for submergence tolerance.
Developmental plasticity: re-conceiving the genotype.

Science.gov (United States)

Sultan, Sonia E

2017-10-06

In recent decades, the phenotype of an organism (i.e. its traits and behaviour) has been studied as the outcome of a developmental 'programme' coded in its genotype. This deterministic view is implicit in the Modern Synthesis approach to adaptive evolution as a sorting process among genetic variants. Studies of developmental pathways have revealed that genotypes are in fact differently expressed depending on environmental conditions. Accordingly, the genotype can be understood as a repertoire of potential developmental outcomes or norm of reaction. Reconceiving the genotype as an environmental response repertoire rather than a fixed developmental programme leads to three critical evolutionary insights. First, plastic responses to specific conditions often comprise functionally appropriate trait adjustments, resulting in an individual-level, developmental mode of adaptive variation. Second, because genotypes are differently expressed depending on the environment, the genetic diversity available to natural selection is itself environmentally contingent. Finally, environmental influences on development can extend across multiple generations via cytoplasmic and epigenetic factors transmitted to progeny individuals, altering their responses to their own, immediate environmental conditions and, in some cases, leading to inherited but non-genetic adaptations. Together, these insights suggest a more nuanced understanding of the genotype and its evolutionary role, as well as a shift in research focus to investigating the complex developmental interactions among genotypes, environments and previous environments.
Performance of chickpea genotypes under Swat valley conditions

International Nuclear Information System (INIS)

Khan, A.; Rahim, M.; Ahmad, F.; Ali, A.

2004-01-01

Twenty-two genetically diverse chickpea genotypes were studied for their physiological efficiency in order to select the most desirable genotype(s) for a chickpea breeding program. Genotype 'CM7-1' was found to be the most physiologically efficient strain, with the maximum harvest index (37.33%), followed by genotype 'CM1571-1-A' with a harvest index of 35.73%. Genotype '90206' produced the maximum biological yield (7463 kg ha^-1), followed by genotypes 'CM31-1' and 'E-2034' with biological yields of 7352 and 7167 kg ha^-1, respectively. Harvest index and economic yield showed a significant positive correlation (r = +0.595), while a negative correlation (r = -0.435) was observed between harvest index and biological yield. (author)
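Harvest index is conventionally economic (grain) yield divided by biological yield, expressed as a percentage; the abstract does not state its formula, so that definition is assumed here. The sketch also includes a plain Pearson correlation of the kind behind the reported r values; any yields passed to these functions would be hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def harvest_index(economic_yield: float, biological_yield: float) -> float:
    """Harvest index (%) = economic yield / biological yield x 100 (same units)."""
    if biological_yield <= 0:
        raise ValueError("biological yield must be positive")
    return 100.0 * economic_yield / biological_yield


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For instance, a genotype yielding 2000 kg ha^-1 of grain from 5000 kg ha^-1 of total biomass has a harvest index of 40%.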
Corticostriatal Connectivity in Antisocial Personality Disorder by MAO-A Genotype and Its Relationship to Aggressive Behavior.

Science.gov (United States)

Kolla, Nathan J; Dunlop, Katharine; Meyer, Jeffrey H; Downar, Jonathan

2018-05-09

The influence of genetic variation on resting-state neural networks represents a burgeoning line of inquiry in psychiatric research. Monoamine oxidase A, an X-linked gene, is one example of a molecular target linked to brain activity in psychiatric illness. Monoamine oxidase A genetic variants, including the high- and low-activity variable number tandem repeat (VNTR) polymorphisms, have been shown to differentially affect brain functional connectivity in healthy humans. However, it is currently unknown whether these same polymorphisms influence resting-state brain activity in clinical conditions. Given its high burden on society and strong connection to violent behavior, antisocial personality disorder is a logical condition to study, since in vivo markers of monoamine oxidase A enzyme levels in the brain are reduced in key affect-modulating regions, and striatal levels of monoamine oxidase A show a relationship with the functional connectivity of this same region. We used monoamine oxidase A genotyping and seed-to-voxel functional connectivity analysis to investigate the relationship between genotype and corticostriatal connectivity in 21 male participants with severe antisocial personality disorder and 19 male healthy controls. Dorsal striatal connectivity to the frontal pole and anterior cingulate gyrus differentiated antisocial personality disorder subjects and healthy controls by monoamine oxidase A genotype. Furthermore, the linear relationship of proactive aggression to superior ventral striatal-angular gyrus functional connectivity differed by monoamine oxidase A genotype in the antisocial personality disorder groups. These results suggest that monoamine oxidase A genotype may affect corticostriatal connectivity in antisocial personality disorder and that these functional connections may also underlie the use of proactive aggression in a genotype-specific manner.
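Seed-to-voxel connectivity is typically computed as the correlation between a seed region's time series and every voxel's time series, followed by a Fisher r-to-z transform before group statistics. The sketch below shows only that core step under those assumptions; real pipelines add preprocessing and nuisance regression, and the seed and voxel data in any usage would be hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List

R_CLIP = 0.999999  # keeps atanh finite when r is exactly +/-1


def pearson_r(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation between two equal-length time series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant time series has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def seed_to_voxel_map(seed: List[float],
                      voxels: Dict[str, List[float]]) -> Dict[str, float]:
    """Fisher z-transformed correlation of the seed time series with each voxel."""
    out = {}
    for voxel_id, series in voxels.items():
        r = max(-R_CLIP, min(R_CLIP, pearson_r(seed, series)))
        out[voxel_id] = math.atanh(r)  # Fisher r-to-z transform
    return out
```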
Human papillomavirus genotyping using an automated film-based chip array.

Science.gov (United States)

Erali, Maria; Pattison, David C; Wittwer, Carl T; Petti, Cathy A

2009-09-01

The INFINITI HPV-QUAD assay is a commercially available genotyping platform for human papillomavirus (HPV) that uses multiplex PCR followed by automated primer extension, hybridization, and detection. The analytical performance of the HPV-QUAD assay was evaluated using liquid cervical cytology specimens, and the results were compared with those obtained using the digene High-Risk HPV hc2 Test (HC2). The specimen types included SurePath and PreservCyt transport media, as well as residual SurePath and HC2 transport media from the HC2 assay. The overall concordance of positive and negative results, after resolution of indeterminate and intermediate results, was 83% among the 197 specimens tested. HC2-positive (+) and HPV-QUAD-negative (-) results were noted in 24 specimens, which real-time PCR and sequence analysis showed to contain no HPV, HPV types that cross-react in the HC2 assay, or low virus levels. Conversely, HC2 (-) and HPV-QUAD (+) results were noted in four specimens and were subsequently attributed to cross-contamination. The most common HPV types identified in this study were HPV16, HPV18, HPV52/58, and HPV39/56. We show that the HPV-QUAD assay is a user-friendly, automated system for the identification of distinct HPV genotypes. Based on its analytical performance, future studies with this platform are warranted to assess its clinical utility for HPV detection and genotyping.
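The 83% figure above is an overall percent agreement between two binary assays. The sketch below computes it from a 2x2 table and adds Cohen's kappa, which the abstract does not report, as a chance-corrected companion measure; the function name and any counts passed to it are hypothetical.

```python
from typing import Tuple


def agreement_stats(both_pos: int, a_pos_b_neg: int,
                    a_neg_b_pos: int, both_neg: int) -> Tuple[float, float]:
    """Overall percent agreement and Cohen's kappa for two binary assays A and B."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    if n == 0:
        raise ValueError("empty table")
    observed = (both_pos + both_neg) / n          # proportion of concordant results
    a_pos = (both_pos + a_pos_b_neg) / n          # assay A positivity rate
    b_pos = (both_pos + a_neg_b_pos) / n          # assay B positivity rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # agreement by chance
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return 100.0 * observed, kappa
```

With a hypothetical table of 20 concordant positives, 15 concordant negatives and 15 discordant results, agreement is 70% but kappa is only 0.4, which shows why a raw concordance figure is often reported alongside a chance-corrected one.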
Further characterization of field strains of rotavirus from Nigeria: VP4 genotype P6 most frequently identified among symptomatically infected children.

Science.gov (United States)

Adah, M I; Rohwedder, A; Olaleye, O D; Durojaiye, O A; Werchau, H

1997-10-01

Polymerase chain reaction was used to characterize the VP4 types of 39 rotavirus field isolates from symptomatically infected children in Nigeria. Genotype P6 was identified most frequently, occurring in 41.03 per cent of the typed specimens. Genotype P8 was the next most prevalent (33.3 per cent). Genotype P6 was widespread (68.75 per cent) among infected neonates in southern Nigeria, but mixed infection was more prevalent (70 per cent) among northern Nigerian children. Four distinct strains were identified with four different P genotypes. Overall, strain G1P8 predominated (22.22 per cent), followed by G3P6 (17.8 per cent). Strain G1P8 was most prevalent (70 per cent) among infants aged 3.1-9 months, but strain G3P6 was most frequently identified among neonates. The occurrence of mixed-genotype infections demonstrates the potential for reassortment events among different rotavirus genogroups in Nigeria. The epidemiological implications of these findings for rotavirus vaccine development and application in the country are discussed.
Immunochip analyses identify a novel risk locus for primary biliary cirrhosis at 13q14, multiple independent associations at four established risk loci and epistasis between 1p31 and 7q32 risk variants

Science.gov (United States)

Juran, Brian D.; Hirschfield, Gideon M.; Invernizzi, Pietro; Atkinson, Elizabeth J.; Li, Yafang; Xie, Gang; Kosoy, Roman; Ransom, Michael; Sun, Ye; Bianchi, Ilaria; Schlicht, Erik M.; Lleo, Ana; Coltescu, Catalina; Bernuzzi, Francesca; Podda, Mauro; Lammert, Craig; Shigeta, Russell; Chan, Landon L.; Balschun, Tobias; Marconi, Maurizio; Cusi, Daniele; Heathcote, E. Jenny; Mason, Andrew L.; Myers, Robert P.; Milkiewicz, Piotr; Odin, Joseph A.; Luketic, Velimir A.; Bacon, Bruce R.; Bodenheimer, Henry C.; Liakina, Valentina; Vincent, Catherine; Levy, Cynthia; Franke, Andre; Gregersen, Peter K.; Bossa, Fabrizio; Gershwin, M. Eric; deAndrade, Mariza; Amos, Christopher I.; Lazaridis, Konstantinos N.; Seldin, Michael F.; Siminovitch, Katherine A.

2012-01-01

To further characterize the genetic basis of primary biliary cirrhosis (PBC), we genotyped 2426 PBC patients and 5731 unaffected controls from three independent cohorts using a single nucleotide polymorphism (SNP) array (Immunochip) enriched for autoimmune disease risk loci. Meta-analysis of the genotype data sets identified a novel disease-associated locus near the TNFSF11 gene at 13q14, provided evidence for association at six additional immune-related loci not previously implicated in PBC and confirmed associations at 19 of 22 established risk loci. Results of conditional analyses also provided evidence for multiple independent association signals at four risk loci, with haplotype analyses suggesting independent SNP effects at the 2q32 and 16p13 loci, but complex haplotype-driven effects at the 3q25 and 6p21 loci. By imputing classical HLA alleles from this data set, four class II alleles independently contributing to the association signal from this region were identified. Imputation of genotypes at the non-HLA loci also provided additional associations, but none with stronger effects than the genotyped variants. An epistatic interaction between the IL12RB2 risk locus at 1p31 and the IRF5 risk locus at 7q32 was also identified and suggests a complementary effect of these loci in predisposing to disease. These data expand the repertoire of genes with potential roles in PBC pathogenesis that need to be explored by follow-up biological studies. PMID:22936693
Rare coding variants in PLCG2, ABI3 and TREM2 implicate microglial-mediated innate immunity in Alzheimer’s disease

Science.gov (United States)

Sims, Rebecca; van der Lee, Sven J.; Naj, Adam C.; Bellenguez, Céline; Badarinarayan, Nandini; Jakobsdottir, Johanna; Kunkle, Brian W.; Boland, Anne; Raybould, Rachel; Bis, Joshua C.; Martin, Eden R.; Grenier-Boley, Benjamin; Heilmann-Heimbach, Stefanie; Chouraki, Vincent; Kuzma, Amanda B.; Sleegers, Kristel; Vronskaya, Maria; Ruiz, Agustin; Graham, Robert R.; Olaso, Robert; Hoffmann, Per; Grove, Megan L.; Vardarajan, Badri N.; Hiltunen, Mikko; Nöthen, Markus M.; White, Charles C.; Hamilton-Nelson, Kara L.; Epelbaum, Jacques; Maier, Wolfgang; Choi, Seung-Hoan; Beecham, Gary W.; Dulary, Cécile; Herms, Stefan; Smith, Albert V.; Funk, Cory C.; Derbois, Céline; Forstner, Andreas J.; Ahmad, Shahzad; Li, Hongdong; Bacq, Delphine; Harold, Denise; Satizabal, Claudia L.; Valladares, Otto; Squassina, Alessio; Thomas, Rhodri; Brody, Jennifer A.; Qu, Liming; Sanchez-Juan, Pascual; Morgan, Taniesha; Wolters, Frank J.; Zhao, Yi; Garcia, Florentino Sanchez; Denning, Nicola; Fornage, Myriam; Malamon, John; Naranjo, Maria Candida Deniz; Majounie, Elisa; Mosley, Thomas H.; Dombroski, Beth; Wallon, David; Lupton, Michelle K; Dupuis, Josée; Whitehead, Patrice; Fratiglioni, Laura; Medway, Christopher; Jian, Xueqiu; Mukherjee, Shubhabrata; Keller, Lina; Brown, Kristelle; Lin, Honghuang; Cantwell, Laura B.; Panza, Francesco; McGuinness, Bernadette; Moreno-Grau, Sonia; Burgess, Jeremy D.; Solfrizzi, Vincenzo; Proitsi, Petra; Adams, Hieab H.; Allen, Mariet; Seripa, Davide; Pastor, Pau; Cupples, L. 
Adrienne; Price, Nathan D; Hannequin, Didier; Frank-García, Ana; Levy, Daniel; Chakrabarty, Paramita; Caffarra, Paolo; Giegling, Ina; Beiser, Alexa S.; Giedraitis, Vimantas; Hampel, Harald; Garcia, Melissa E.; Wang, Xue; Lannfelt, Lars; Mecocci, Patrizia; Eiriksdottir, Gudny; Crane, Paul K.; Pasquier, Florence; Boccardi, Virginia; Henández, Isabel; Barber, Robert C.; Scherer, Martin; Tarraga, Lluis; Adams, Perrie M.; Leber, Markus; Chen, Yuning; Albert, Marilyn S.; Riedel-Heller, Steffi; Emilsson, Valur; Beekly, Duane; Braae, Anne; Schmidt, Reinhold; Blacker, Deborah; Masullo, Carlo; Schmidt, Helena; Doody, Rachelle S.; Spalletta, Gianfranco; Longstreth, WT; Fairchild, Thomas J.; Bossù, Paola; Lopez, Oscar L.; Frosch, Matthew P.; Sacchinelli, Eleonora; Ghetti, Bernardino; Sánchez-Juan, Pascual; Yang, Qiong; Huebinger, Ryan M.; Jessen, Frank; Li, Shuo; Kamboh, M. Ilyas; Morris, John; Sotolongo-Grau, Oscar; Katz, Mindy J.; Corcoran, Chris; Himali, Jayanadra J.; Keene, C. Dirk; Tschanz, JoAnn; Fitzpatrick, Annette L.; Kukull, Walter A.; Norton, Maria; Aspelund, Thor; Larson, Eric B.; Munger, Ron; Rotter, Jerome I.; Lipton, Richard B.; Bullido, María J; Hofman, Albert; Montine, Thomas J.; Coto, Eliecer; Boerwinkle, Eric; Petersen, Ronald C.; Alvarez, Victoria; Rivadeneira, Fernando; Reiman, Eric M.; Gallo, Maura; O’Donnell, Christopher J.; Reisch, Joan S.; Bruni, Amalia Cecilia; Royall, Donald R.; Dichgans, Martin; Sano, Mary; Galimberti, Daniela; St George-Hyslop, Peter; Scarpini, Elio; Tsuang, Debby W.; Mancuso, Michelangelo; Bonuccelli, Ubaldo; Winslow, Ashley R.; Daniele, Antonio; Wu, Chuang-Kuo; Peters, Oliver; Nacmias, Benedetta; Riemenschneider, Matthias; Heun, Reinhard; Brayne, Carol; Rubinsztein, David C; Bras, Jose; Guerreiro, Rita; Hardy, John; Al-Chalabi, Ammar; Shaw, Christopher E; Collinge, John; Mann, David; Tsolaki, Magda; Clarimón, Jordi; Sussams, Rebecca; Lovestone, Simon; O’Donovan, Michael C; Owen, Michael J; Behrens, Timothy W.; Mead, Simon; Goate, 
Alison M.; Uitterlinden, Andre G.; Holmes, Clive; Cruchaga, Carlos; Ingelsson, Martin; Bennett, David A.; Powell, John; Golde, Todd E.; Graff, Caroline; De Jager, Philip L.; Morgan, Kevin; Ertekin-Taner, Nilufer; Combarros, Onofre; Psaty, Bruce M.; Passmore, Peter; Younkin, Steven G; Berr, Claudine; Gudnason, Vilmundur; Rujescu, Dan; Dickson, Dennis W.; Dartigues, Jean-Francois; DeStefano, Anita L.; Ortega-Cubero, Sara; Hakonarson, Hakon; Campion, Dominique; Boada, Merce; Kauwe, John “Keoni”; Farrer, Lindsay A.; Van Broeckhoven, Christine; Ikram, M. Arfan; Jones, Lesley; Haines, Johnathan; Tzourio, Christophe; Launer, Lenore J.; Escott-Price, Valentina; Mayeux, Richard; Deleuze, Jean-François; Amin, Najaf; Holmans, Peter A; Pericak-Vance, Margaret A.; Amouyel, Philippe; van Duijn, Cornelia M.; Ramirez, Alfredo; Wang, Li-San; Lambert, Jean-Charles; Seshadri, Sudha; Williams, Julie; Schellenberg, Gerard D.

2017-01-01

We identified rare coding variants associated with Alzheimer’s disease (AD) in a three-stage case-control study of 85,133 subjects. In stage 1, 34,174 samples were genotyped using a whole-exome microarray. In stage 2, we tested associated variants (P < 1×10^-4) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, an additional 14,997 samples were used to test the most significant stage 2 associations (P < 5×10^-8) using imputed genotypes. We observed three novel genome-wide significant (GWS) AD-associated non-synonymous variants: a protective variant in PLCG2 (rs72824905/p.P522R, P = 5.38×10^-10, OR = 0.68, MAF cases = 0.0059, MAF controls = 0.0093), a risk variant in ABI3 (rs616338/p.S209F, P = 4.56×10^-10, OR = 1.43, MAF cases = 0.011, MAF controls = 0.008), and a novel GWS variant in TREM2 (rs143332484/p.R62H, P = 1.55×10^-14, OR = 1.67, MAF cases = 0.0143, MAF controls = 0.0089), a known AD susceptibility gene. These protein-coding changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified AD risk genes. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to AD development. PMID:28714976
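Because the abstract gives minor allele frequencies in cases and controls, a crude allelic odds ratio can be computed for each variant as a sanity check. The reported ORs come from the study's own association models, so crude ratios from rounded MAFs only approximate them; the point of the sketch is that the direction of effect (protective for PLCG2, risk for ABI3 and TREM2) is recoverable from the frequencies alone.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Crude allelic OR: odds of the minor allele in cases over odds in controls."""
    return (maf_cases / (1 - maf_cases)) / (maf_controls / (1 - maf_controls))


# (MAF in cases, MAF in controls, OR reported in the abstract)
reported = {
    "PLCG2 rs72824905": (0.0059, 0.0093, 0.68),
    "ABI3 rs616338": (0.011, 0.008, 1.43),
    "TREM2 rs143332484": (0.0143, 0.0089, 1.67),
}

for variant, (maf_cases, maf_controls, or_reported) in reported.items():
    crude = allelic_odds_ratio(maf_cases, maf_controls)
    print(f"{variant}: crude OR {crude:.2f} vs reported {or_reported}")
```

With these values the crude ratios come out at roughly 0.63, 1.38 and 1.62, each on the same side of 1 as the reported estimate.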
Hepatitis C virus genotypes in Bahawalpur

International Nuclear Information System (INIS)

Qazi, M.A.; Fayyaz, M.; Chaudhry, G.M.D.; Jamil, A.

2006-01-01

This study was conducted at Medical Unit-II, Bahawal Victoria Hospital / Quaid-e-Azam Medical College, Bahawalpur, from 1 May 2005 to 31 December 2005. The objective of this study was to determine hepatitis C virus (HCV) genotypes in Bahawalpur, Pakistan. In 105 consecutive anti-HCV (ELISA-3) positive patients, a complete history and physical examination were performed. Liver function tests, complete blood count and platelet count, fasting and 2-hour post-breakfast blood sugar, prothrombin time, serum albumin, serum globulin and abdominal ultrasound were carried out in all patients. Tru-Cut biopsy was performed on 17 patients. HCV RNA was tested in all patients by nested PCR and was detected in 98 of the 105 anti-HCV positive patients; genotyping was then done by genotype-specific PCR. Of these 98 patients, 57 (58.2%) were male and 41 (41.8%) female. Their ages ranged from 18 to 75 years: 26 (26.5%) were aged 18-29 years, 35 (35.7%) were aged 30-39 years and 37 (37.8%) were aged 40-75 years, while 10 (10.2%) patients were diabetic and 34 (34.7%) were obese. Liver cirrhosis was present in 10 (10.2%) patients. Forty-two (42.9%) patients were symptomatic, while 56 (57.1%) were asymptomatic. Of the 98 patients, 11 (11.2%) were untypeable and 87 (88.8%) were typeable: 70/98 (71.4%) were genotype 3, 10/98 (10.2%) genotype 1, 3/98 (3.1%) genotype 2, 3/98 (3.1%) mixed genotypes 2 and 3, and 1/98 (1%) mixed genotypes 3a and 3b. Genotype 3 is the most common HCV genotype in our area, which indicates that both virological and biochemical responses to treatment should be better. Because HCV genotype 3 is more frequent among drug users, this points towards unsafe injection practices in our area. (author)
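Recomputing each share from the counts reported in the abstract is a quick consistency check on the percentages. The sketch below does this for the genotype distribution among the 98 HCV RNA-positive patients, rounding to one decimal place as the abstract does; the helper name `pct` is illustrative.

```python
def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


HCV_RNA_POSITIVE = 98
genotype_counts = {  # counts as reported in the abstract
    "3": 70,
    "1": 10,
    "2": 3,
    "mixed 2 and 3": 3,
    "mixed 3a and 3b": 1,
    "untypeable": 11,
}

# The categories should account for every RNA-positive patient.
assert sum(genotype_counts.values()) == HCV_RNA_POSITIVE

for genotype, n in genotype_counts.items():
    print(f"genotype {genotype}: {n}/{HCV_RNA_POSITIVE} = {pct(n, HCV_RNA_POSITIVE)}%")
```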
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

DEFF Research Database (Denmark)

Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang

2015-01-01

... Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication...
Imputing Variants in HLA-DR Beta Genes Reveals That HLA-DRB1 Is Solely Associated with Rheumatoid Arthritis and Systemic Lupus Erythematosus.

Directory of Open Access Journals (Sweden)

Kwangwoo Kim

Full Text Available The genetic association of HLA-DRB1 with rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) is well documented, but association with other HLA-DR beta genes (HLA-DRB3, HLA-DRB4 and HLA-DRB5) has not been thoroughly studied, despite their similar functions and chromosomal positions. We examined variants in all functional HLA-DR beta genes in RA and SLE patients and controls, down to the amino-acid level, to better understand disease association with the HLA-DR locus. To this end, we improved an existing HLA reference panel to impute variants in all protein-coding HLA-DR beta genes. Using the reference panel, HLA variants were inferred from high-density SNP data of 9,271 RA-control subjects and 5,342 SLE-control subjects. Disease association tests were performed by logistic regression and log-likelihood ratio tests. After imputation using the newly constructed HLA reference panel and statistical analysis, we observed that HLA-DRB1 variants better accounted for the association between MHC and susceptibility to RA and SLE than did the other three HLA-DRB variants. Moreover, there were no secondary effects in HLA-DRB3, HLA-DRB4, or HLA-DRB5 in RA or SLE. Of all the HLA-DR beta chain paralogs, those encoded by HLA-DRB1 solely or dominantly influence susceptibility to RA and SLE.
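The "no secondary effects" conclusion above rests on log-likelihood ratio tests between nested logistic models. For example, a model containing HLA-DRB1 terms can be compared with one that adds an HLA-DRB3, -DRB4 or -DRB5 term. The sketch below assumes the two models have already been fitted elsewhere and only their maximized log-likelihoods are available. It computes the test statistic and a chi-square p-value for integer degrees of freedom using closed forms from the Python standard library. The example log-likelihoods are hypothetical, not values from the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: regularized upper gamma Q(k, y) = exp(-y) * sum_{i<k} y^i / i!
        k = df // 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(k))
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j<n} y^(j + 1/2) / Gamma(j + 3/2)
    n = (df - 1) // 2
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(n))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """LRT for nested models: statistic 2*(llA - ll0), compared with chi-square(df).

    df is the number of extra parameters in the larger (alternative) model.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the larger model should fit at least as well as the nested one")
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical: DRB1-only model vs DRB1 + one extra DRB3 term.
    stat, p = likelihood_ratio_test(loglik_null=-5210.4, loglik_alt=-5209.9, df=1)
    print(f"LRT statistic = {stat:.2f}, p = {p:.3f}")
```

A large p-value for the added term, as in this made-up example, is the pattern one would expect if the secondary gene carries no association beyond HLA-DRB1.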
Grain yield stability of early maize genotypes

Directory of Open Access Journals (Sweden)

Chitra Bahadur Kunwar

2016-12-01

Full Text Available The objective of this study was to estimate the grain yield stability of early maize genotypes. Five early maize genotypes, namely Pool-17, Arun-1 EV, Arun-4, Arun-2 and Farmer's variety, were evaluated in a Randomized Complete Block Design with three replications at four locations (the Rampur, Rajahar, Pakhribas and Kabre districts of Nepal) in farmers' fields during the summer seasons of three consecutive years, from 2010 to 2012. Genotype and genotype × environment (GGE) biplot analysis was used to identify superior genotypes for grain yield and stability pattern. The genotypes Arun-1 EV and Arun-4 were better adapted to Kabre and Pakhribas, whereas Pool-17 was better adapted to the Rajahar environment. Overall, Arun-1 EV was the most stable genotype, followed by Arun-2; therefore, these two varieties can be recommended to farmers for cultivation in both environments.
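A GGE biplot is built from a singular value decomposition of the genotype-by-environment yield table after environment means have been removed. What remains is the genotype main effect plus the G × E interaction. The sketch below shows that core step with NumPy. It assumes rows are genotypes and columns are environments, and it uses symmetric scaling, which splits each singular value evenly between genotype and environment scores. Symmetric scaling is only one common choice, and other scaling and standardization options exist. The abstract does not say which settings the authors used.

```python
import numpy as np


def gge_scores(yield_table):
    """Environment-centred SVD underlying a GGE biplot.

    yield_table: 2-D array-like, rows = genotypes, columns = environments.
    Returns (genotype scores, environment scores, share of G + GE captured by PC1 and PC2).
    """
    Y = np.asarray(yield_table, dtype=float)
    # Remove environment main effects: centre each column on its mean (leaves G + GE).
    centred = Y - Y.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    # Symmetric scaling: sqrt(singular value) goes to each side of the biplot.
    genotype_scores = U[:, :2] * np.sqrt(s[:2])
    environment_scores = Vt[:2, :].T * np.sqrt(s[:2])
    explained = s[:2] ** 2 / np.sum(s ** 2)
    return genotype_scores, environment_scores, explained


if __name__ == "__main__":
    # Hypothetical mean yields (t/ha): 5 genotypes x 4 environments, made-up numbers.
    yields = [
        [5.1, 4.2, 6.0, 3.9],
        [5.8, 4.9, 6.4, 4.5],
        [4.7, 4.4, 5.2, 4.1],
        [5.5, 4.0, 6.8, 3.6],
        [4.0, 3.8, 4.5, 3.5],
    ]
    g, e, frac = gge_scores(yields)
    print("genotype PC1/PC2 scores:\n", g.round(2))
    print("share of G+GE explained by PC1, PC2:", frac.round(3))
```

Plotting the genotype and environment scores together gives the biplot. Genotypes near an environment's vector tend to perform well there, which is how location-specific adaptation like that reported above is usually read off the figure.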
Scanning fluorescence detector for high-throughput DNA genotyping

Science.gov (United States)

Rusch, Terry L.; Petsinger, Jeremy; Christensen, Carl; Vaske, David A.; Brumley, Robert L., Jr.; Luckey, John A.; Weber, James L.

1996-04-01

A new scanning fluorescence detector (SCAFUD) was developed for high-throughput genotyping of short tandem repeat polymorphisms (STRPs). Fluorescent dyes are incorporated into relatively short DNA fragments via polymerase chain reaction (PCR), and the fragments are separated by electrophoresis in short, wide polyacrylamide gels (144 lanes with well-to-read distances of 14 cm). Excitation light from an argon laser with primary lines at 488 and 514 nm is introduced into the gel through a fiber optic cable, dichroic mirror, and 40X microscope objective. Emitted fluorescent light is collected confocally through a second fiber. The confocal head is translated across the bottom of the gel at 0.5 Hz. The detection unit uses dichroic mirrors and band pass filters to direct light with 10-20 nm bandwidths to four photomultiplier tubes (PMTs). PMT signals are independently amplified with variable gain and then sampled at a rate of 2500 points per scan using a computer-based A/D board. LabView software (National Instruments) is used for instrument operation. Currently, three fluorescent dyes (Fam, Hex and Rox) are simultaneously detected with peak detection wavelengths of 543, 567, and 613 nm, respectively. The detection limit for fluorescein-labeled primers is about 100 attomoles. Planned SCAFUD upgrades include rearrangement of laser head geometry, use of additional excitation lasers for simultaneous detection of more dyes, and the use of detector arrays instead of individual PMTs. Extensive software has been written for automatic analysis of SCAFUD images. The software enables background subtraction, band identification, multiple-dye signal resolution, lane finding, band sizing and allele calling. Whole genome screens are currently underway to search for loci influencing such complex diseases as diabetes, asthma, and hypertension. Seven production SCAFUDs are currently in operation. Genotyping output for the coming year is projected to be about one million total genotypes (DNA
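Because light from three dyes is split across four PMTs, each PMT sees some emission from every dye. One common way to do "multiple-dye signal resolution" is linear unmixing. In this approach, each PMT reading is modeled as a weighted sum of the dye intensities, and the dye intensities are recovered by least squares. The sketch below assumes such a linear model. The crosstalk matrix values are hypothetical (in practice they would be calibrated from single-dye runs), and this is not presented as the SCAFUD software's actual algorithm.

```python
import numpy as np

# Hypothetical crosstalk matrix: rows = 4 PMT channels, columns = 3 dyes (Fam, Hex, Rox).
# Entry [i, j] is the relative response of PMT i to unit intensity of dye j.
CROSSTALK = np.array([
    [0.80, 0.15, 0.02],
    [0.15, 0.70, 0.08],
    [0.04, 0.12, 0.75],
    [0.01, 0.03, 0.15],
])


def unmix(pmt_signals, crosstalk=CROSSTALK):
    """Least-squares estimate of per-dye intensities from PMT readings.

    pmt_signals: shape (4,) for one sample point, or (4, n) for n points along a scan.
    Returns shape (3,) or (3, n).
    """
    signals = np.asarray(pmt_signals, dtype=float)
    dyes, *_ = np.linalg.lstsq(crosstalk, signals, rcond=None)
    return dyes


if __name__ == "__main__":
    true_dyes = np.array([1.0, 0.5, 2.0])          # made-up dye intensities
    observed = CROSSTALK @ true_dyes               # what the four PMTs would record
    print(unmix(observed).round(3))
```

Having four detectors for three dyes makes the system overdetermined. Least squares then uses all four channels, which tends to reduce the effect of noise in any single PMT.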
Specificity of the Linear Array HPV Genotyping Test for detecting human papillomavirus genotype 52 (HPV-52)

OpenAIRE

Kocjan, Boštjan; Poljak, Mario; Oštrbenk, Anja

2015-01-01

Introduction: HPV-52 is one of the most frequent human papillomavirus (HPV) genotypes causing significant cervical pathology. The most widely used HPV genotyping assay, the Roche Linear Array HPV Genotyping Test (Linear Array), is unable to identify HPV-52 status in samples containing HPV-33, HPV-35, and/or HPV-58. Methods: Linear Array HPV-52 analytical specificity was established by testing 100 specimens reactive with the Linear Array HPV-33/35/52/58 cross-reactive probe, but not with the...
Genotypic diversity of root and shoot characteristics of

Directory of Open Access Journals (Sweden)

ali ganjali

2009-06-01

Full Text Available Root and shoot characteristics of chickpea (Cicer arietinum L.) genotypes are believed to be important in drought tolerance. There is little information about the root growth response of genotypes in hydroponic and greenhouse culture, or about the relationship between root size and drought tolerance. This study was conducted to determine whether genotypes differ in root size, and whether root size is associated with drought tolerance during early vegetative growth. We found significant differences (p < 0.01) in root dry weight, total root length, tap root length, root area, leaf dry weight, leaf area and shoot biomass per plant among 30 chickpea genotypes grown in hydroponic culture for three weeks. Each of these parameters was positively correlated with all the others. Of the 30 genotypes, 10 with different root sizes were selected and grown in sand culture in a greenhouse under drought stress (30% FC) for three weeks. There were no significant linear or non-linear correlations between root characters in the hydroponic and greenhouse environments. It seems that environmental factors dominate genetic factors at the seedling stage, so the expression of genotypic potential for root growth characteristics differs between hydroponic and greenhouse conditions. In this study, selecting genotypes with vigorous root systems in hydroponic conditions did not yield genotypes with the same root characters in the greenhouse environment. The genotype × drought interactions for root characters of chickpea seedlings in 30 days were not significant (p
Cost Effectiveness of Genotype-Guided Warfarin Dosing in Patients with Mechanical Heart Valve Replacement Under the Fee-for-Service System.

Science.gov (United States)

Kim, Dong-Jin; Kim, Ho-Sook; Oh, Minkyung; Kim, Eun-Young; Shin, Jae-Gook

2017-10-01

Although studies assessing the cost effectiveness of genotype-guided warfarin dosing for the management of atrial fibrillation, deep vein thrombosis, and pulmonary embolism have been reported, no publications have addressed genotype-guided warfarin therapy in mechanical heart valve replacement (MHVR) patients or genotype-guided warfarin therapy under the fee-for-service (FFS) insurance system. The aim of this study was to evaluate the cost effectiveness of genotype-guided warfarin dosing in patients with MHVR under the FFS system from the Korean healthcare sector perspective. A decision-analytic Markov model was developed to evaluate the cost effectiveness of genotype-guided warfarin dosing compared with standard dosing. Estimates of clinical adverse event rates and health state utilities were derived from the published literature. The outcome measure was the incremental cost-effectiveness ratio (ICER) per quality-adjusted life-year (QALY). One-way and probabilistic sensitivity analyses were performed to explore the range of plausible results. In a base-case analysis, genotype-guided warfarin dosing was associated with marginally higher QALYs than standard warfarin dosing (6.088 vs. 6.083, respectively), at a slightly higher cost (US$6.8) (year 2016 values). The ICER was US$1356.2 per QALY gained. In probabilistic sensitivity analysis, there was an 82.7% probability that genotype-guided dosing was dominant compared with standard dosing, and a 99.8% probability that it was cost effective at a willingness-to-pay threshold of US$50,000 per QALY gained. Compared with standard warfarin dosing alone, genotype-guided warfarin dosing was cost effective in MHVR patients under the FFS insurance system.
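The ICER above is the extra cost divided by the extra QALYs. The sketch below plugs in the rounded base-case values from the abstract. It gives roughly US$1,360 per QALY rather than the reported US$1356.2, presumably because the published QALYs are rounded to three decimals. It also computes the net monetary benefit at the US$50,000 threshold, which is a standard equivalent way to express "cost effective at this willingness to pay". The function names are illustrative, not taken from the study.

```python
def icer(delta_cost, qaly_new, qaly_standard):
    """Incremental cost-effectiveness ratio: additional cost per additional QALY."""
    delta_qaly = qaly_new - qaly_standard
    if delta_qaly == 0:
        raise ValueError("equal QALYs: the ICER is undefined")
    return delta_cost / delta_qaly


def net_monetary_benefit(delta_cost, delta_qaly, willingness_to_pay):
    """Positive when the extra QALYs are worth more than the extra cost at this threshold."""
    return willingness_to_pay * delta_qaly - delta_cost


# Rounded base-case values from the abstract (US$, 2016 values).
ratio = icer(6.8, 6.088, 6.083)
nmb = net_monetary_benefit(6.8, 6.088 - 6.083, 50_000)

if __name__ == "__main__":
    print(f"ICER ~ US${ratio:,.0f} per QALY; net monetary benefit ~ US${nmb:,.1f}")
```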
Comparative genotyping of Clostridium thermocellum strains isolated from biogas plants: genetic markers and characterization of cellulolytic potential.

Science.gov (United States)

Koeck, Daniela E; Zverlov, Vladimir V; Liebl, Wolfgang; Schwarz, Wolfgang H

2014-07-01

Clostridium thermocellum is among the most prevalent of known anaerobic cellulolytic bacteria. In this study, genetic and phenotypic variations among C. thermocellum strains isolated from different biogas plants were determined and different genotyping methods were evaluated on these isolates. At least two C. thermocellum strains were isolated independently from each of nine different biogas plants via enrichment on cellulose. Various DNA-based genotyping methods such as ribotyping, RAPD (Random Amplified Polymorphic DNA) and VNTR (Variable Number of Tandem Repeats) were applied to these isolates. One novel approach, the amplification of unknown target sequences between copies of a previously discovered Random Inserted Mobile Element (RIME), was also tested. The genotyping method with the highest discriminatory power was found to be the amplification of the sequences between the insertion elements, where isolates from each biogas plant yielded a different band pattern. Cellulolytic potentials, optimal growth conditions and substrate spectra of all isolates were characterized to help identify phenotypic variations. Irrespective of the genotyping method used, the isolates from each individual biogas plant always exhibited identical patterns. This suggests that a single C. thermocellum strain is dominant in each biogas plant. The genotypic groups reflect the results of the physiological characterization of the isolates, such as substrate diversity and cellulase activity. Conversely, strains isolated across a range of biogas plants differed in their genotyping results and physiological properties. Both strains isolated from one biogas plant had the best specific cellulose-degrading properties and might therefore achieve superior substrate utilization yields in biogas fermenters. Copyright © 2014 Elsevier GmbH. All rights reserved.
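"Discriminatory power" of a typing method is often quantified with the Hunter-Gaston discriminatory index, also known as Simpson's index of diversity. This index is the probability that two isolates picked at random are assigned different types. The abstract does not say which measure the authors used, so treat this as an illustrative sketch. The example scenario is hypothetical: exactly two isolates from each of nine plants, with one band pattern per plant.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston discriminatory index (Simpson's index of diversity) for typing results.

    types: one type label (e.g. a band-pattern ID) per isolate.
    Returns the probability that two isolates drawn without replacement get different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical scenario: two isolates from each of nine plants, one pattern per plant.
isolates = [f"plant_{i}" for i in range(9) for _ in range(2)]
d_one_pattern_per_plant = discriminatory_index(isolates)

if __name__ == "__main__":
    print(f"D = {d_one_pattern_per_plant:.3f}")
```

A method that gave every isolate the same pattern would score 0. A method that separates every plant, as the RIME-based amplification did, scores close to 1, with the ceiling set by how many isolates share a plant.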
Welcome to the neighbourhood: interspecific genotype by genotype interactions in Solidago influence above- and belowground biomass and associated communities.

Science.gov (United States)

Genung, Mark A; Bailey, Joseph K; Schweitzer, Jennifer A

2012-01-01

Intra- and interspecific plant-plant interactions are fundamental to patterns of community assembly and to the mixture effects observed in biodiversity studies. Although much research has been conducted at the species level, very little is understood about how genetic variation within and among interacting species may drive these processes. Using clones of both Solidago altissima and Solidago gigantea, we found that genotypic variation in a plant's neighbours affected both above- and belowground plant traits, and that genotype by genotype interactions between neighbouring plants impacted associated pollinator communities. The traits for which focal plant genotypic variation explained the most variation varied by plant species, whereas neighbour genotypic variation explained the most variation in coarse root biomass. Our results provide new insight into genotypic and species diversity effects in plant-neighbour interactions, the extended consequences of diversity effects, and the potential for evolution in response to competitive or to facilitative plant-neighbour interactions. © 2011 Blackwell Publishing Ltd/CNRS.

Echinococcus granulosus genotypes in Iran

Science.gov (United States)

Sharafi, Seyedeh Maryam; Rostami-Nejad, Mohammad; Moazeni, Mohammad; Yousefi, Morteza; Saneie, Behnam; Hosseini-Safa, Ahmad

2014-01-01

Hydatidosis, caused by Echinococcus granulosus, is one of the most important zoonotic diseases throughout most parts of the world. Hydatidosis is endemic in Iran and is responsible for approximately 1% of admissions to surgical wards. There is extensive genetic variation within E. granulosus, and 10 different genotypes (G1–G10) of this parasite have been reported. Identification of strains is important for improving the control and prevention of the disease. Because no recent review article has described the situation of Echinococcus granulosus genotypes in Iran, in this paper we review the different studies on Echinococcus granulosus genotypes in Iran. PMID:24834298
Genotypic Resistance Tests Sequences Reveal the Role of Marginalized Populations in HIV-1 Transmission in Switzerland.

Science.gov (United States)

Shilaih, Mohaned; Marzel, Alex; Yang, Wan Lin; Scherrer, Alexandra U; Schüpbach, Jörg; Böni, Jürg; Yerly, Sabine; Hirsch, Hans H; Aubert, Vincent; Cavassini, Matthias; Klimkait, Thomas; Vernazza, Pietro L; Bernasconi, Enos; Furrer, Hansjakob; Günthard, Huldrych F; Kouyos, Roger

2016-06-14

Targeting hard-to-reach/marginalized populations is essential for preventing HIV transmission. A unique opportunity to identify such populations in Switzerland is provided by a database of all genotypic resistance tests from Switzerland, including both sequences from the Swiss HIV Cohort Study (SHCS) and non-cohort sequences. A phylogenetic tree was built using 11,127 SHCS and 2,875 Swiss non-SHCS sequences. Demographics were imputed for non-SHCS patients using a phylogenetic proximity approach. Factors associated with non-cohort outbreaks were determined using logistic regression. Non-B subtype (univariable odds ratio (OR): 1.9; 95% confidence interval (CI): 1.8-2.1), female gender (OR: 1.6; 95% CI: 1.4-1.7), black ethnicity (OR: 1.9; 95% CI: 1.7-2.1) and heterosexual transmission group (OR: 1.8; 95% CI: 1.6-2.0) were all associated with underrepresentation in the SHCS. We found 344 purely non-SHCS transmission clusters; however, these outbreaks were small (median 2, maximum 7 patients) and overlapped strongly with the SHCS: 65% of non-SHCS sequences were part of clusters composed of ≥50% SHCS sequences. Our data suggest that marginalized populations are underrepresented in the SHCS. However, the limited size of outbreaks among non-SHCS patients in care implies that no major HIV outbreak in Switzerland was missed by SHCS surveillance. This study demonstrates the potential of sequence data to assess and extend the scope of infectious-disease surveillance.
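The univariable odds ratios above come from logistic regression. For a single binary factor, univariable logistic regression gives exactly the 2×2-table odds ratio ad/bc, and its Wald confidence interval on the log scale is the Woolf interval. The sketch below computes both. The cell counts in the example are hypothetical and are not taken from the Swiss data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval for a 2x2 table.

    a: factor present, outcome present     b: factor present, outcome absent
    c: factor absent,  outcome present     d: factor absent,  outcome absent
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: outcome = "sequence not in the cohort", factor = some demographic group.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```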
Experimental evidence for competitive growth advantage of genotype VII over VI: implications for foot-and-mouth disease virus serotype A genotype turnover in nature.

Science.gov (United States)

Mohapatra, J K; Subramaniam, S; Singh, N K; Sanyal, A; Pattnaik, B

2012-04-01

In India, systematic genotype replacement has been observed for serotype A foot-and-mouth disease virus. After a decade of co-circulation of genotypes VI and VII, genotype VII has been the single dominant genotype since 2001. To find possible explanations for such epochal evolutionary dynamics, in vitro intergenotype growth competition experiments involving both coinfection and superinfection regimes were conducted. Coinfection of BHK-21 cells showed an abrupt loss in genotype VI viral load with a commensurate increase in genotype VII load, as measured by genotype-differentiating ELISA, RT-PCR and real-time RT-PCR. The superinfection dynamics were shaped by the temporal spacing of infection: the invading genotype VII required more passages than in coinfection to eventually overtake the resident genotype VI. The authors speculate that this superior replicative fitness of genotype VII may have been a factor in its ultimate dominance in nature. Copyright © 2011 Elsevier Ltd. All rights reserved.
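Why a fitter genotype takes longer to win when it starts as a minority can be illustrated with the textbook deterministic selection model. In this model, each passage multiplies the odds of genotype VII versus VI by its relative fitness w. The sketch below is not the authors' analysis. The fitness value is hypothetical, and superinfection is crudely represented as genotype VII starting at a small fraction of the viral load.

```python
def next_fraction(p, w):
    """Fraction of genotype VII after one passage, given relative fitness w versus genotype VI.

    Deterministic haploid selection: the odds p / (1 - p) are multiplied by w each passage.
    """
    return p * w / (p * w + (1.0 - p))


def passages_to_majority(p0, w, max_passages=10_000):
    """Number of passages until genotype VII exceeds 50% of the viral load."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("starting fraction must be strictly between 0 and 1")
    p, passages = p0, 0
    while p <= 0.5:
        if passages >= max_passages:
            raise RuntimeError("no takeover within max_passages (is w > 1?)")
        p = next_fraction(p, w)
        passages += 1
    return passages


if __name__ == "__main__":
    w = 2.0  # hypothetical relative fitness of genotype VII
    print("equal-input coinfection:", passages_to_majority(0.5, w), "passage(s)")
    print("VII as a 1% invader:", passages_to_majority(0.01, w), "passage(s)")
```

Under this model the takeover time grows with the log of the starting odds. This is consistent in spirit with the observation that the invading genotype needed more passages under superinfection than under coinfection.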
Assessment of RAPD Markers to Analyse the Genetic Diversity among Sunflower (Helianthus annuus L.) Genotypes

Directory of Open Access Journals (Sweden)

Ali Raza

2018-02-01

Full Text Available Genetic diversity estimation among different species is an important tool for genetic improvement to maximize yield, desirable quality, wider adaptation, and pest and insect resistance, ultimately boosting traditional plant breeding methods. The most efficient way to estimate diversity is the application of molecular markers. In this study, twenty random amplified polymorphic DNA (RAPD) primers were used to estimate the genetic diversity among ten sunflower genotypes. Overall, 227 bands were amplified by the 20 primers, with an average of 11.35 bands per primer. RAPD data showed 86.34% polymorphic bands and 13.65% monomorphic bands. Genetic similarity ranged from 50.22% to 87.22%. The lowest similarity (50.22%) was observed between FH-352 and FH-359 and the highest (87.22%) between A-23 and G-46. Polymorphic information content (PIC) values varied from 0.05 to 0.12, with a mean of 0.09. Cluster analysis based on RAPD results displayed two major distinct groups, 1 and 2. Group 2 contains FH-352, which was the most diverse genotype, while group 1 consists of a few subgroups with all the other genotypes. Ample diversity was found in all the genotypes. The present study reveals novel information about the sunflower genome which can be used in future studies for sunflower improvement.
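RAPD data are usually scored as band presence (1) or absence (0) per genotype. Pairwise genetic similarity is then computed from those binary profiles, and the percentage of polymorphic bands comes from the columns that vary across genotypes. The abstract does not name the similarity coefficient used. The sketch below shows two common choices, Dice (also called Nei-Li) and Jaccard. The example profiles are made up.

```python
def _check(x, y):
    if len(x) != len(y):
        raise ValueError("band profiles must have the same length")


def dice_similarity(x, y):
    """Dice / Nei-Li similarity of two 0/1 band profiles: 2a / (2a + b + c)."""
    _check(x, y)
    shared = sum(1 for u, v in zip(x, y) if u and v)
    present = sum(x) + sum(y)  # equals 2a + b + c
    return 2 * shared / present if present else 1.0


def jaccard_similarity(x, y):
    """Jaccard similarity: shared bands divided by bands present in either profile."""
    _check(x, y)
    shared = sum(1 for u, v in zip(x, y) if u and v)
    union = sum(1 for u, v in zip(x, y) if u or v)
    return shared / union if union else 1.0


def percent_polymorphic(profiles):
    """Percentage of bands (columns) not scored identically in every genotype (rows)."""
    columns = list(zip(*profiles))
    polymorphic = sum(1 for col in columns if len(set(col)) > 1)
    return 100 * polymorphic / len(columns)


if __name__ == "__main__":
    # Hypothetical 0/1 band scores for three genotypes at five bands.
    profiles = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 1],
        [1, 1, 1, 0, 1],
    ]
    print("Dice(0,1) =", round(dice_similarity(profiles[0], profiles[1]), 3))
    print("Jaccard(0,1) =", round(jaccard_similarity(profiles[0], profiles[1]), 3))
    print("polymorphic bands:", round(percent_polymorphic(profiles), 1), "%")
```

Dice weights shared bands twice and so gives higher values than Jaccard for the same pair. The choice matters when comparing similarity ranges across studies.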
Single-Step BLUP with Varying Genotyping Effort in Open-Pollinated Picea glauca

Directory of Open Access Journals (Sweden)

Blaise Ratcliffe

2017-03-01

Full Text Available Maximization of genetic gain in forest tree breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters. We investigated the effect of the combined use of contemporary pedigree information and genomic relatedness estimates on the accuracy of predicted breeding values and precision of estimated genetic parameters, as well as rankings of selection candidates, using single-step genomic evaluation (HBLUP). In this study, two traits with diverse heritabilities [tree height (HT) and wood density (WD)] were assessed at various levels of family genotyping efforts (0, 25, 50, 75, and 100%) from a population of white spruce (Picea glauca) consisting of 1694 trees from 214 open-pollinated families, representing 43 provenances in Québec, Canada. The results revealed that HBLUP bivariate analysis is effective in reducing the known bias in heritability estimates of open-pollinated populations, as it exposes hidden relatedness, potential pedigree errors, and inbreeding. The addition of genomic information in the analysis considerably improved the accuracy in breeding value estimates by accounting for both Mendelian sampling and historical coancestry that were not captured by the contemporary pedigree alone. Increasing family genotyping efforts were associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy. Yet, improvements were observed even at minimal genotyping effort, indicating that even modest genotyping effort is effective in improving genetic evaluation. The combined utilization of both pedigree and genomic information may be a cost-effective approach to increase the accuracy of breeding values in forest tree breeding programs where shallow pedigrees and large testing populations are the norm.
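In the usual single-step formulation, pedigree relationships for all trees (A) are combined with genomic relationships for the genotyped subset (G) through a blended relationship matrix H. Its inverse enters the mixed-model equations directly. Trees are ordered so that non-genotyped trees come first and genotyped trees second, and A₂₂ is the pedigree block among the genotyped trees. This is the standard textbook form, shown as a sketch. The abstract does not give the authors' exact scaling of G or any blending of G with A₂₂.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e},
\qquad \operatorname{Var}(\mathbf{a}) = \mathbf{H}\,\sigma^2_a,
\qquad
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

When no trees are genotyped, the correction block vanishes and H reduces to A, which corresponds to the 0% genotyping-effort scenario above.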
Single-Step BLUP with Varying Genotyping Effort in Open-Pollinated Picea glauca.

Science.gov (United States)

Ratcliffe, Blaise; El-Dien, Omnia Gamal; Cappa, Eduardo P; Porth, Ilga; Klápště, Jaroslav; Chen, Charles; El-Kassaby, Yousry A

2017-03-10

Maximization of genetic gain in forest tree breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters. We investigated the effect of the combined use of contemporary pedigree information and genomic relatedness estimates on the accuracy of predicted breeding values and precision of estimated genetic parameters, as well as rankings of selection candidates, using single-step genomic evaluation (HBLUP). In this study, two traits with diverse heritabilities [tree height (HT) and wood density (WD)] were assessed at various levels of family genotyping efforts (0, 25, 50, 75, and 100%) from a population of white spruce (Picea glauca) consisting of 1694 trees from 214 open-pollinated families, representing 43 provenances in Québec, Canada. The results revealed that HBLUP bivariate analysis is effective in reducing the known bias in heritability estimates of open-pollinated populations, as it exposes hidden relatedness, potential pedigree errors, and inbreeding. The addition of genomic information in the analysis considerably improved the accuracy in breeding value estimates by accounting for both Mendelian sampling and historical coancestry that were not captured by the contemporary pedigree alone. Increasing family genotyping efforts were associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy. Yet, improvements were observed even at minimal genotyping effort, indicating that even modest genotyping effort is effective in improving genetic evaluation. The combined utilization of both pedigree and genomic information may be a cost-effective approach to increase the accuracy of breeding values in forest tree breeding programs where shallow pedigrees and large testing populations are the norm. Copyright © 2017 Ratcliffe et al.
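The genomic relatedness used in single-step evaluation is typically a genomic relationship matrix G built from marker genotypes coded as 0/1/2 allele counts. The sketch below follows VanRaden's first method. Each marker is centred by twice its allele frequency, and the cross-product is scaled by 2Σp(1−p). As a simplifying assumption, allele frequencies are estimated from the genotyped trees themselves. The abstract does not state how the authors constructed G, and the example genotypes are made up.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix (VanRaden method 1).

    genotypes: array-like of shape (n_individuals, n_markers) with 0/1/2 allele counts.
    Allele frequencies are estimated from the same individuals (an assumption).
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0               # allele frequency per marker
    Z = M - 2.0 * p                        # centre each marker by twice its frequency
    denom = 2.0 * np.sum(p * (1.0 - p))    # scales G to be comparable to pedigree A
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return Z @ Z.T / denom


if __name__ == "__main__":
    # Hypothetical genotypes: 4 trees x 6 markers.
    M = [
        [0, 1, 2, 1, 0, 2],
        [1, 1, 2, 0, 0, 2],
        [2, 0, 1, 1, 2, 0],
        [2, 1, 0, 2, 1, 1],
    ]
    print(vanraden_g(M).round(2))
```

Off-diagonal entries that are clearly positive between trees recorded as unrelated in the pedigree are the kind of "hidden relatedness" the study reports that genomic information exposes.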
Identification and characterization of novel associations in the CASP8/ALS2CR12 region on chromosome 2 with breast cancer risk

DEFF Research Database (Denmark)

Lin, Wei-Yu; Camp, Nicola J; Ghoussaini, Maya

2015-01-01

Previous studies have suggested that polymorphisms in CASP8 on chromosome 2 are associated with breast cancer risk. To clarify the role of CASP8 in breast cancer susceptibility, we carried out dense genotyping of this region in the Breast Cancer Association Consortium (BCAC). Single-nucleotide polymorphisms (SNPs) spanning a 1 Mb region around CASP8 were genotyped in 46 450 breast cancer cases and 42 600 controls of European origin from 41 studies participating in the BCAC as part of a custom genotyping array experiment (iCOGS). Missing genotypes and SNPs were imputed and, after quality exclusions... .04-1.08), P = 1 × 10⁻⁹. Analyses of gene expression associations in peripheral blood and normal breast tissue indicate that CASP8 might be the target gene, suggesting a mechanism involving apoptosis....
Genotype x environment interaction and optimum resource ...

African Journals Online (AJOL)

... × E) interaction and to determine the optimum resource allocation for cassava yield trials. The effects of environment, genotype and G × E interaction were highly significant for all yield traits. Variations due to G × E interaction were greater than those due to genotypic differences for all yield traits. Genotype × location × year ...
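Optimum resource allocation for multi-location, multi-year trials is commonly worked out from the variance of a genotype mean. With l locations, y years and r replicates, this variance is σ²_GL/l + σ²_GY/y + σ²_GLY/(ly) + σ²_e/(lyr). The designs that minimize it for a fixed number of plots are then compared. The abstract is truncated and does not show the authors' formula or variance estimates. The sketch below uses this standard expression with made-up variance components.

```python
def genotype_mean_variance(var_gl, var_gy, var_gly, var_error, locations, years, reps):
    """Variance of a genotype mean over l locations, y years and r replicates."""
    l, y, r = locations, years, reps
    return var_gl / l + var_gy / y + var_gly / (l * y) + var_error / (l * y * r)


def best_allocation(components, plot_budget, max_years):
    """Search all (locations, years, reps) with l * y * r <= plot_budget for the lowest variance.

    components: (var_gl, var_gy, var_gly, var_error).
    Returns ((locations, years, reps), variance).
    """
    best = None
    for y in range(1, max_years + 1):
        for l in range(1, plot_budget // y + 1):
            for r in range(1, plot_budget // (l * y) + 1):
                v = genotype_mean_variance(*components, l, y, r)
                if best is None or v < best[0]:
                    best = (v, (l, y, r))
    return best[1], best[0]


if __name__ == "__main__":
    # Hypothetical components where G x location variance dominates the error variance.
    design, variance = best_allocation((4.0, 1.0, 1.0, 2.0), plot_budget=24, max_years=3)
    print("locations, years, reps =", design, "variance =", round(variance, 3))
```

When interaction variances are large relative to plot error, the search favors more locations or years over more replicates. This is the practical consequence of G × E variation exceeding genotypic variation.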
Variations in the growth, oil quantity and quality, and mineral nutrients of chamomile genotypes under salinity stress

Directory of Open Access Journals (Sweden)

Omid Askari-Khorasgani

2017-03-01

Full Text Available Understanding how plants respond to salinity, which severely restricts plant growth, productivity, and survival, is highly important in agriculture. Using three genotypes of Matricaria recutita L. (Shiraz, Ahvaz, and Isfahan) with different sensitivity to NaCl, the effects of long-term (about 110 days) NaCl treatments (2.5, 6, 9, and 12 dS m⁻¹) on crop growth, oil quality and quantity, and nutrient variations were investigated to underpin its future agricultural management. The adaptation strategy and plant responses were influenced by salinity level, genotype, and genotype × salinity interactions. With higher productivity than the Isfahan genotype, the Shiraz and Ahvaz genotypes showed efficient Na⁺ exclusion at the root surface as an avoidance strategy; however, under higher NaCl concentrations, their higher performance was mainly attributed to Na⁺ sequestration in root vacuoles and higher Ca²⁺/Na⁺, Mg²⁺/Na⁺, and root/shoot ratios as tolerance strategies. The higher oil yield and chamazulene percentage in the Isfahan genotype were not affected by salinity level and were only genotype dependent. Under 12 dS m⁻¹ NaCl, roots of the Shiraz and Ahvaz genotypes accumulated markedly more Ca²⁺ (2.5% and 1.5%, respectively) and Mg²⁺ (1.6% and 1.3%, respectively), which are required for membrane stability and chlorophyll synthesis, respectively, than the Isfahan genotype (0.2% Ca²⁺ and 0.1% Mg²⁺), and considerably more than the control plants, keeping concentrations of toxic Na⁺ and Cl⁻ ions low in shoots. Overall, the greater salt tolerance of the Shiraz and Ahvaz genotypes could be due to a variety of mechanisms, including higher efficiency of nutrient uptake (Ca²⁺, Mg²⁺, and Zn²⁺), utilization (N, P, Ca²⁺, and Mg²⁺), and compartmentation (Na⁺ in roots), and maintenance of higher root/shoot ratios. Taking flower and oil yield as well as chamazulene percentage into consideration, the findings recommended cultivation of the Ahvaz genotype in the absence of
Cloning of the unculturable parasite Pasteuria ramosa and its Daphnia host reveals extreme genotype-genotype interactions.

Science.gov (United States)

Luijckx, Pepijn; Ben-Ami, Frida; Mouton, Laurence; Du Pasquier, Louis; Ebert, Dieter

2011-02-01

The degree of specificity in host-parasite interactions has important implications for ecology and evolution. Unfortunately, specificity can be difficult to determine when parasites cannot be cultured. In such cases, studies often use isolates of unknown genetic composition, which may lead to an underestimation of specificity. We obtained the first clones of the unculturable bacterium Pasteuria ramosa, a parasite of Daphnia magna. Clonal genotypes of the parasite exhibited much more specific interactions with host genotypes than previous studies using isolates. Clones of P. ramosa infected fewer D. magna genotypes than isolates and host clones were either fully susceptible or fully resistant to the parasite. Our finding enhances our understanding of the evolution of virulence and coevolutionary dynamics in this system. We recommend caution when using P. ramosa isolates as the presence of multiple genotypes may influence the outcome and interpretation of some experiments. © 2010 Blackwell Publishing Ltd/CNRS.
Hepatitis C Virus: Virology and Genotypes

KAUST Repository

Abdelaziz, Ahmed

2017-01-01

Hepatitis C virus (HCV) is a major causative agent of chronic liver disease worldwide. HCV is characterized by genetic heterogeneity, with at least six genotypes identified. The geographic distribution of genotypes has shown variations in different
Whole-Genome Sequencing and iPLEX MassARRAY Genotyping Map an EMS-Induced Mutation Affecting Cell Competition in Drosophila melanogaster

Directory of Open Access Journals (Sweden)

Chang-Hyun Lee

2016-10-01

Full Text Available Cell competition, the conditional loss of viable genotypes only when surrounded by other cells, is a phenomenon observed in certain genetic mosaic conditions. We conducted a chemical mutagenesis and screen to recover new mutations that affect cell competition between wild-type and RpS3 heterozygous cells. Mutations were identified by whole-genome sequencing, making use of software tools that greatly facilitate the distinction between newly induced mutations and other sources of apparent sequence polymorphism, thereby reducing false-positive and false-negative identification rates. In addition, we utilized iPLEX MassARRAY for genotyping recombinant chromosomes. These approaches permitted the mapping of a new mutation affecting cell competition when only a single allele existed, with a phenotype assessed only in genetic mosaics, without the benefit of complementation with existing mutations, deletions, or duplications. These techniques expand the utility of chemical mutagenesis and whole-genome sequencing for mutant identification. We discuss mutations in the Atm and Xrp1 genes identified in this screen.
Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium.

Science.gov (United States)

Ng, Maggie C Y; Graff, Mariaelisa; Lu, Yingchang; Justice, Anne E; Mudgal, Poorva; Liu, Ching-Ti; Young, Kristin; Yanek, Lisa R; Feitosa, Mary F; Wojczynski, Mary K; Rand, Kristin; Brody, Jennifer A; Cade, Brian E; Dimitrov, Latchezar; Duan, Qing; Guo, Xiuqing; Lange, Leslie A; Nalls, Michael A; Okut, Hayrettin; Tajuddin, Salman M; Tayo, Bamidele O; Vedantam, Sailaja; Bradfield, Jonathan P; Chen, Guanjie; Chen, Wei-Min; Chesi, Alessandra; Irvin, Marguerite R; Padhukasahasram, Badri; Smith, Jennifer A; Zheng, Wei; Allison, Matthew A; Ambrosone, Christine B; Bandera, Elisa V; Bartz, Traci M; Berndt, Sonja I; Bernstein, Leslie; Blot, William J; Bottinger, Erwin P; Carpten, John; Chanock, Stephen J; Chen, Yii-Der Ida; Conti, David V; Cooper, Richard S; Fornage, Myriam; Freedman, Barry I; Garcia, Melissa; Goodman, Phyllis J; Hsu, Yu-Han H; Hu, Jennifer; Huff, Chad D; Ingles, Sue A; John, Esther M; Kittles, Rick; Klein, Eric; Li, Jin; McKnight, Barbara; Nayak, Uma; Nemesure, Barbara; Ogunniyi, Adesola; Olshan, Andrew; Press, Michael F; Rohde, Rebecca; Rybicki, Benjamin A; Salako, Babatunde; Sanderson, Maureen; Shao, Yaming; Siscovick, David S; Stanford, Janet L; Stevens, Victoria L; Stram, Alex; Strom, Sara S; Vaidya, Dhananjay; Witte, John S; Yao, Jie; Zhu, Xiaofeng; Ziegler, Regina G; Zonderman, Alan B; Adeyemo, Adebowale; Ambs, Stefan; Cushman, Mary; Faul, Jessica D; Hakonarson, Hakon; Levin, Albert M; Nathanson, Katherine L; Ware, Erin B; Weir, David R; Zhao, Wei; Zhi, Degui; Arnett, Donna K; Grant, Struan F A; Kardia, Sharon L R; Oloapde, Olufunmilayo I; Rao, D C; Rotimi, Charles N; Sale, Michele M; Williams, L Keoki; Zemel, Babette S; Becker, Diane M; Borecki, Ingrid B; Evans, Michele K; Harris, Tamara B; Hirschhorn, Joel N; Li, Yun; Patel, Sanjay R; Psaty, Bruce M; Rotter, Jerome I; Wilson, James G; Bowden, Donald W; Cupples, L Adrienne; Haiman, Christopher A; Loos, Ruth J F; North, Kari E

2017-04-01

Genome-wide association studies (GWAS) have identified >300 loci associated with measures of adiposity including body mass index (BMI) and waist-to-hip ratio (adjusted for BMI, WHRadjBMI), but few have been identified through screening of the African ancestry genomes. We performed large-scale meta-analyses and replications in up to 52,895 individuals for BMI and up to 23,095 individuals for WHRadjBMI from the African Ancestry Anthropometry Genetics Consortium (AAAGC) using 1000 Genomes phase 1 imputed GWAS to improve coverage of both common and low frequency variants in the low linkage disequilibrium African ancestry genomes. In the sex-combined analyses, we identified one novel locus (TCF7L2/HABP2) for WHRadjBMI and eight previously established loci at P ... African ancestry individuals. An additional novel locus (SPRYD7/DLEU2) was identified for WHRadjBMI when combined with European GWAS. In the sex-stratified analyses, we identified three novel loci for BMI (INTS10/LPL and MLC1 in men, IRX4/IRX2 in women) and four for WHRadjBMI (SSX2IP, CASC8, PDE3B and ZDHHC1/HSD11B2 in women) in individuals of African ancestry or both African and European ancestry. For four of the novel variants, the minor allele frequency was low (... African ancestry sex-combined and sex-stratified analyses, 26 BMI loci and 17 WHRadjBMI loci contained ≤ 20 variants in the credible sets that jointly account for 99% posterior probability of driving the associations. The lead variants in 13 of these loci had a high probability of being causal. Compared with our previous HapMap imputed GWAS for BMI and WHRadjBMI including up to 71,412 and 27,350 African ancestry individuals, respectively, our results suggest that 1000 Genomes imputation showed modest improvement in identifying GWAS loci including low frequency variants. Trans-ethnic meta-analyses further improved fine mapping of putative causal variants in loci shared between the African and European ancestry populations.
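A 99% credible set, as described above, is the smallest group of variants whose posterior probabilities of being the causal variant add up to at least 99%. The sketch below assumes exactly one causal variant per locus and equal priors. Under those assumptions, each variant's posterior is its Bayes factor divided by the locus total. How the Bayes factors are obtained (for example, approximated from summary statistics) is not stated in the abstract and is left outside the sketch. The variant IDs are placeholders.

```python
def credible_set(bayes_factors, coverage=0.99):
    """Smallest set of variants whose posterior probabilities sum to at least `coverage`.

    Assumes a single causal variant per locus and equal prior probabilities, so
    posterior_i = BF_i / sum(BF).
    Returns (variants in the credible set, ordered by posterior; dict of posteriors).
    """
    total = sum(bayes_factors.values())
    if total <= 0:
        raise ValueError("Bayes factors must sum to a positive number")
    posteriors = {v: bf / total for v, bf in bayes_factors.items()}
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, pp in ranked:
        chosen.append(variant)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen, posteriors


if __name__ == "__main__":
    # Placeholder variant IDs with hypothetical Bayes factors for one locus.
    bfs = {"var_a": 10.0, "var_b": 60.0, "var_c": 3.0, "var_d": 25.0, "var_e": 2.0}
    members, pp = credible_set(bfs, coverage=0.99)
    print(len(members), "variants:", members)
```

A locus where a single variant carries most of the posterior gives a small credible set. Such loci correspond to the "high probability of being causal" lead variants mentioned above.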
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning.

Directory of Open Access Journals (Sweden)

Ji-Sung Kim

2018-04-01

Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10⁻⁹). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

KAUST Repository

Kim, Ji-Sung

2018-04-26

Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10⁻⁹). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
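Two of the metrics named in the RIDDLE abstract, accuracy and cross-entropy loss, can be computed from a classifier's predicted class probabilities as in the sketch below. This is a generic illustration with hypothetical predictions, not RIDDLE's own evaluation code.

```python
import math


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the true class."""
    preds = [max(range(len(p)), key=lambda k: p[k]) for p in probs]
    return sum(int(t == pr) for t, pr in zip(y_true, preds)) / len(y_true)


def cross_entropy(y_true, probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class (lower is better)."""
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical predictions for three patients over three classes
y = [0, 1, 2]
p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(accuracy(y, p))                 # 1.0
print(round(cross_entropy(y, p), 4))  # 0.4987
```

Both predictions of the third patient pick the right class, but the low confidence (0.4) still raises the cross-entropy, which is why the two metrics are reported separately.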
Relationship of some upland rice genotype after gamma irradiation

Science.gov (United States)

Suliartini, N. W. S.; Wijayanto, T.; Madiki, A.; Boer, D.; Muhidin; Juniawan

2018-02-01

The objective of the research was to group local upland rice genotypes after treatment with gamma irradiation. The research materials were second-generation upland rice mutant genotypes and their two parental cultivars, Pae Loilo (K3D0) and Pae Pongasi (K2D0). The research was conducted at the Indonesian Sweetener and Fiber Crops Research Institute, Malang Regency, using an augmented design. Data were analyzed in R. Eight hundred and seventy-one genotypes underwent selection, with yield above the mean of the parents plus 1.5 standard deviations as the criterion; based on this selection, eighty genotypes were analyzed by cluster analysis. Nine observed variables were used to construct a cluster dendrogram with the average linkage method, and genetic distance was measured as Euclidean distance. The dendrogram divided the tested genotypes into eight groups. Groups 1, 2, 7, and 8 each had one genotype, groups 3 and 6 each had two genotypes, group 4 had 25 genotypes, and group 5 had 51 genotypes. Check genotypes formed a separate group. Group 6 had the highest yield per plant (126.11 g), followed by groups 5 and 4 (97.63 and 94.08 g, respectively).
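Average-linkage clustering on Euclidean distance, as described above, can be sketched in plain Python. The trait vectors below are hypothetical; in practice, traits measured in different units would usually be standardized first, a step the abstract does not mention.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA-style merging).

    points: one trait vector per genotype.
    Returns the clusters as sorted lists of point indices.
    """
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average linkage: mean of all pairwise distances between the two clusters
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters
        _, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x, y in combinations(range(len(clusters)), 2))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


# Hypothetical two-trait vectors for five genotypes
traits = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]]
print(average_linkage(traits, 3))  # [[0, 1], [2, 3], [4]]
```

Cutting the merge process at a chosen number of clusters plays the role of cutting the dendrogram at a given height; singleton groups, like groups 1, 2, 7 and 8 above, arise when a genotype is far from all others.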
Linking genotypes database with locus-specific database and genotype-phenotype correlation in phenylketonuria.

Science.gov (United States)

Wettstein, Sarah; Underhaug, Jarl; Perez, Belen; Marsden, Brian D; Yue, Wyatt W; Martinez, Aurora; Blau, Nenad

2015-03-01

The wide range of metabolic phenotypes in phenylketonuria is due to a large number of variants causing variable impairment of phenylalanine hydroxylase function. A total of 834 phenylalanine hydroxylase gene variants from the locus-specific database PAHvdb and genotypes of 4181 phenylketonuria patients from the BIOPKU database were characterized using the FoldX, SIFT Blink, PolyPhen-2 and SNPs3D algorithms. The resulting data were correlated with residual enzyme activity, patients' phenotype and tetrahydrobiopterin responsiveness. A descriptive analysis of both databases was compiled, and an interactive viewer was implemented in the PAHvdb database for structure visualization of missense variants. We found a quantitative relationship between phenylalanine hydroxylase protein stability and enzyme activity (r_s = 0.479), between protein stability and allelic phenotype (r_s = -0.458), and between enzyme activity and allelic phenotype (r_s = 0.799). Enzyme stability algorithms (FoldX and SNPs3D), allelic phenotype and enzyme activity were the strongest predictors of patients' phenotype and tetrahydrobiopterin response. Phenotype prediction accuracy was ≈100% for deleterious genotypes, 92.9% for homozygous, 94.8% for hemizygous, and 77.9% for compound heterozygous genotypes, while tetrahydrobiopterin response was correctly predicted in 71.0% of all cases. To our knowledge, this is the largest study using algorithms to predict patients' phenotype and tetrahydrobiopterin responsiveness in phenylketonuria, using data from the locus-specific and genotypes databases.
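The r_s values above are Spearman rank correlations. A minimal sketch, computing r_s as the Pearson correlation of tie-averaged ranks, is shown below; the stability and activity values are hypothetical, not PAHvdb data.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical stability scores and residual enzyme activities (%) for five variants
stability = [0.2, 0.5, 0.4, 0.9, 0.7]
activity = [10, 30, 35, 80, 60]
print(round(spearman(stability, activity), 3))  # 0.9
```

Because it uses only ranks, r_s captures monotonic but non-linear relationships, which suits ordinal measures such as allelic phenotype categories.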
Forensic SNP genotyping with SNaPshot

DEFF Research Database (Denmark)

Fondevila, M; Børsting, C; Phillips, C

2017-01-01

This review explores the key factors that influence the optimization, routine use, and profile interpretation of the SNaPshot single-base extension (SBE) system applied to forensic single-nucleotide polymorphism (SNP) genotyping. Despite being mainly a complementary DNA genotyping technique to routine STR profiling, SNaPshot is an important part of the development of SNP sets for a wide range of forensic applications with these markers, from genotyping highly degraded DNA with very short amplicons to the introduction of SNPs to ascertain the ancestry and physical characteristics of an unidentified contact trace donor. However, this technology, versatile as it is, displays several features that depart from usual STR genotyping far enough to demand a certain degree of expertise from the forensic analyst before tackling the complex casework on which SNaPshot application provides...
Assessment of salinity tolerance in bell pepper (capsicum annuum l.) genotypes on the basis of germination, emergence and growth attributes

International Nuclear Information System (INIS)

Tehseen, S.; Ayyub, C.M.; Amjad, M.

2016-01-01

Abiotic stresses are a principal threat to crop growth and productivity worldwide. The most devastating is soil salinity, which adversely affects plants, so a comprehensive study was conducted to categorize available bell pepper (Capsicum annuum L.) genotypes as salt tolerant, moderately tolerant or sensitive on the basis of germination and emergence parameters. Genotypes were exposed to different saline treatments (2, 4, 6 and 8 dS m⁻¹) along with a control (0 dS m⁻¹). A germination test conducted in Petri dishes in an incubator revealed that salinity stress significantly decreased the final germination percentage, germination index and embryo axis length of the tested genotypes. In contrast, mean germination time and time to 50% seed germination increased as salinity rose from 2 to 8 dS m⁻¹. An emergence test of bell pepper genotypes conducted in pots under greenhouse conditions showed that salinity decreased seedling fresh and dry biomass, number of leaves, leaf area, and root and shoot length. Based on a ranking table of overall percent decrease, genotypes were grouped as comparatively salt tolerant (Zard, Tasty, Super shimla, Aristotle), moderately tolerant (Capistrano, CW-03, Kaka-01, Orable, Yolo wonder, Crusadar) and sensitive (PEP-311, Admiral, Lafayette, Colossol). These results indicate that germination and emergence tests are reliable screening tools for evaluating pepper genotypes for salt stress at the seedling stage. Moreover, the results of this study can help local farmers use their marginal soils by growing relatively salt-tolerant bell pepper genotypes. (author)
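The ranking by overall percent decrease can be sketched as follows. This assumes percent decrease is measured relative to the control and averaged over traits that decline under salinity (such as biomass or leaf area); traits that increase under stress, such as mean germination time, would need their sign flipped. All numbers are hypothetical, not the study's measurements.

```python
from statistics import mean


def percent_decrease(control, salt):
    """Percent reduction of a trait under salinity relative to the control."""
    return 100.0 * (control - salt) / control


def rank_by_tolerance(data):
    """Order genotypes from smallest to largest mean percent decrease.

    data: {genotype: {trait: (control_value, salt_value)}}
    A smaller mean decrease is read here as greater salt tolerance.
    """
    scores = {g: mean(percent_decrease(c, s) for c, s in traits.values())
              for g, traits in data.items()}
    return sorted(scores, key=scores.get)


# Hypothetical measurements (not from the study) for three of the named genotypes
data = {
    "Zard": {"fresh_biomass": (10.0, 8.5), "leaf_area": (50.0, 44.0)},
    "Yolo wonder": {"fresh_biomass": (10.0, 7.0), "leaf_area": (50.0, 38.0)},
    "PEP-311": {"fresh_biomass": (10.0, 4.0), "leaf_area": (50.0, 25.0)},
}
print(rank_by_tolerance(data))  # ['Zard', 'Yolo wonder', 'PEP-311']
```

Splitting the ranked list into tolerant, moderately tolerant and sensitive groups requires cut-offs that the abstract does not state.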
Sucrose and raffinose family oligosaccharides (RFOs) in soybean seeds as influenced by genotype and growing location.

Science.gov (United States)

Kumar, Vineet; Rani, Anita; Goyal, Lokesh; Dixit, Amit Kumar; Manjaya, J G; Dev, Jai; Swamy, M

2010-04-28

High sucrose content in soybean seeds is desirable because, as a sweetness-imparting component, sucrose helps broaden the acceptance of soy-derived food products. Conversely, the galactosyl derivatives of sucrose, raffinose and stachyose, which are flatulence-inducing components, need to be present at low concentrations in soybean seeds, not only to increase utilization of the crop in food uses but also to deliver soy meal with improved metabolizable energy for monogastric animals. In the present study, analysis of 148 soybean genotypes for sucrose and total raffinose family oligosaccharide (RFO) contents revealed greater variation for sucrose (4.80-fold) than for RFO content (2.63-fold). High-performance liquid chromatography analyses revealed ranges of 0.64-2.53 and 2.09-7.1 mmol/100 g for raffinose and stachyose contents, respectively. Because information on environmental effects on sucrose and RFO content in soybean seeds was not available, we also investigated a set of seven genotypes grown at widely different geographical locations for these quality traits. Sucrose content was significantly higher at the cooler location (Palampur); however, differences in raffinose and stachyose contents across growing locations were genotype-dependent. The results suggest that soybean genotypes grown at cooler locations may be better suited for processing into soy food products with improved taste and flavor.

An epidemiologic survey of methicillin-resistant Staphylococcus aureus by combined use of mec-HVR genotyping and toxin genotyping in a university hospital in Japan.

Science.gov (United States)

Nishi, Junichiro; Yoshinaga, Masao; Miyanohara, Hiroaki; Kawahara, Motoshi; Kawabata, Masaharu; Motoya, Toshiro; Owaki, Tetsuhiro; Oiso, Shigeru; Kawakami, Masayuki; Kamewari, Shigeko; Koyama, Yumiko; Wakimoto, Naoko; Tokuda, Koichi; Manago, Kunihiro; Maruyama, Ikuro

2002-09-01

The study aimed to evaluate the usefulness of an assay using two polymerase chain reaction-based genotyping methods for the practical surveillance of methicillin-resistant Staphylococcus aureus (MRSA). Nosocomial infection and colonization were surveyed monthly in a university hospital in Japan for 20 months. Genotyping with mec-HVR is based on the size of the mec-associated hypervariable region amplified by polymerase chain reaction. Toxin genotyping uses a multiplex polymerase chain reaction method to amplify eight staphylococcal toxin genes. A total of 809 MRSA isolates were classified into 49 genotypes. We observed differing prevalences of genotypes across hospital wards and could rapidly demonstrate genotypic similarity among outbreak isolates. The incidence of genotype D: SEC/TSST1 was significantly higher in isolates causing nosocomial infections (49.5%; 48 of 97) than in nasal isolates (31.4%; 54 of 172) (P = .004), suggesting that this genotype may represent the nosocomial strains. The combined use of these two genotyping methods improved discriminatory ability and should be investigated further.
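The comparison of 48 of 97 infection isolates with 54 of 172 nasal isolates is a 2 × 2 comparison of proportions. The abstract does not say which test produced P = .004, so the sketch below uses one common choice, Pearson's chi-square without continuity correction, as an assumption; it is not expected to reproduce the reported P exactly.

```python
import math


def two_by_two_chi_square(a, n1, b, n2):
    """Pearson chi-square (1 df, no continuity correction) comparing the
    proportions a/n1 and b/n2; returns (p1, p2, chi2, p_value)."""
    c, d = n1 - a, n2 - b  # non-events in each group
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * (a + b) * (c + d))
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return a / n1, b / n2, chi2, p_value


p1, p2, chi2, p = two_by_two_chi_square(48, 97, 54, 172)
print(f"{p1:.1%} vs {p2:.1%}")  # 49.5% vs 31.4%
print(round(chi2, 2), round(p, 4))
```

This uncorrected test gives a chi-square of about 8.6 and P of roughly 0.003, the same order as the reported value; a continuity-corrected or exact test would give a somewhat larger P.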
Preliminary Studies of the Performance of Quinoa (Chenopodium quinoa Willd.) Genotypes under Irrigated and Rainfed Conditions of Central Malawi.

Science.gov (United States)

Maliro, Moses F A; Guwela, Veronica F; Nyaika, Jacinta; Murphy, Kevin M

2017-01-01

The goal of sustainable intensification of agriculture in Malawi has led to the evaluation of innovative, regionally novel or under-utilized crop species. Quinoa (Chenopodium quinoa Willd.) has the potential to provide a drought-tolerant, nutritious alternative to maize. We evaluated 11 diverse varieties of quinoa for their yield and agronomic performance at two locations, Bunda and Bembeke, in Malawi. The varieties originated from Ecuador, Chile and Bolivia in South America; the United States and Canada in North America; and Denmark in Europe, and were chosen based on their variation in morphological and agronomic traits and their potential for adaptation to the climate of Malawi. Plant height, panicle length, days to maturity, harvest index, and seed yield were recorded for each variety under irrigation at Bunda and Bembeke, and under rainfed conditions at Bunda. Plant height was significantly influenced by both genotype and environment. Panicle length differed significantly between the two locations, whereas the effects of genotype and genotype × environment (G × E) interaction were not significant. Harvest index differed by genotype and by G × E interaction. Notably, grain yield differed by genotype, environment and G × E. Under irrigated conditions, seed yield was higher at Bunda (237-3019 kg/ha) than at Bembeke (62-692 kg/ha). The highest-yielding genotype at Bunda was Titicaca (3019 kg/ha), whereas Multi-Hued was the highest (692 kg/ha) at Bembeke. Strong positive correlations between seed yield and (1) plant height (r = 0.74), (2) days to maturity (r = 0.76), and (3) biomass (r = 0.87) were found under irrigated conditions. The rainfed evaluations at Bunda revealed significant differences in seed yield, plant biomass, and seed size among the genotypes. The highest-yielding genotype was Black Seeded (2050 kg/ha), followed by Multi-Hued (1603 kg/ha) and Bio-Bio (1446 kg/ha). Ecuadorian (257 kg/ha) was
Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer's disease.

Science.gov (United States)

Sims, Rebecca; van der Lee, Sven J; Naj, Adam C; Bellenguez, Céline; Badarinarayan, Nandini; Jakobsdottir, Johanna; Kunkle, Brian W; Boland, Anne; Raybould, Rachel; Bis, Joshua C; Martin, Eden R; Grenier-Boley, Benjamin; Heilmann-Heimbach, Stefanie; Chouraki, Vincent; Kuzma, Amanda B; Sleegers, Kristel; Vronskaya, Maria; Ruiz, Agustin; Graham, Robert R; Olaso, Robert; Hoffmann, Per; Grove, Megan L; Vardarajan, Badri N; Hiltunen, Mikko; Nöthen, Markus M; White, Charles C; Hamilton-Nelson, Kara L; Epelbaum, Jacques; Maier, Wolfgang; Choi, Seung-Hoan; Beecham, Gary W; Dulary, Cécile; Herms, Stefan; Smith, Albert V; Funk, Cory C; Derbois, Céline; Forstner, Andreas J; Ahmad, Shahzad; Li, Hongdong; Bacq, Delphine; Harold, Denise; Satizabal, Claudia L; Valladares, Otto; Squassina, Alessio; Thomas, Rhodri; Brody, Jennifer A; Qu, Liming; Sánchez-Juan, Pascual; Morgan, Taniesha; Wolters, Frank J; Zhao, Yi; Garcia, Florentino Sanchez; Denning, Nicola; Fornage, Myriam; Malamon, John; Naranjo, Maria Candida Deniz; Majounie, Elisa; Mosley, Thomas H; Dombroski, Beth; Wallon, David; Lupton, Michelle K; Dupuis, Josée; Whitehead, Patrice; Fratiglioni, Laura; Medway, Christopher; Jian, Xueqiu; Mukherjee, Shubhabrata; Keller, Lina; Brown, Kristelle; Lin, Honghuang; Cantwell, Laura B; Panza, Francesco; McGuinness, Bernadette; Moreno-Grau, Sonia; Burgess, Jeremy D; Solfrizzi, Vincenzo; Proitsi, Petra; Adams, Hieab H; Allen, Mariet; Seripa, Davide; Pastor, Pau; Cupples, L Adrienne; Price, Nathan D; Hannequin, Didier; Frank-García, Ana; Levy, Daniel; Chakrabarty, Paramita; Caffarra, Paolo; Giegling, Ina; Beiser, Alexa S; Giedraitis, Vilmantas; Hampel, Harald; Garcia, Melissa E; Wang, Xue; Lannfelt, Lars; Mecocci, Patrizia; Eiriksdottir, Gudny; Crane, Paul K; Pasquier, Florence; Boccardi, Virginia; Henández, Isabel; Barber, Robert C; Scherer, Martin; Tarraga, Lluis; Adams, Perrie M; Leber, Markus; Chen, Yuning; Albert, Marilyn S; Riedel-Heller, Steffi; Emilsson, Valur; Beekly, Duane; 
Braae, Anne; Schmidt, Reinhold; Blacker, Deborah; Masullo, Carlo; Schmidt, Helena; Doody, Rachelle S; Spalletta, Gianfranco; Longstreth, W T; Fairchild, Thomas J; Bossù, Paola; Lopez, Oscar L; Frosch, Matthew P; Sacchinelli, Eleonora; Ghetti, Bernardino; Yang, Qiong; Huebinger, Ryan M; Jessen, Frank; Li, Shuo; Kamboh, M Ilyas; Morris, John; Sotolongo-Grau, Oscar; Katz, Mindy J; Corcoran, Chris; Dunstan, Melanie; Braddel, Amy; Thomas, Charlene; Meggy, Alun; Marshall, Rachel; Gerrish, Amy; Chapman, Jade; Aguilar, Miquel; Taylor, Sarah; Hill, Matt; Fairén, Mònica Díez; Hodges, Angela; Vellas, Bruno; Soininen, Hilkka; Kloszewska, Iwona; Daniilidou, Makrina; Uphill, James; Patel, Yogen; Hughes, Joseph T; Lord, Jenny; Turton, James; Hartmann, Annette M; Cecchetti, Roberta; Fenoglio, Chiara; Serpente, Maria; Arcaro, Marina; Caltagirone, Carlo; Orfei, Maria Donata; Ciaramella, Antonio; Pichler, Sabrina; Mayhaus, Manuel; Gu, Wei; Lleó, Alberto; Fortea, Juan; Blesa, Rafael; Barber, Imelda S; Brookes, Keeley; Cupidi, Chiara; Maletta, Raffaele Giovanni; Carrell, David; Sorbi, Sandro; Moebus, Susanne; Urbano, Maria; Pilotto, Alberto; Kornhuber, Johannes; Bosco, Paolo; Todd, Stephen; Craig, David; Johnston, Janet; Gill, Michael; Lawlor, Brian; Lynch, Aoibhinn; Fox, Nick C; Hardy, John; Albin, Roger L; Apostolova, Liana G; Arnold, Steven E; Asthana, Sanjay; Atwood, Craig S; Baldwin, Clinton T; Barnes, Lisa L; Barral, Sandra; Beach, Thomas G; Becker, James T; Bigio, Eileen H; Bird, Thomas D; Boeve, Bradley F; Bowen, James D; Boxer, Adam; Burke, James R; Burns, Jeffrey M; Buxbaum, Joseph D; Cairns, Nigel J; Cao, Chuanhai; Carlson, Chris S; Carlsson, Cynthia M; Carney, Regina M; Carrasquillo, Minerva M; Carroll, Steven L; Diaz, Carolina Ceballos; Chui, Helena C; Clark, David G; Cribbs, David H; Crocco, Elizabeth A; DeCarli, Charles; Dick, Malcolm; Duara, Ranjan; Evans, Denis A; Faber, Kelley M; Fallon, Kenneth B; Fardo, David W; Farlow, Martin R; Ferris, Steven; Foroud, Tatiana M; 
Galasko, Douglas R; Gearing, Marla; Geschwind, Daniel H; Gilbert, John R; Graff-Radford, Neill R; Green, Robert C; Growdon, John H; Hamilton, Ronald L; Harrell, Lindy E; Honig, Lawrence S; Huentelman, Matthew J; Hulette, Christine M; Hyman, Bradley T; Jarvik, Gail P; Abner, Erin; Jin, Lee-Way; Jun, Gyungah; Karydas, Anna; Kaye, Jeffrey A; Kim, Ronald; Kowall, Neil W; Kramer, Joel H; LaFerla, Frank M; Lah, James J; Leverenz, James B; Levey, Allan I; Li, Ge; Lieberman, Andrew P; Lunetta, Kathryn L; Lyketsos, Constantine G; Marson, Daniel C; Martiniuk, Frank; Mash, Deborah C; Masliah, Eliezer; McCormick, Wayne C; McCurry, Susan M; McDavid, Andrew N; McKee, Ann C; Mesulam, Marsel; Miller, Bruce L; Miller, Carol A; Miller, Joshua W; Morris, John C; Murrell, Jill R; Myers, Amanda J; O'Bryant, Sid; Olichney, John M; Pankratz, Vernon S; Parisi, Joseph E; Paulson, Henry L; Perry, William; Peskind, Elaine; Pierce, Aimee; Poon, Wayne W; Potter, Huntington; Quinn, Joseph F; Raj, Ashok; Raskind, Murray; Reisberg, Barry; Reitz, Christiane; Ringman, John M; Roberson, Erik D; Rogaeva, Ekaterina; Rosen, Howard J; Rosenberg, Roger N; Sager, Mark A; Saykin, Andrew J; Schneider, Julie A; Schneider, Lon S; Seeley, William W; Smith, Amanda G; Sonnen, Joshua A; Spina, Salvatore; Stern, Robert A; Swerdlow, Russell H; Tanzi, Rudolph E; Thornton-Wells, Tricia A; Trojanowski, John Q; Troncoso, Juan C; Van Deerlin, Vivianna M; Van Eldik, Linda J; Vinters, Harry V; Vonsattel, Jean Paul; Weintraub, Sandra; Welsh-Bohmer, Kathleen A; Wilhelmsen, Kirk C; Williamson, Jennifer; Wingo, Thomas S; Woltjer, Randall L; Wright, Clinton B; Yu, Chang-En; Yu, Lei; Garzia, Fabienne; Golamaully, Feroze; Septier, Gislain; Engelborghs, Sebastien; Vandenberghe, Rik; De Deyn, Peter P; Fernadez, Carmen Muñoz; Benito, Yoland Aladro; Thonberg, Hakan; Forsell, Charlotte; Lilius, Lena; Kinhult-Stählbom, Anne; Kilander, Lena; Brundin, RoseMarie; Concari, Letizia; Helisalmi, Seppo; Koivisto, Anne Maria; Haapasalo, 
Annakaisa; Dermecourt, Vincent; Fievet, Nathalie; Hanon, Olivier; Dufouil, Carole; Brice, Alexis; Ritchie, Karen; Dubois, Bruno; Himali, Jayanadra J; Keene, C Dirk; Tschanz, JoAnn; Fitzpatrick, Annette L; Kukull, Walter A; Norton, Maria; Aspelund, Thor; Larson, Eric B; Munger, Ron; Rotter, Jerome I; Lipton, Richard B; Bullido, María J; Hofman, Albert; Montine, Thomas J; Coto, Eliecer; Boerwinkle, Eric; Petersen, Ronald C; Alvarez, Victoria; Rivadeneira, Fernando; Reiman, Eric M; Gallo, Maura; O'Donnell, Christopher J; Reisch, Joan S; Bruni, Amalia Cecilia; Royall, Donald R; Dichgans, Martin; Sano, Mary; Galimberti, Daniela; St George-Hyslop, Peter; Scarpini, Elio; Tsuang, Debby W; Mancuso, Michelangelo; Bonuccelli, Ubaldo; Winslow, Ashley R; Daniele, Antonio; Wu, Chuang-Kuo; Peters, Oliver; Nacmias, Benedetta; Riemenschneider, Matthias; Heun, Reinhard; Brayne, Carol; Rubinsztein, David C; Bras, Jose; Guerreiro, Rita; Al-Chalabi, Ammar; Shaw, Christopher E; Collinge, John; Mann, David; Tsolaki, Magda; Clarimón, Jordi; Sussams, Rebecca; Lovestone, Simon; O'Donovan, Michael C; Owen, Michael J; Behrens, Timothy W; Mead, Simon; Goate, Alison M; Uitterlinden, Andre G; Holmes, Clive; Cruchaga, Carlos; Ingelsson, Martin; Bennett, David A; Powell, John; Golde, Todd E; Graff, Caroline; De Jager, Philip L; Morgan, Kevin; Ertekin-Taner, Nilufer; Combarros, Onofre; Psaty, Bruce M; Passmore, Peter; Younkin, Steven G; Berr, Claudine; Gudnason, Vilmundur; Rujescu, Dan; Dickson, Dennis W; Dartigues, Jean-François; DeStefano, Anita L; Ortega-Cubero, Sara; Hakonarson, Hakon; Campion, Dominique; Boada, Merce; Kauwe, John Keoni; Farrer, Lindsay A; Van Broeckhoven, Christine; Ikram, M Arfan; Jones, Lesley; Haines, Jonathan L; Tzourio, Christophe; Launer, Lenore J; Escott-Price, Valentina; Mayeux, Richard; Deleuze, Jean-François; Amin, Najaf; Holmans, Peter A; Pericak-Vance, Margaret A; Amouyel, Philippe; van Duijn, Cornelia M; Ramirez, Alfredo; Wang, Li-San; Lambert, Jean-Charles; 
Seshadri, Sudha; Williams, Julie; Schellenberg, Gerard D

2017-09-01

We identified rare coding variants associated with Alzheimer's disease in a three-stage case-control study of 85,133 subjects. In stage 1, we genotyped 34,174 samples using a whole-exome microarray. In stage 2, we tested associated variants (P < 1 × 10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, we used an additional 14,997 samples to test the most significant stage 2 associations (P < 5 × 10⁻⁸) using imputed genotypes. We observed three new genome-wide significant nonsynonymous variants associated with Alzheimer's disease: a protective variant in PLCG2 (rs72824905: p.Pro522Arg, P = 5.38 × 10⁻¹⁰, odds ratio (OR) = 0.68, minor allele frequency (MAF) cases = 0.0059, MAF controls = 0.0093), a risk variant in ABI3 (rs616338: p.Ser209Phe, P = 4.56 × 10⁻¹⁰, OR = 1.43, MAF cases = 0.011, MAF controls = 0.008), and a new genome-wide significant variant in TREM2 (rs143332484: p.Arg62His, P = 1.55 × 10⁻¹⁴, OR = 1.67, MAF cases = 0.0143, MAF controls = 0.0089), a known susceptibility gene for Alzheimer's disease. These protein-altering changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified risk genes in Alzheimer's disease. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to the development of Alzheimer's disease.
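The link between the reported minor allele frequencies and the direction of effect can be shown with a crude allelic odds ratio computed from the case and control MAFs. This sketch is only an illustration: the reported ORs come from the study's own association models, so the crude values differ somewhat from them.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio from minor-allele frequencies in cases and controls."""
    odds_cases = maf_cases / (1 - maf_cases)
    odds_controls = maf_controls / (1 - maf_controls)
    return odds_cases / odds_controls


# MAFs as given in the abstract: (cases, controls)
variants = {
    "PLCG2 rs72824905": (0.0059, 0.0093),
    "ABI3 rs616338": (0.011, 0.008),
    "TREM2 rs143332484": (0.0143, 0.0089),
}
for name, (cases, controls) in variants.items():
    print(name, round(allelic_odds_ratio(cases, controls), 2))
```

The crude values (about 0.63, 1.38 and 1.62) agree in direction with the reported ORs of 0.68, 1.43 and 1.67: below 1 for the protective PLCG2 variant and above 1 for the ABI3 and TREM2 risk variants.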
Two-temperature LATE-PCR endpoint genotyping

Directory of Open Access Journals (Sweden)

Reis Arthur H

2006-12-01

Background. In conventional PCR, total amplicon yield becomes independent of the starting template number as amplification reaches a plateau, and it varies significantly among replicate reactions. This paper describes a strategy for reconfiguring PCR so that the signal intensity of a single fluorescent detection probe after PCR thermal cycling reflects genomic composition. The resulting method corrects for product yield variations among replicate amplification reactions, permits resolution of homozygous and heterozygous genotypes based on endpoint fluorescence signal intensities, and readily identifies imbalanced allele ratios equivalent to those arising from gene/chromosomal duplications. Furthermore, the use of only a single colored probe for genotyping enhances the multiplex detection capacity of the assay. Results. Two-Temperature LATE-PCR endpoint genotyping combines Linear-After-The-Exponential PCR (LATE-PCR, an advanced form of asymmetric PCR that efficiently generates single-stranded DNA) with mismatch-tolerant probes capable of detecting allele-specific targets at high temperature and total single-stranded amplicons at a lower temperature in the same reaction. The method is demonstrated here for genotyping single-nucleotide alleles of the human HEXA gene, which is responsible for Tay-Sachs disease, and for genotyping SNP alleles near the human p53 tumor suppressor gene. In each case, the final probe signals were normalized against the total single-stranded DNA generated in the same reaction. Normalization reduces the coefficient of variation among replicates from 17.22% to as little as 2.78% and permits endpoint genotyping with >99.7% accuracy. These assays are robust because they are consistent over a wide range of input DNA concentrations and give the same results regardless of how many cycles of linear amplification have elapsed. The method is also sufficiently powerful to distinguish samples with a 1:1 ratio of two alleles from samples comprised of
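The effect of normalization on the coefficient of variation (CV) can be illustrated with a small sketch. The replicate values below are hypothetical, not data from the paper: raw allele-specific probe signals vary with each reaction's product yield, while the ratio of probe signal to total single-stranded DNA stays nearly constant.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate reactions: total single-stranded product varies with yield,
# and the allele-specific probe signal roughly tracks it.
total_ssdna = [100.0, 130.0, 80.0, 115.0]
probe_signal = [50.0, 66.0, 39.0, 57.0]

# Normalize each probe signal by the total single-stranded DNA in the same reaction
normalized = [p / t for p, t in zip(probe_signal, total_ssdna)]
print(round(coefficient_of_variation(probe_signal), 1))  # 21.5
print(round(coefficient_of_variation(normalized), 1))    # 1.7
```

Because yield differences affect numerator and denominator alike, they largely cancel in the ratio, leaving a value that reflects allele composition rather than amplification efficiency.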
Genetic Diversity and Population Structure of F3:6 Nebraska Winter Wheat Genotypes Using Genotyping-By-Sequencing.

Science.gov (United States)

Eltaher, Shamseldeen; Sallam, Ahmed; Belamkar, Vikas; Emara, Hamdy A; Nower, Ahmed A; Salem, Khaled F M; Poland, Jesse; Baenziger, Peter S

2018-01-01

The availability of information on the genetic diversity and population structure of wheat (Triticum aestivum L.) breeding lines will help wheat breeders to better use their genetic resources and manage genetic variation in their breeding programs. Recent advances in sequencing technology provide the opportunity to identify tens or hundreds of thousands of single nucleotide polymorphisms (SNPs) in large-genome species (e.g., wheat). These SNPs can be used to understand genetic diversity and to perform genome-wide association studies (GWAS) for complex traits. In this study, the genetic diversity and population structure were investigated in a set of 230 genotypes (F3:6) derived from various crosses as a prerequisite for GWAS and genomic selection. Genotyping-by-sequencing provided 25,566 high-quality SNPs. The polymorphism information content (PIC) across chromosomes ranged from 0.09 to 0.37, with an average of 0.23. The number of SNP markers per chromosome ranged from 319 on chromosome 3D to 2,370 on chromosome 3B. The analysis of population structure revealed three subpopulations (G1, G2, and G3). Analysis of molecular variance attributed 8% of the variance to differences among subpopulations and 92% to differences within them. Of the three subpopulations, G2 had the highest level of genetic diversity based on three genetic diversity indices: Shannon's information index (I) = 0.494, diversity index (h) = 0.328 and unbiased diversity index (uh) = 0.331, while G3 had the lowest (I = 0.348, h = 0.226 and uh = 0.236). The high genetic diversity identified among the subpopulations can be used to develop new wheat cultivars.
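The diversity statistics above can be sketched from allele frequencies using their common textbook definitions. These are assumptions about how the indices are defined: the software used in the study may apply different conventions, for example in the small-sample correction for uh, which here assumes diploid sampling.

```python
import math


def pic(freqs):
    """Polymorphism information content from allele frequencies (common PIC formula)."""
    homo = sum(p * p for p in freqs)
    cross = sum(2 * (freqs[i] ** 2) * (freqs[j] ** 2)
                for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return 1 - homo - cross


def nei_h(freqs):
    """Nei's gene diversity h = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs)


def unbiased_h(freqs, n_individuals):
    """Small-sample correction uh = 2N / (2N - 1) * h (diploid sampling assumed)."""
    return 2 * n_individuals / (2 * n_individuals - 1) * nei_h(freqs)


def shannon_index(freqs):
    """Shannon's information index I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# A biallelic SNP at equal allele frequencies reaches the maximum PIC of 0.375
print(pic([0.5, 0.5]), nei_h([0.5, 0.5]), round(shannon_index([0.5, 0.5]), 3))  # 0.375 0.5 0.693
```

For biallelic SNPs, the PIC ceiling of 0.375 explains why the reported chromosome-level PIC values top out at about 0.37.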
Common variation at 2p13.3, 3q29, 7p13 and 17q25.1 associated with susceptibility to pancreatic cancer

Czech Academy of Sciences Publication Activity Database

Childs, E.J.; Mocci, E.; Campa, D.; Bracci, P. M.; Gallinger, S.; Goggins, M.; Li, D.; Neale, R.E.; Olson, S. H.; Scelo, G.; Amundadottir, L. T.; Bamlet, W.R.; Bijlsma, M.F.; Blackford, A.; Borges, M.; Brennan, P.; Brenner, H.; Bueno-de-Mesquita, H. B.; Canzian, F.; Capurso, G.; Cavestro, G.M.; Chaffee, K.G.; Chanock, S. J.; Cleary, S.P.; Cotterchio, M.; Foretová, L.; Fuchs, Ch.; Funel, N.; Gazouli, M.; Hassan, M.; Herman, J.M.; Holcatová, I.; Holly, E. A.; Hoover, R.N.; Hung, R.J.; Janout, V.; Key, T.J.; Kupcinskas, J.; Kurtz, R. C.; Landi, S.; Lu, S.; Malecka-Panas, E.; Mambrini, A.; Mohelníková-Duchoňová, B.; Neoptolemos, J.P.; Oberg, A. L.; Orlow, I.; Pasquali, C.; Pezzilli, R.; Rizzato, C.; Saldia, A.; Scarpa, A.; Stolzenberg-Solomon, R. S.; Strobel, O.; Tavano, F.; Vashist, Y.K.; Vodička, Pavel; Wolpin, B. M.; Yu, H.; Petersen, G.; Risch, H. A.; Klein, A. P.

2015-01-01

Vol. 47, No. 8 (2015), pp. 911-918 ISSN 1061-4036 R&D Projects: GA MZd NR9422; GA ČR GAP301/12/1734 Institutional support: RVO:68378041 Keywords : genome-wide association * genotype imputation * lung adenocarcinoma * cigarette-smoking * genetic-variation * alpha-gene * risk * loci * population * mutations Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 31.616, year: 2015
Genotype-based personalised nutrition for obesity prevention and ...

African Journals Online (AJOL)

Typically, genotype-based personalised nutrition involves genotyping for a number of susceptibility SNPs associated with the prevention, or management, of a particular disease. Dietary advice is then personalised to the individual's genotype to ensure optimal prevention or treatment outcomes. To ensure evidence-based ...
Network Based Integrated Analysis of Phenotype-Genotype Data for Prioritization of Candidate Symptom Genes

Directory of Open Access Journals (Sweden)

Xing Li

2014-01-01

Background. Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanism of symptoms, we develop a computational approach to identify the candidate genes of symptoms. Methods. This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of candidate genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Then the disease-gene associations and protein-protein interactions are used to construct a phenotype-genotype network. Finally, the PRINCE algorithm is used to rank the potential genes for the associated symptoms. Results. The proposed method produced a reliable gene ranking, with an AUC (area under the curve) of 0.616 in classification. Some novel genes, such as CALCA, ESR1, and MTHFR, were predicted to be associated with headache symptoms; these are not recorded in the benchmark data set but have been reported in recently published literature. Conclusions. Our study demonstrated that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identifying candidate genes of symptoms.
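PRINCE ranks genes by propagating prior evidence over a network. The sketch below shows a generic propagation iteration in that style, F ← αW′F + (1 − α)Y, where W′ is the symmetrically degree-normalized adjacency matrix and Y holds prior scores. The network, the prior and α = 0.8 are hypothetical, and the published algorithm's construction of the prior from symptom-disease similarity is not reproduced here.

```python
import math


def normalize(adj):
    """Symmetric normalization W' = D^-1/2 W D^-1/2 of an adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def propagate(adj, prior, alpha=0.8, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W'F + (1 - alpha) * Y until the scores stop changing."""
    w = normalize(adj)
    n = len(adj)
    f = list(prior)
    for _ in range(max_iter):
        new = [alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * prior[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


# Hypothetical 4-gene path network 0-1-2-3; gene 0 carries all prior evidence
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = propagate(adj, [1.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in scores])
```

The scores decrease with network distance from the gene that carries prior evidence, so genes with no direct annotation can still be ranked highly if they sit close to well-supported genes.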
Identifying a few foot-and-mouth disease virus signature nucleotide strings for computational genotyping

Directory of Open Access Journals (Sweden)

Xu Lizhe

2008-06-01

Background. Serotypes of foot-and-mouth disease viruses (FMDVs) were generally determined by biological experiments. Computational genotyping is not well studied, even with whole viral genomes available, because of uneven evolution among genes and frequent genetic recombination. Naive sequence comparison achieves only limited success in genotyping. Results. We used 129 FMDV strains with known serotypes as training strains to select up to 140 of the most serotype-specific nucleotide strings. We then constructed a linear-kernel Support Vector Machine classifier using these 140 strings. Under leave-one-out cross-validation, this classifier assigned the correct serotype to 127 of the 129 strains, achieving 98.45% accuracy. It also assigned serotypes correctly to an independent test set of 83 other FMDV strains downloaded separately from NCBI GenBank. Conclusion. Once detailed molecular sequences are available, computational genotyping is much faster and cheaper than wet-lab biological experiments. The high accuracy of our proposed method suggests the potential of using a few signature nucleotide strings instead of whole genomes to determine the serotypes of novel FMDV strains.
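A minimal sketch of the feature construction and leave-one-out evaluation described above, assuming each strain is encoded as a 0/1 vector recording which signature strings it contains. To stay dependency-free, the classifier here is a nearest-centroid stand-in, not the linear-kernel SVM used in the study; the signature strings and sequences are hypothetical.

```python
from collections import defaultdict

def presence_features(sequence, signatures):
    """Binary vector: 1 if a signature nucleotide string occurs in the sequence."""
    seq = sequence.upper()
    return [1 if sig.upper() in seq else 0 for sig in signatures]

def fit_centroids(X, y):
    """Stand-in classifier (nearest centroid), NOT the study's linear SVM."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        totals = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            totals[j] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(centroids, features):
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=sq_dist)

def leave_one_out_accuracy(X, y):
    """Train on all strains but one, test on the held-out strain, repeat."""
    correct = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

# Hypothetical signature strings and toy sequences
signatures = ["AAGT", "CCTG"]
sequences = ["GGAAGTCC", "TTAAGTAA", "ACCTGA", "CCTGTT"]
serotypes = ["O", "O", "A", "A"]
X = [presence_features(s, signatures) for s in sequences]
accuracy = leave_one_out_accuracy(X, serotypes)
print(accuracy)                   # 1.0 on this separable toy set
print(round(100 * 127 / 129, 2))  # 98.45, the reported LOOCV accuracy
```

Swapping `fit_centroids`/`predict` for a linear SVM would leave the feature encoding and the leave-one-out loop unchanged.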
Transcriptome-Wide Single Nucleotide Polymorphisms (SNPs) for Abalone (Haliotis midae): Validation and Application Using GoldenGate Medium-Throughput Genotyping Assays

Directory of Open Access Journals (Sweden)

Rouvay Roodt-Wilding

2013-09-01

Haliotis midae is one of the most valuable commercial abalone species in the world, but it is highly vulnerable due to exploitation, habitat destruction and predation. To preserve wild and cultured stocks, genetic management and improvement of the species has become crucial. Fundamental to this is the availability and use of molecular markers, such as microsatellites and single nucleotide polymorphisms (SNPs). Transcriptome sequences generated through sequencing-by-synthesis technology were used for the in vitro and in silico identification of 505 putative SNPs from a total of 316 selected contigs. A subset of 234 SNPs was further validated and characterized in wild and cultured abalone using two Illumina GoldenGate genotyping assays. Combined with VeraCode technology, this genotyping platform yielded a 65%–69% conversion rate (percentage of polymorphic markers) with a global genotyping success rate of 76%–85%, and provided a viable means of validating SNP markers in a non-model species. The utility of 31 of the validated SNPs in population structure analysis was confirmed, while a large number of SNPs (174) were shown to be informative and are thus good candidates for linkage map construction. The 50 non-synonymous SNPs located in coding regions of genes that showed similarities with known proteins will also be useful for genetic applications, such as marker-assisted selection of genes relevant to abalone aquaculture.
Transcriptome-wide single nucleotide polymorphisms (SNPs) for abalone (Haliotis midae): validation and application using GoldenGate medium-throughput genotyping assays.

Science.gov (United States)

Bester-Van Der Merwe, Aletta; Blaauw, Sonja; Du Plessis, Jana; Roodt-Wilding, Rouvay

2013-09-23

Haliotis midae is one of the most valuable commercial abalone species in the world, but it is highly vulnerable due to exploitation, habitat destruction and predation. To preserve wild and cultured stocks, genetic management and improvement of the species has become crucial. Fundamental to this is the availability and use of molecular markers, such as microsatellites and single nucleotide polymorphisms (SNPs). Transcriptome sequences generated through sequencing-by-synthesis technology were used for the in vitro and in silico identification of 505 putative SNPs from a total of 316 selected contigs. A subset of 234 SNPs was further validated and characterized in wild and cultured abalone using two Illumina GoldenGate genotyping assays. Combined with VeraCode technology, this genotyping platform yielded a 65%–69% conversion rate (percentage of polymorphic markers) with a global genotyping success rate of 76%–85%, and provided a viable means of validating SNP markers in a non-model species. The utility of 31 of the validated SNPs in population structure analysis was confirmed, while a large number of SNPs (174) were shown to be informative and are thus good candidates for linkage map construction. The 50 non-synonymous SNPs located in coding regions of genes that showed similarities with known proteins will also be useful for genetic applications, such as marker-assisted selection of genes relevant to abalone aquaculture.
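The two assay metrics quoted above can be made concrete with a small sketch. It assumes (our reading, not the authors' stated definitions) that a SNP is "converted" when both alleles appear among its successful calls, and that the global success rate is the share of sample-by-SNP calls that did not fail. The call data are hypothetical.

```python
def assay_summary(calls_by_snp):
    """Return (conversion rate %, genotyping success rate %).

    calls_by_snp: dict SNP id -> list of genotype calls such as "AA", "AB", "BB",
    with None marking a failed call. A SNP counts as polymorphic (converted)
    when both alleles are observed among its successful calls.
    """
    total_calls = successful_calls = polymorphic = 0
    for calls in calls_by_snp.values():
        observed = [c for c in calls if c is not None]
        total_calls += len(calls)
        successful_calls += len(observed)
        alleles = set("".join(observed))  # distinct allele letters seen
        if len(alleles) > 1:
            polymorphic += 1
    conversion = 100.0 * polymorphic / len(calls_by_snp)
    success = 100.0 * successful_calls / total_calls
    return conversion, success

# Hypothetical calls for two SNPs in four animals
calls = {
    "snp1": ["AA", "AB", None, "BB"],  # polymorphic, one failed call
    "snp2": ["AA", "AA", "AA", "AA"],  # monomorphic
}
print(assay_summary(calls))  # (50.0, 87.5)
```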
Multiple imputation for estimating the risk of developing dementia and its impact on survival.

Science.gov (United States)

Yu, Binbing; Saczynski, Jane S; Launer, Lenore

2010-10-01

Dementia, Alzheimer's disease in particular, is one of the major causes of disability and decreased quality of life among the elderly and a leading obstacle to successful aging. Given the profound impact on public health, much research has focused on the age-specific risk of developing dementia and the impact on survival. Early work has discussed various methods of estimating age-specific incidence of dementia, among which the illness-death model is popular for modeling disease progression. In this article we use multiple imputation to fit multi-state models for survival data with interval censoring and left truncation. This approach allows semi-Markov models in which survival after dementia depends on onset age. Such models can be used to estimate the cumulative risk of developing dementia in the presence of the competing risk of dementia-free death. Simulations are carried out to examine the performance of the proposed method. Data from the Honolulu Asia Aging Study are analyzed to estimate the age-specific and cumulative risks of dementia and to examine the effect of major risk factors on dementia onset and death.
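The abstract does not say how estimates from the imputed datasets are combined; Rubin's rules are the standard choice and are assumed in this sketch. Each completed dataset yields an estimate and its variance; the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The numbers in the usage line are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance, standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total, math.sqrt(total)

# Hypothetical log hazard ratios for a risk factor from five imputed datasets
print(pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44], [0.010] * 5))
```

The (1 + 1/m) factor corrects for using a finite number of imputations; with more imputations the between-imputation term shrinks toward its plain value.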
Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

Science.gov (United States)

Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

2015-01-01

The importance of preventing and treating incomplete data in effectiveness studies is increasingly emphasized. However, most publications focus on randomized clinical trials (RCTs). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption that data are missing at random (MAR), a sensitivity analysis testing robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique offers with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by data missing not at random (MNAR) in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's versions of the HAQ could significantly improve the predictive value of routine outcome monitoring based on the OQ-45. Since dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.
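A simple way to probe departures from MAR is delta adjustment: shift every imputed value by a fixed amount δ (mimicking MNAR dropouts who fare systematically worse or better) and watch how the pooled estimate moves. This is a swapped-in, widely used technique, not the posterior predictive checking method the authors propose; the scores below are hypothetical.

```python
from statistics import mean

def delta_adjusted_estimates(observed, imputed_sets, deltas):
    """Pooled mean outcome after shifting imputed values by each delta.

    observed: list of observed outcome scores
    imputed_sets: one list of imputed scores per imputed dataset
    deltas: shifts to apply to imputed values; delta = 0 is the MAR analysis
    """
    results = {}
    for delta in deltas:
        per_dataset = [
            mean(observed + [value + delta for value in imputed])
            for imputed in imputed_sets
        ]
        results[delta] = mean(per_dataset)  # point estimate pooled across datasets
    return results

# Hypothetical OQ-45-like totals: two observed patients, one dropout imputed twice
print(delta_adjusted_estimates([10, 12], [[11], [13]], deltas=[0, 3]))
```

If a clinically plausible δ changes the conclusion (for example, whether a change counts as clinically significant), the MAR-based result should be reported with caution.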
Genotyping of Coxiella burnetii from domestic ruminants in northern Spain

Directory of Open Access Journals (Sweden)

Astobiza Ianire

2012-12-01

Background. Information on the genotypic diversity of Coxiella burnetii isolates from infected domestic ruminants in Spain is limited. The aim of this study was to identify the C. burnetii genotypes infecting livestock in Northern Spain and compare them to other European genotypes. A commercial real-time PCR targeting the IS1111a insertion element was used to detect the presence of C. burnetii DNA in domestic ruminants from Spain. Genotypes were determined by a 6-loci Multiple Locus Variable number tandem repeat analysis (MLVA) panel and Multispacer Sequence Typing (MST). Results. A total of 45 samples from 4 goat herds (placentas, N = 4), 12 dairy cattle herds (vaginal mucus, individual milk, bulk tank milk, aerosols, N = 20) and 5 sheep flocks (placenta, vaginal swabs, faeces, air samples, dust, N = 21) were included in the study. Samples from goats and sheep were obtained from herds that had suffered abortions suspected to be caused by C. burnetii, whereas cattle samples were obtained from animals with reproductive problems compatible with C. burnetii infection, or consisted of bulk tank milk (BTM) samples from a Q fever surveillance programme. C. burnetii genotypes identified in ruminants from Spain were compared to those detected in other countries. Three MLVA genotypes were found in 4 goat farms, 7 MLVA genotypes were identified in 12 cattle herds and 4 MLVA genotypes were identified in 5 sheep flocks. Clustering of the MLVA genotypes using the minimum spanning tree method showed a high degree of genetic similarity between most MLVA genotypes. Overall, 11 different MLVA genotypes were obtained, corresponding to 4 different MST genotypes: MST genotype 13, identified in goats, sheep and cattle from Spain; MST genotype 18, identified only in goats; and MST genotypes 8 and 20, identified in small ruminants and cattle, respectively. All these genotypes had been previously identified in animal and human clinical samples from several
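The minimum spanning tree clustering mentioned above can be sketched with Prim's algorithm, assuming a categorical distance (the number of loci whose repeat counts differ), which is a common choice for MLVA data; the abstract does not state which distance or software was used. The 6-locus profiles are hypothetical.

```python
def mlva_distance(a, b):
    """Number of loci at which two MLVA profiles differ (categorical distance)."""
    return sum(1 for x, y in zip(a, b) if x != y)

def minimum_spanning_tree(profiles):
    """Prim's algorithm over MLVA profiles; returns [(u, v, distance), ...]."""
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in sorted(in_tree):  # sorted for deterministic tie-breaking
            for v in names:
                if v in in_tree:
                    continue
                d = mlva_distance(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Hypothetical 6-locus repeat-count profiles
profiles = {
    "G1": (3, 13, 4, 9, 5, 7),
    "G2": (3, 13, 4, 9, 5, 8),
    "G3": (3, 13, 4, 10, 5, 8),
    "G4": (4, 12, 4, 9, 5, 7),
}
tree = minimum_spanning_tree(profiles)
print(tree)                           # [('G1', 'G2', 1), ('G2', 'G3', 1), ('G1', 'G4', 2)]
print(sum(d for _, _, d in tree))     # 4, the total tree length
```

Short edges (single-locus variants) are what an MST plot shows as tight clusters, which is how "a high degree of genetic similarity" between genotypes shows up visually.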
Forecasting Brassica rapa: Merging climate models with genotype-specific process models for evaluating whole-species response to climate change.

Science.gov (United States)

Pleban, J. R.; Mackay, D. S.; Ewers, B. E.; Weinig, C.; Guadagno, C. L.

2016-12-01

Human society has modified agricultural management practices and used a variety of breeding approaches to adapt to changing environments. Presently, a dual-pronged challenge has emerged: environmental change is occurring more rapidly while population growth is raising the demand on the food supply. Knowledge of how current agricultural practices will respond to these challenges can be informed by carefully designed prognostic modeling approaches. Among the uncertainties associated with forecasting agricultural production in a changing environment is the evaluation of responses across the existing genotypic diversity of crop species. Mechanistic models of plant productivity provide a means of genotype-level parameterization, allowing a prognostic evaluation of varietal performance under a changing climate. Brassica rapa is an excellent species for this type of investigation because of its wide cultivation as well as its large morphological and physiological diversity. We incorporated genotypic parameterization of B. rapa genotypes based on unique CO2 assimilation strategies, vulnerabilities to cavitation, and root-to-leaf area relationships into the TREES model. Three climate drivers, following the "business-as-usual" greenhouse gas emissions scenario (RCP 8.5) from the Coupled Model Intercomparison Project, Phase 5 (CMIP5), were considered: temperature (T) along with associated changes in vapor pressure deficit (VPD), increasing CO2, and alternative irrigation regimes, across a temporal scale from the present day to 2100. Genotypic responses to these drivers were evaluated using net primary productivity (NPP) and percent loss of hydraulic conductance (PLC) as a measure of tolerance for a particular watering regime. Genotypic responses to T were observed: water demand driven by increases in VPD in 2050 and 2100 pushed some genotypes to greater PLC, and a subset of these showed periodic decreases in NPP during a growing season. Genotypes able to withstand the greater
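PLC compares current hydraulic conductance with its maximum. The sketch below assumes a two-parameter Weibull vulnerability curve, a common way to express genotype-specific vulnerability to cavitation; we do not claim it is the exact function or parameterization used in TREES, and the parameter values are hypothetical.

```python
import math

def weibull_conductance(psi, k_max, b, c):
    """Hydraulic conductance at water potential psi (MPa, psi <= 0).

    Assumes a two-parameter Weibull vulnerability curve
    k = k_max * exp(-(-psi / b) ** c), with b (MPa) and c (dimensionless) > 0.
    """
    if psi > 0:
        raise ValueError("psi must be <= 0 MPa")
    return k_max * math.exp(-((-psi / b) ** c))

def percent_loss_conductance(k, k_max):
    """PLC = 100 * (1 - k / k_max)."""
    return 100.0 * (1.0 - k / k_max)

# Hypothetical genotype: b = 2.0 MPa, c = 3.0, k_max = 5.0 (arbitrary units)
for psi in (0.0, -1.0, -2.0, -3.0):
    k = weibull_conductance(psi, 5.0, 2.0, 3.0)
    print(psi, round(percent_loss_conductance(k, 5.0), 1))
```

In this parameterization b is the water potential at which about 63% of conductance is lost, so a genotype with a less negative b reaches high PLC sooner as VPD-driven water demand rises.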
Genotypic Regulation of Aflatoxin Accumulation but Not Aspergillus Fungal Growth upon Post-Harvest Infection of Peanut (Arachis hypogaea L.) Seeds

Directory of Open Access Journals (Sweden)

Walid Ahmed Korani

2017-07-01

Aflatoxin contamination is a major economic and food safety concern for the peanut industry that could largely be mitigated by genetic resistance. To screen peanut for aflatoxin resistance, ten genotypes were infected with a green fluorescent protein (GFP)-expressing Aspergillus flavus strain. Percentages of fungal infected area and fungal GFP signal intensity were documented by visual ratings every 8 h for 72 h after inoculation. Significant genotypic differences in fungal growth rates were documented by repeated measures and area under the disease progress curve (AUDPC) analyses. SICIA (Seed Infection Coverage and Intensity Analyzer), an image processing software, was developed to digitize fungal GFP signals. Data from SICIA image analysis confirmed the visual rating results, validating its utility for quantifying fungal growth. Among the tested peanut genotypes, NC 3033 and GT-C20 supported the lowest and highest fungal growth on the surface of peanut seeds, respectively. Although differential fungal growth was observed on the surface of peanut seeds, total fungal growth in the seeds was not significantly different across genotypes based on a fluorometric GFP assay. Significant differences in aflatoxin B levels were detected across peanut genotypes. ICG 1471 had the lowest aflatoxin level whereas Florida-07 had the highest. Two-year aflatoxin tests under simulated late-season drought also showed that ICG 1471 had reduced aflatoxin production under pre-harvest field conditions. These results suggest that all peanut genotypes support A. flavus fungal growth yet differentially influence aflatoxin production.
Genotypic Regulation of Aflatoxin Accumulation but Not Aspergillus Fungal Growth upon Post-Harvest Infection of Peanut (Arachis hypogaea L.) Seeds.

Science.gov (United States)

Korani, Walid Ahmed; Chu, Ye; Holbrook, Corley; Clevenger, Josh; Ozias-Akins, Peggy

2017-07-12

Aflatoxin contamination is a major economic and food safety concern for the peanut industry that could largely be mitigated by genetic resistance. To screen peanut for aflatoxin resistance, ten genotypes were infected with a green fluorescent protein (GFP)-expressing Aspergillus flavus strain. Percentages of fungal infected area and fungal GFP signal intensity were documented by visual ratings every 8 h for 72 h after inoculation. Significant genotypic differences in fungal growth rates were documented by repeated measures and area under the disease progress curve (AUDPC) analyses. SICIA (Seed Infection Coverage and Intensity Analyzer), an image processing software, was developed to digitize fungal GFP signals. Data from SICIA image analysis confirmed the visual rating results, validating its utility for quantifying fungal growth. Among the tested peanut genotypes, NC 3033 and GT-C20 supported the lowest and highest fungal growth on the surface of peanut seeds, respectively. Although differential fungal growth was observed on the surface of peanut seeds, total fungal growth in the seeds was not significantly different across genotypes based on a fluorometric GFP assay. Significant differences in aflatoxin B levels were detected across peanut genotypes. ICG 1471 had the lowest aflatoxin level whereas Florida-07 had the highest. Two-year aflatoxin tests under simulated late-season drought also showed that ICG 1471 had reduced aflatoxin production under pre-harvest field conditions. These results suggest that all peanut genotypes support A. flavus fungal growth yet differentially influence aflatoxin production.
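AUDPC summarizes repeated disease ratings as one number per genotype. A minimal sketch using the standard trapezoidal rule; the ratings below are hypothetical, not data from the study.

```python
def audpc(times, severities):
    """Area under the disease progress curve by the trapezoidal rule.

    times: assessment times (e.g. hours after inoculation), increasing
    severities: disease ratings at those times (e.g. % seed area infected)
    """
    if len(times) != len(severities) or len(times) < 2:
        raise ValueError("need at least two paired assessments")
    total = 0.0
    for i in range(len(times) - 1):
        width = times[i + 1] - times[i]
        total += (severities[i] + severities[i + 1]) / 2.0 * width
    return total

# Hypothetical ratings taken every 8 h
print(audpc([0, 8, 16], [0, 10, 30]))  # 200.0
```

Because the rule integrates over time, a genotype whose infection rises late scores lower than one reaching the same final rating early, which is why AUDPC separates growth rates rather than end points.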
Integrating common and rare genetic variation in diverse human populations.

Science.gov (United States)

Altshuler, David M; Gibbs, Richard A; Peltonen, Leena; Altshuler, David M; Gibbs, Richard A; Peltonen, Leena; Dermitzakis, Emmanouil; Schaffner, Stephen F; Yu, Fuli; Peltonen, Leena; Dermitzakis, Emmanouil; Bonnen, Penelope E; Altshuler, David M; Gibbs, Richard A; de Bakker, Paul I W; Deloukas, Panos; Gabriel, Stacey B; Gwilliam, Rhian; Hunt, Sarah; Inouye, Michael; Jia, Xiaoming; Palotie, Aarno; Parkin, Melissa; Whittaker, Pamela; Yu, Fuli; Chang, Kyle; Hawes, Alicia; Lewis, Lora R; Ren, Yanru; Wheeler, David; Gibbs, Richard A; Muzny, Donna Marie; Barnes, Chris; Darvishi, Katayoon; Hurles, Matthew; Korn, Joshua M; Kristiansson, Kati; Lee, Charles; McCarroll, Steven A; Nemesh, James; Dermitzakis, Emmanouil; Keinan, Alon; Montgomery, Stephen B; Pollack, Samuela; Price, Alkes L; Soranzo, Nicole; Bonnen, Penelope E; Gibbs, Richard A; Gonzaga-Jauregui, Claudia; Keinan, Alon; Price, Alkes L; Yu, Fuli; Anttila, Verneri; Brodeur, Wendy; Daly, Mark J; Leslie, Stephen; McVean, Gil; Moutsianas, Loukas; Nguyen, Huy; Schaffner, Stephen F; Zhang, Qingrun; Ghori, Mohammed J R; McGinnis, Ralph; McLaren, William; Pollack, Samuela; Price, Alkes L; Schaffner, Stephen F; Takeuchi, Fumihiko; Grossman, Sharon R; Shlyakhter, Ilya; Hostetter, Elizabeth B; Sabeti, Pardis C; Adebamowo, Clement A; Foster, Morris W; Gordon, Deborah R; Licinio, Julio; Manca, Maria Cristina; Marshall, Patricia A; Matsuda, Ichiro; Ngare, Duncan; Wang, Vivian Ota; Reddy, Deepa; Rotimi, Charles N; Royal, Charmaine D; Sharp, Richard R; Zeng, Changqing; Brooks, Lisa D; McEwen, Jean E

2010-09-02

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
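Two quantities recur when evaluating imputation of low-frequency variants: the minor allele frequency (MAF) of a SNP and the agreement between imputed dosages and true genotypes. The sketch below assumes squared Pearson correlation (dosage r²) as the accuracy metric, which is common but not stated in the abstract; the 5% cutoff in `is_low_frequency` is an illustrative choice and the genotype vectors are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from allele counts (0, 1, 2) at one biallelic SNP."""
    p = sum(genotypes) / (2.0 * len(genotypes))
    return min(p, 1.0 - p)

def is_low_frequency(genotypes, cutoff=0.05):
    """True if the SNP's MAF is at or below the (illustrative) cutoff."""
    return minor_allele_frequency(genotypes) <= cutoff

def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined for monomorphic data
    return cov * cov / (var_t * var_d)

# Hypothetical masked genotypes versus imputed dosages for one SNP
truth = [0, 0, 1, 2, 0, 1]
imputed = [0.1, 0.0, 0.9, 1.8, 0.2, 1.1]
print(minor_allele_frequency(truth), round(dosage_r2(truth, imputed), 3))
```

Accuracy is usually reported in MAF bins, because rare alleles are carried by few reference haplotypes and are therefore harder to impute; a larger reference panel helps most in the lowest bins.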
The Comparison of Growth, Slaughter and Carcass Traits of Meat Chicken Genotype Produced by Back-Crossing with A Commercial Broiler Genotype

OpenAIRE

Musa Sarıca; Umut Sami Yamak; Mehmet Akif Boz; Ahmet Uçar

2014-01-01

This study was conducted to compare growth and some slaughter traits between commercial fast-growing chickens and three-way cross M2 genotypes. A total of 260 mixed-sex chickens from each genotype were reared in 10 replicates per genotype in the same house. Commercial chickens were slaughtered at two ages, 6 and 7 weeks, for comparison with the cross genotypes. F chickens reached slaughter age at 42 days, whereas the cross groups reached it at 49 days. Genot...
Identification of a BRCA2-Specific Modifier Locus at 6p24 Related to Breast Cancer Risk

DEFF Research Database (Denmark)

Gaudet, Mia M; Kuchenbaecker, Karoline B; Vijai, Joseph

2013-01-01

of a multi-consortial project. DNA samples from 3,881 breast cancer affected and 4,330 unaffected BRCA2 mutation carriers from 47 studies belonging to the Consortium of Investigators of Modifiers of BRCA1/2 were genotyped and available for analysis. We replicated previously reported breast cancer...... carriers, we conducted a deep replication of an ongoing GWAS discovery study. Using the ranked P-values of the breast cancer associations with the imputed genotype of 1.4 M SNPs, 19,029 SNPs were selected and designed for inclusion on a custom Illumina array that included a total of 211,155 SNPs as part...

Genome of the Netherlands population-specific imputations identify an ABCA6 variant associated with cholesterol levels

Science.gov (United States)

van Leeuwen, Elisabeth M.; Karssen, Lennart C.; Deelen, Joris; Isaacs, Aaron; Medina-Gomez, Carolina; Mbarek, Hamdi; Kanterakis, Alexandros; Trompet, Stella; Postmus, Iris; Verweij, Niek; van Enckevort, David J.; Huffman, Jennifer E.; White, Charles C.; Feitosa, Mary F.; Bartz, Traci M.; Manichaikul, Ani; Joshi, Peter K.; Peloso, Gina M.; Deelen, Patrick; van Dijk, Freerk; Willemsen, Gonneke; de Geus, Eco J.; Milaneschi, Yuri; Penninx, Brenda W.J.H.; Francioli, Laurent C.; Menelaou, Androniki; Pulit, Sara L.; Rivadeneira, Fernando; Hofman, Albert; Oostra, Ben A.; Franco, Oscar H.; Leach, Irene Mateo; Beekman, Marian; de Craen, Anton J.M.; Uh, Hae-Won; Trochet, Holly; Hocking, Lynne J.; Porteous, David J.; Sattar, Naveed; Packard, Chris J.; Buckley, Brendan M.; Brody, Jennifer A.; Bis, Joshua C.; Rotter, Jerome I.; Mychaleckyj, Josyf C.; Campbell, Harry; Duan, Qing; Lange, Leslie A.; Wilson, James F.; Hayward, Caroline; Polasek, Ozren; Vitart, Veronique; Rudan, Igor; Wright, Alan F.; Rich, Stephen S.; Psaty, Bruce M.; Borecki, Ingrid B.; Kearney, Patricia M.; Stott, David J.; Adrienne Cupples, L.; Neerincx, Pieter B.T.; Elbers, Clara C.; Francesco Palamara, Pier; Pe'er, Itsik; Abdellaoui, Abdel; Kloosterman, Wigard P.; van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F.J.; Stoneking, Mark; de Knijff, Peter; Kayser, Manfred; Veldink, Jan H.; van den Berg, Leonard H.; Byelas, Heorhiy; den Dunnen, Johan T.; Dijkstra, Martijn; Amin, Najaf; Joeri van der Velde, K.; van Setten, Jessica; Kattenberg, Mathijs; van Schaik, Barbera D.C.; Bot, Jan; Nijman, Isaäc J.; Mei, Hailiang; Koval, Vyacheslav; Ye, Kai; Lameijer, Eric-Wubbo; Moed, Matthijs H.; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Sunyaev, Shamil R.; Sohail, Mashaal; Hormozdiari, Fereydoun; Marschall, Tobias; Schönhuth, Alexander; Guryev, Victor; Suchiman, H. Eka D.; Wolffenbuttel, Bruce H.; Platteel, Mathieu; Pitts, Steven J.; Potluri, Shobha; Cox, David R.; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Jukema, J. Wouter; van der Harst, Pim; Sijbrands, Eric J.; Hottenga, Jouke-Jan; Uitterlinden, Andre G.; Swertz, Morris A.; van Ommen, Gert-Jan B.; de Bakker, Paul I.W.; Eline Slagboom, P.; Boomsma, Dorret I.; Wijmenga, Cisca; van Duijn, Cornelia M.

2015-01-01

Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value < 6.61 × 10⁻⁴), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βLDL-C = 0.135, βTC = 0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR. PMID:25751400
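Association testing with imputed genotypes typically regresses the trait on the expected allele dosage rather than on hard calls. This sketch shows an unadjusted single-cohort linear regression only; the study presumably adjusted for covariates and combined biobanks, which is not shown, and the data below are synthetic.

```python
import math

def dosage_association(dosages, trait):
    """Unadjusted linear regression of a quantitative trait on allele dosage.

    dosages: imputed alternate-allele dosages in [0, 2]
    trait: phenotype values (e.g. LDL-C), same order as dosages
    Returns (beta, standard error of beta).
    """
    n = len(dosages)
    if n < 3:
        raise ValueError("need at least three samples")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx                       # per-allele effect
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)    # residual-based standard error
    return beta, se

# Synthetic noise-free data with a per-allele effect of 0.135
doses = [0.0, 0.5, 1.0, 1.5, 2.0]
ldl = [3.0 + 0.135 * d for d in doses]
print(dosage_association(doses, ldl))
```

Using fractional dosages carries the imputation uncertainty into the regression, so poorly imputed variants contribute attenuated rather than spurious signal.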
Travel time to maternity care and its effect on utilization in rural Ghana: a multilevel analysis.

Science.gov (United States)

Masters, Samuel H; Burstein, Roy; Amofah, George; Abaogye, Patrick; Kumar, Santosh; Hanlon, Michael

2013-09-01

Rates of neonatal and maternal mortality are high in Ghana. In-facility delivery and other maternal services could reduce this burden, yet utilization rates of key maternal services are relatively low, especially in rural areas. We tested a theoretical implication that travel time negatively affects the use of in-facility delivery and other maternal services. Empirically, we used geospatial techniques to estimate travel times between populations and health facilities. To account for uncertainty in Ghana Demographic and Health Survey cluster locations, we adopted a novel approach of treating the location selection as an imputation problem. We estimated a multilevel random-intercept logistic regression model. For rural households, we found that travel time had a significant effect on the likelihood of in-facility delivery and antenatal care visits, holding constant education, wealth, maternal age, facility capacity, female autonomy, and the season of birth. In contrast, a facility's capacity to provide sophisticated maternity care had no detectable effect on utilization. As the Ghanaian health network expands, our results suggest that increasing the availability of basic obstetric services and improving transport infrastructure may be important interventions. Copyright © 2013 Elsevier Ltd. All rights reserved.
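The model described above can be written, under assumed notation (the abstract does not give the equation), as a random-intercept logistic regression for birth i in survey cluster j. Here travel_ij is the estimated travel time, x_ij collects the covariates named in the abstract (education, wealth, maternal age, facility capacity, female autonomy, season of birth) and u_j is a cluster-level random intercept; treating the grouping level as the survey cluster is our assumption.

```latex
\operatorname{logit} \Pr(y_{ij} = 1)
  = \beta_0 + \beta_1 \, \mathrm{travel}_{ij} + \boldsymbol{\gamma}^{\top} \mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this form, the reported effect of travel time corresponds to a negative β₁: each additional unit of travel time multiplies the odds of in-facility delivery by exp(β₁) < 1, holding the covariates and cluster effect fixed.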
Evaluation of promising sweetpotato genotypes for high altitude ...

African Journals Online (AJOL)

The trials were set up to identify sweetpotato genotypes adapted to highland agroecologies, with special reference to resistance to Alternaria blight ... growth and at harvest, four genotypes and the local check, Magabari, had high levels of resistance to Alternaria blight. Eight genotypes had total storage root yield ...
Porphyromonas gingivalis Fim-A genotype distribution among Colombians

Science.gov (United States)

Jaramillo, Adriana; Parra, Beatriz; Botero, Javier Enrique; Contreras, Adolfo

2015-01-01

Introduction: Porphyromonas gingivalis is associated with periodontitis and exhibits a wide array of virulence factors, including fimbriae, which are encoded by the FimA gene, with six known genotypes. Objective: To identify FimA genotypes of P. gingivalis in subjects from Cali, Colombia, including co-infection with Aggregatibacter actinomycetemcomitans, Treponema denticola, and Tannerella forsythia. Methods: Subgingival samples were collected from 151 people with diverse periodontal conditions. The occurrence of P. gingivalis, FimA genotypes and other bacteria was determined by PCR. Results: P. gingivalis was detected in 85 patients. Genotype FimA II was the most prevalent (54.3%), without reaching significant differences among study groups; FimA IV was also prevalent in gingivitis (13.0%). A high correlation (p < 0.001) was found among P. gingivalis, T. denticola, and T. forsythia co-infection. The FimA II genotype correlated with concomitant detection of T. denticola and T. forsythia. Conclusions: The prevalence of Porphyromonas gingivalis was high even in the healthy group of the study population. A trend toward a greater frequency of the FimA II genotype in patients with moderate and severe periodontitis was observed. The FimA II genotype was also associated with increased pocket depth, greater loss of attachment level, and co-infection with T. denticola and T. forsythia. PMID:26600627
Distribution of Hepatitis C Virus Genotypes in the South Marmara Region

Directory of Open Access Journals (Sweden)

Harun Agca

2014-03-01

Aim: Hepatitis C virus (HCV) is an important causative agent of hepatitis, cirrhosis and hepatocellular carcinoma, both in our country and worldwide. Prognosis and response to treatment are related to the genotype of HCV, which has six genotypes and over a hundred quasispecies. Knowing the HCV genotype is also important for epidemiological data. In this study we aimed to investigate the HCV genotypes of samples sent to the Uludag University Hospital Microbiology Laboratory, the reference centre for the South Marmara Region. Material and Method: This retrospective study analysed sera from HCV patients sent to our laboratory for HCV genotyping between July 2010 and December 2012. The Artus HCV QS-RGQ PCR kit (Qiagen, Hilden, Germany) was used on the Rotor-Gene Q (Qiagen, Hilden, Germany) to detect HCV RNA. HCV RNA-positive sera were genotyped with the Linear Array HCV genotyping test (Roche, NJ, USA). Results: Of the 231 patients included in the study, 214 (92.6%) were genotype 1, one (0.4%) was genotype 2, nine (3.9%) were genotype 3 and seven (3.0%) were genotype 4. Three of the genotype 3 patients were of foreign nationality and two were born abroad, and one of the genotype 4 patients was born abroad. Discussion: Consistent with national data, the most frequent genotype was 1; genotype 2 was seen particularly in patients with links to foreign countries, and genotype 4 was rare. Genotype 1, the most frequent genotype in our country and region, is important because of its resistance to antiviral treatment and the prolonged treatment duration it requires in chronic hepatitis C patients.
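The genotype shares follow directly from the counts in the Results (214, 1, 9 and 7 of 231 patients). Recomputing them is a quick consistency check; for example, 7 of 231 patients corresponds to about 3.0%.

```python
def genotype_percentages(counts, digits=1):
    """Percentage of patients per genotype, rounded to `digits` decimals."""
    total = sum(counts.values())
    return {g: round(100.0 * n / total, digits) for g, n in counts.items()}

hcv_counts = {"1": 214, "2": 1, "3": 9, "4": 7}  # counts reported in the Results
print(genotype_percentages(hcv_counts))
# {'1': 92.6, '2': 0.4, '3': 3.9, '4': 3.0}
```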
Phenotypic and genotypic variation in Iranian Pistachios

Directory of Open Access Journals (Sweden)

Somayeh Tayefeh Aliakbarkhani

2015-12-01

Although Iran has one of the richest pistachio germplasms, only a few studies have been conducted on the different sexes of pistachio trees in the areas where this crop emerged. To this end, 40 male and female Iranian pistachio genotypes from the Feizabad region, Khorasan, Iran, were evaluated using morphological characters and randomly amplified polymorphic DNA (RAPD) markers. For morphological assessments, 54 variables were considered to investigate similarities between and among the studied genotypes. Morphological data indicated relative superiority of some female genotypes (such as Sefid 1, Sefid Sabuni 2, Garmesiah, and Ghermezdorosht Z) in characters such as half-crackedness and the percentages of protein and fat. A total of 115 polymorphic bands were recorded, with 92.83% average polymorphism across all primers. The total resolving power (Rp) of the primers was 74.54. Genetic similarity ranged from about 0.31 to about 0.70. Genotypes were segregated into eight groups at a similarity level of 0.41. The results of the present investigation could help inform strategic decisions for maintaining Iranian pistachio genotypes.
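The RAPD statistics above can be sketched from a 0/1 band matrix. The abstract does not name the similarity coefficient, so Jaccard similarity is assumed here; resolving power uses the commonly cited band-informativeness definition Ib = 1 − 2·|0.5 − p|, summed over bands. The band profiles are hypothetical.

```python
def jaccard_similarity(bands_a, bands_b):
    """Jaccard similarity of two 0/1 band profiles (shared absences ignored)."""
    shared = sum(1 for a, b in zip(bands_a, bands_b) if a and b)
    either = sum(1 for a, b in zip(bands_a, bands_b) if a or b)
    return shared / either if either else 1.0

def resolving_power(band_matrix):
    """Rp = sum over bands of Ib, with Ib = 1 - 2 * |0.5 - p|.

    band_matrix: one 0/1 profile per genotype, all of equal length;
    p is the fraction of genotypes showing the band.
    """
    n = len(band_matrix)
    rp = 0.0
    for j in range(len(band_matrix[0])):
        p = sum(profile[j] for profile in band_matrix) / n
        rp += 1 - 2 * abs(0.5 - p)
    return rp

# Hypothetical profiles for four genotypes scored at three bands
profiles = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
print(jaccard_similarity(profiles[0], profiles[1]))  # 1 shared band of 3 present
print(resolving_power(profiles))                     # 1.5
```

A band present in exactly half the genotypes contributes the maximum Ib of 1, so primers producing many such bands have high Rp and discriminate genotypes best.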
Variants of PLCXD3 are not associated with variant or sporadic Creutzfeldt-Jakob disease in a large international study.

Science.gov (United States)

Balendra, Rubika; Uphill, James; Collinson, Claire; Druyeh, Ronald; Adamson, Gary; Hummerich, Holger; Zerr, Inga; Gambetti, Pierluigi; Collinge, John; Mead, Simon

2016-04-07

Human prion diseases are relentlessly progressive neurodegenerative disorders that include sporadic Creutzfeldt-Jakob disease (sCJD) and variant CJD (vCJD). Aside from variants of the prion protein gene (PRNP), replicated association at genome-wide levels of significance has proven elusive. A recent association study identified variants in or near the PLCXD3 gene locus as strong disease risk factors in multiple human prion diseases. This study claimed the first non-PRNP locus to be highly significantly associated with prion disease in genomic studies. We conducted a sub-study of a genome-wide association study with imputation, aiming to replicate the finding at PLCXD3 in 129 vCJD and 2500 sCJD samples, and used whole exome sequencing to identify rare coding variants of PLCXD3. Imputation of relevant polymorphisms was accurate, based on direct (wet-lab) genotyping of a sample. We found no supportive evidence that PLCXD3 variants are associated with disease. The marked discordance in vCJD genotype frequencies between studies, despite extensive overlap in vCJD cases, and the finding of Hardy-Weinberg disequilibrium in the original study suggest possible reasons for the discrepancies between studies.
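Hardy-Weinberg disequilibrium is usually flagged by comparing observed genotype counts with those expected from allele frequencies. This sketch uses the one-degree-of-freedom chi-square test; exact tests are often preferred for small counts, and the abstract does not say which test the original study used. The counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): exactly at equilibrium
print(hwe_chi_square(40, 20, 40))  # heterozygote deficit, chi2 = 36.0
```

A strong heterozygote deficit or excess in controls often points to genotyping error or population stratification rather than biology, which is why HWE departures are a standard quality-control check.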
Constructing linkage maps in the genomics era with MapDisto 2.0.

Science.gov (United States)

Heffelfinger, Christopher; Fragoso, Christopher A; Lorieux, Mathias

2017-07-15

Genotyping by sequencing (GBS) generates datasets that are challenging for current genetic mapping software with a graphical interface to handle. Geneticists need new user-friendly computer programs that can analyze GBS data on desktop computers, which requires improvements in computational efficiency, both in speed and in use of random-access memory (RAM). MapDisto v.2.0 is a user-friendly computer program for the construction of genetic linkage maps. It includes several major new features: (i) handling of very large genotyping datasets such as those generated by GBS; (ii) direct import and conversion of Variant Call Format (VCF) files; (iii) detection of linkage, i.e. construction of linkage groups, in the presence of segregation distortion; (iv) data imputation on VCF files using a new approach called LB-Impute. Features i to iv operate through new Java modules that MapDisto uses transparently; (v) QTL detection via a new R/qtl graphical interface. The program is available free of charge at mapdisto.free.fr. Contact: mapdisto@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
Integration of curated databases to identify genotype-phenotype associations

Directory of Open Access Journals (Sweden)

Li Jianrong

2006-10-01

Background: The ability to rapidly characterize an unknown microorganism is critical both in responding to infectious disease and in biodefense. To do this, we need some way of anticipating an organism's phenotype from the molecules encoded by its genome. However, the link between molecular composition (i.e. genotype) and phenotype in microbes is not obvious. Although several studies have addressed this challenge, none has yet proposed a large-scale method that integrates curated biological information. Here we use a systematic approach to discover genotype-phenotype associations that combines phenotypic information from a biomedical informatics database, GIDEON, with the molecular information contained in the National Center for Biotechnology Information's Clusters of Orthologous Groups database (NCBI COGs). Results: By integrating the information in the two databases, we are able to correlate the presence or absence of a given protein in a microbe with its phenotype, as measured by certain morphological characteristics or by survival in a particular growth medium. At a correlation score threshold of 0.8, 66% of the associations found were confirmed by the literature, and at a threshold of 0.9, 86% were positively verified. Conclusion: Our results suggest possible phenotypic manifestations for proteins biochemically associated with sugar metabolism and electron transport. Moreover, we believe our approach can be extended to linking pathogenic phenotypes with functionally related proteins.
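The abstract does not define its "correlation score". One natural choice for binary presence/absence data is the phi coefficient, i.e. the Pearson correlation of two 0/1 vectors. The sketch below assumes that choice and applies a 0.8 threshold; the COG identifiers and phenotype vector are hypothetical.

```python
import math

def phi(x, y):
    """Pearson correlation between two equal-length 0/1 vectors (phi coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated here as no association
    return cov / (sx * sy)

# Hypothetical data: one position per microbe (1 = protein present / phenotype shown).
protein_present = {"cog_A": [1, 1, 0, 0, 1], "cog_B": [1, 0, 1, 0, 1]}
grows_in_medium = [1, 1, 0, 0, 1]

threshold = 0.8
scores = {cog: phi(v, grows_in_medium) for cog, v in protein_present.items()}
hits = {cog: round(s, 3) for cog, s in scores.items() if s >= threshold}
print(hits)  # {'cog_A': 1.0}
```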
Exploring the Interplay between Rescue Drugs, Data Imputation, and Study Outcomes: Conceptual Review and Qualitative Analysis of an Acute Pain Data Set.

Science.gov (United States)

Singla, Neil K; Meske, Diana S; Desjardins, Paul J

2017-12-01

In placebo-controlled acute surgical pain studies, provisions must be made for study subjects to receive adequate analgesic therapy. Most protocols therefore allow study subjects to receive a pre-specified regimen of open-label analgesic drugs (rescue drugs) as needed, and the selection of an appropriate rescue regimen is a critical experimental design choice. We hypothesized that a rescue regimen that is too liberal could lead to all study arms receiving similar levels of pain relief (thereby confounding experimental results), while a regimen that is too stringent could lead to a high subject dropout rate (giving rise to a preponderance of missing data). Despite the importance of the rescue regimen as a study design feature, there are no published review articles or meta-analyses focusing on the impact of rescue therapy on experimental outcomes. Researchers selecting a rescue regimen must therefore rely on clinical factors (which analgesics patients usually receive in similar surgical scenarios) and/or anecdotal evidence. In this article, we attempt to bridge this gap by reviewing and discussing the experimental impacts of rescue therapy in a common acute surgical pain population: first metatarsal bunionectomy. The aims of this analysis are to (1) create a framework for discussion and future exploration of rescue as a methodological study design feature, (2) discuss the interplay between data imputation techniques and rescue drugs, and (3) inform readers about the impact of data imputation techniques on the validity of study conclusions. Our findings indicate that liberal rescue may degrade assay sensitivity, while stringent rescue may lead to unacceptably high dropout rates.
Establishment of a novel two-probe real-time PCR for simultaneous quantification of hepatitis B virus DNA and distinguishing genotype B from non-B genotypes.

Science.gov (United States)

Wang, Wei; Liang, Hongpin; Zeng, Yongbin; Lin, Jinpiao; Liu, Can; Jiang, Ling; Yang, Bin; Ou, Qishui

2014-11-01

Establishing a simple, rapid and economical method for the quantification and genotyping of hepatitis B virus (HBV) is of great importance for the clinical diagnosis and treatment of chronic hepatitis B patients. We aimed to develop a novel two-probe real-time PCR for simultaneous quantification of HBV viral concentration and distinguishing genotype B from non-B genotypes. Conserved primers and TaqMan probes for genotype B and non-B genotypes were designed. The linear range, detection sensitivity, specificity and repeatability of the method were assessed. A total of 539 serum samples from HBV-infected patients were assayed, and the results were compared with those of commercial HBV quantification and HBV genotyping kits. The detection sensitivity of the two-probe real-time PCR was 500 IU/ml, the linear range was 10^3-10^9 IU/ml, and the intra-assay and inter-assay CVs were between 0.84% and 2.80%. No cross-reaction was observed between genotypes B and non-B. Of the 539 samples tested, 509 were HBV DNA positive: 54.0% (275/509) were genotype B, 39.5% (201/509) were non-B genotypes, and 6.5% (33/509) were mixed genotypes. The coincidence rate between the method and a commercial HBV DNA genotyping kit was 95.9% (488/509, kappa=0.923, P...... DNA qPCR kit were achieved. A novel two-probe real-time PCR method for simultaneous quantification of HBV viral concentration and distinguishing genotype B from non-B genotypes was established. The assay is sensitive, specific and reproducible, and can be applied in areas where HBV genotypes B and C are prevalent, especially China. Copyright © 2014 Elsevier B.V. All rights reserved.
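The percentages in the abstract above follow directly from the reported counts. This short check recomputes them from those counts only (no new data), rounding to one decimal place as the abstract does.

```python
# Counts reported in the abstract above.
positives = 509
genotype_counts = {"B": 275, "non-B": 201, "mixed": 33}
concordant_with_kit = 488

# The three genotype groups should account for every HBV DNA positive sample.
assert sum(genotype_counts.values()) == positives

def pct(k, n):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * k / n, 1)

shares = {g: pct(c, positives) for g, c in genotype_counts.items()}
print(shares)                               # {'B': 54.0, 'non-B': 39.5, 'mixed': 6.5}
print(pct(concordant_with_kit, positives))  # 95.9
```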
Genotyping of the 19-bp insertion/deletion polymorphism in the 5' flank of beta-hydroxylase gene by dissociation analysis of allele-specific PCR products

DEFF Research Database (Denmark)

Rasmussen, Henrik Berg; Werge, Thomas

2005-01-01

The 19-bp insertion/deletion polymorphism in the 5' flank of the dopamine beta-hydroxylase (DBH) gene has been associated with psychiatric disorders. We have developed a simple, reliable and inexpensive closed-tube assay for genotyping of this polymorphism based upon melting temperature (Tm) determination of amplified...... and a conventional approach based upon agarose gel electrophoresis of amplified fragments revealed complete concordance between the two procedures. The insights obtained in this study may be utilized to develop assays based upon dissociation analysis of PCR products for genotyping of other insertion...
SNPexp - A web tool for calculating and visualizing correlation between HapMap genotypes and gene expression levels

Directory of Open Access Journals (Sweden)

Franke Andre

2010-12-01

Background: Expression levels for 47,294 transcripts in lymphoblastoid cell lines from all 270 HapMap phase II individuals, and genotypes (from both HapMap phase II and III) of 3.96 million single nucleotide polymorphisms (SNPs) in the same individuals, are publicly available. We aimed to build a user-friendly web-based tool for visualizing the correlation between SNP genotypes within a specified genomic region and the expression of a gene of interest, an analysis widely known as expression quantitative trait locus (eQTL) analysis. Results: SNPexp is implemented as a server-side script and is publicly available at http://tinyurl.com/snpexp. Correlations between genotype and transcript expression levels are calculated by linear regression and the Wald test as implemented in PLINK, and are visualized using the UCSC Genome Browser. Validation of SNPexp using previously published eQTLs yielded comparable results. Conclusions: SNPexp provides a convenient and platform-independent way to calculate and visualize the correlation between HapMap genotypes within a specified genetic region anywhere in the genome and gene expression levels. This allows investigation of both cis and trans effects. The web interface and the use of publicly available and widely used software resources make it an attractive supplement to more advanced bioinformatic tools. Advanced users can also run the program on a local computer with custom datasets.
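The regression-plus-Wald-test step described above can be sketched as an ordinary least-squares fit of expression on allele dosage (0, 1 or 2 copies of the minor allele). This is a generic illustration, not SNPexp's or PLINK's code: it assumes additive dosage coding, uses hypothetical data, and, because Python's standard library has no t-distribution, reports a normal-approximation p-value (PLINK reports a t-based one, which is more conservative for small samples).

```python
import math

def additive_wald_test(genotypes, expression):
    """OLS regression of expression on allele dosage (0/1/2) with a Wald test.

    Returns (beta, se, z, p): slope, its standard error, z = beta / se,
    and a two-sided normal-approximation p-value.
    """
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, expression))
    beta = sxy / sxx                 # effect per additional allele copy
    alpha = my - beta * mx           # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(genotypes, expression))
    sigma2 = rss / (n - 2)           # residual variance
    se = math.sqrt(sigma2 / sxx)     # standard error of the slope
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Hypothetical dosages and expression values for six individuals.
g = [0, 0, 1, 1, 2, 2]
e = [1.0, 1.2, 2.0, 2.1, 3.0, 3.1]
beta, se, z, p = additive_wald_test(g, e)
print(f"beta = {beta:.3f}, se = {se:.3f}, z = {z:.1f}")
```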
Multiple Imputation of Groundwater Data to Evaluate Spatial and Temporal Anthropogenic Influences on Subsurface Water Fluxes in Los Angeles, CA

Science.gov (United States)

Manago, K. F.; Hogue, T. S.; Hering, A. S.

2014-12-01

In the City of Los Angeles, groundwater accounts for 11% of the total water supply on average, and 30% during drought years. Due to the ongoing drought in California, increased reliance on local water supply highlights the need for a better understanding of regional groundwater dynamics and for estimates of sustainable groundwater supply. However, in an urban setting such as Los Angeles, understanding or modeling groundwater levels is extremely complicated due to various anthropogenic influences such as groundwater pumping, artificial recharge, landscape irrigation, leaking infrastructure, seawater intrusion, and extensive impervious surfaces. This study analyzes anthropogenic effects on groundwater levels using groundwater monitoring well data from the County of Los Angeles Department of Public Works. The groundwater data are irregularly sampled with large gaps between samples, resulting in a sparsely populated dataset. A multiple imputation method is used to fill the missing data, allowing for multiple ensembles and improved error estimates. The filled data are interpolated to create spatial groundwater maps using information from all wells. The groundwater data are evaluated at a monthly time step over the last several decades to analyze the effect of land cover and to identify other factors influencing groundwater levels spatially and temporally. Preliminary results show that irrigated parks have the largest influence on groundwater fluctuations, producing large seasonal changes that exceed those in spreading grounds. These fluctuations are assumed to be caused by the watering practices required to sustain non-native vegetation. Conversely, high-intensity urbanized areas showed muted groundwater fluctuations and behavior decoupled from climate patterns. The results provide an improved understanding of anthropogenic effects on groundwater levels, as well as high-quality datasets for validating regional groundwater models.
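The abstract does not say how its imputation ensembles were combined, but the standard way to turn m imputed datasets into one estimate with an honest error is Rubin's rules: average the m estimates, then add the between-imputation spread to the average within-imputation variance. The sketch below assumes that approach; the well-level numbers are hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine analyses of m imputed datasets with Rubin's rules.

    estimates: point estimate from each completed dataset
    variances: squared standard error from each completed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)   # pooled point estimate
    w = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b              # total variance
    return q_bar, t

# Hypothetical mean groundwater level (m) at one well from m = 5 imputed datasets.
est = [10.0, 10.4, 9.8, 10.2, 10.1]
var = [0.04, 0.05, 0.04, 0.05, 0.04]
q, t = pool_rubin(est, var)
print(f"pooled = {q:.2f}, total variance = {t:.3f}")  # pooled = 10.10, total variance = 0.104
```

The between-imputation term is what makes the error estimate reflect uncertainty about the missing values themselves, which a single filled dataset would hide.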
Genotype-Specific Measles Transmissibility: A Branching Process Analysis.

Science.gov (United States)

Ackley, Sarah F; Hacker, Jill K; Enanoria, Wayne T A; Worden, Lee; Blumberg, Seth; Porco, Travis C; Zipprich, Jennifer

2018-04-03

Substantial heterogeneity in measles outbreak sizes may be due to genotype-specific transmissibility. Using a branching process analysis, we characterize differences in measles transmission by estimating the association between genotype and the reproduction number R among postelimination California measles cases during 2000-2015 (400 cases, 165 outbreaks). Assuming a negative binomial secondary case distribution, we fit a branching process model to the distribution of outbreak sizes using maximum likelihood and estimated the reproduction number R for a multigenotype model. Genotype B3 is found to be significantly more transmissible than other genotypes (P = .01) with an R of 0.64 (95% confidence interval [CI], .48-.71), while the R for all other genotypes combined is 0.43 (95% CI, .28-.54). This result is robust to excluding the 2014-2015 outbreak linked to Disneyland theme parks (referred to as "outbreak A" for conciseness and clarity) (P = .04) and modeling genotype as a random effect (P = .004 including outbreak A and P = .02 excluding outbreak A). This result was not accounted for by season of introduction, age of index case, or vaccination of the index case. The R for outbreaks with a school-aged index case is 0.69 (95% CI, .52-.78), while the R for outbreaks with a non-school-aged index case is 0.28 (95% CI, .19-.35), but this cannot account for differences between genotypes. Variability in measles transmissibility may have important implications for measles control; the vaccination threshold required for elimination may not be the same for all genotypes or age groups.
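The branching-process fit described above rests on a closed-form distribution of total outbreak size for a subcritical chain in which each case causes a negative binomial number of secondary cases (mean R, dispersion k). The sketch below implements that size distribution and a simple grid-search maximum-likelihood estimate of R. It is an illustration under stated assumptions, not the authors' code: k is held fixed for simplicity, the outbreak sizes are hypothetical, and a genotype-specific model would fit a separate R for each genotype group.

```python
import math

def outbreak_size_logpmf(y, r, k):
    """Log-probability that an outbreak has y >= 1 total cases (index case
    included) when each case causes a negative binomial number of secondary
    cases with mean r (< 1) and dispersion k."""
    return (math.lgamma(k * y + y - 1) - math.lgamma(k * y) - math.lgamma(y + 1)
            + (y - 1) * math.log(r / k)
            - (k * y + y - 1) * math.log(1 + r / k))

def log_likelihood(sizes, r, k):
    """Joint log-likelihood of independent outbreaks."""
    return sum(outbreak_size_logpmf(y, r, k) for y in sizes)

def fit_r(sizes, k, step=0.001):
    """Grid-search MLE of R over (0, 1) with the dispersion k held fixed."""
    grid = [i * step for i in range(1, round(1 / step))]
    return max(grid, key=lambda r: log_likelihood(sizes, r, k))

# Hypothetical outbreak sizes (cases per introduction).
sizes = [1, 1, 1, 2, 1, 3, 1, 1, 5, 2]
r_hat = fit_r(sizes, k=0.5)
print(f"R_hat = {r_hat:.3f}")
```

A useful sanity check is that a single-case outbreak has probability (1 + R/k)^(-k), the chance that the index case infects nobody, and that the mean outbreak size is 1/(1 - R).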
Oilseed rape genotypes response to boron toxicity

Directory of Open Access Journals (Sweden)

Savić Jasna

2013-01-01

The response of 16 oilseed rape genotypes to boron (B) toxicity was analyzed by comparing the results of two glasshouse experiments. In Experiment 1, plants were grown in standard nutrient solutions with 10 µM B (control) and 1000 µM B. Relative root and shoot growth varied from 20% to 120% and from 31% to 117%, respectively. Variation in shoot B concentration was also wide (206.5-441.7 µg B g-1 DW), as was total B uptake per plant (62.3-281.2 µg B g-1). In Experiment 2, four selected genotypes were grown in pots filled with high-B soil (8 kg B ha-1; B8). Shoot growth was not affected by the B8 treatment, while root and shoot B concentrations were significantly increased compared to the control. Genotypes Panther and Pronto, which showed low relative root and shoot growth and high B accumulation in Experiment 1, grew well in the B8 treatment. In Experiment 2, genotype NS-L-7 had a significantly lower shoot B concentration under the B8 treatment, despite very high B accumulation in Experiment 1. In addition, cluster analysis classified the genotypes into three groups according to traits relevant to the response to B toxicity. The first group included four varieties sharing low relative root and shoot growth and high shoot B concentration. The second and largest group comprised ten genotypes that were heterogeneous in their traits and did not stand out in any characteristic. Genotypes NS-L-7 and Navajo formed the third group because they had high relative root and shoot growth but also high shoot B concentration and high total B uptake. The results showed that none of the tested genotypes could be recommended for breeding for B toxicity tolerance. [Ministry of Science of the Republic of Serbia, project no. OI 173028]
Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

Directory of Open Access Journals (Sweden)

Boeschoten Laura

2017-12-01

Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables. Furthermore, the latent class model, by multiply imputing a new variable, enhances the quality of statistics based on the composite data set. The performance of this method is investigated by a simulation study, which shows that whether or not the method can be applied depends on the entropy R² of the latent class model and the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.
Hepatitis C Virus: Viral Quasispecies and Genotypes.

Science.gov (United States)

Tsukiyama-Kohara, Kyoko; Kohara, Michinori

2017-12-22

Hepatitis C virus (HCV) replicates mainly in the cytoplasm, where it readily establishes persistent infection, resulting in chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma. Due to its high mutation rate, HCV forms viral quasispecies, categorized based on the highly variable regions in the envelope protein and nonstructural 5A protein. HCV has seven major genotypes, of which genotype 1 is the most prevalent globally. The distribution of HCV genotypes varies by geography, and each genotype has a different sensitivity to interferon treatment. Recently developed direct-acting antivirals (DAAs), which target viral proteases or polymerases, have drastically better antiviral effects than previous therapeutics. Although treatment with DAAs has led to the emergence of drug-resistant HCV mutants, the most recently approved DAAs show improved pan-genotypic activity, with a higher barrier to viral resistance.
Response of cotton genotypes to boron under B-adequate conditions

International Nuclear Information System (INIS)

Shah, J. A.; Sial, M. A.; Hassan, Z. U.; Rajpar, I.

2015-01-01

Balanced boron (B) application is well known to enhance cotton production; however, the narrow range between B deficiency and toxicity levels makes it difficult to manage. Cotton genotypes differ extensively in their B requirements: a dose of B that is adequate for one genotype may be insufficient or even toxic for another. The effects of B on seed cotton yield and its associated yield traits were studied in 10 cotton genotypes from Pakistan. Pot studies were undertaken to categorize the genotypes using B-deficient (control) and B-adequate (2.0 kg B ha-1) levels arranged in a completely randomized design (CRD) with four replicates. The results indicated that seed cotton yield, yield attributes and B uptake of the genotypes decreased under the B-deficient treatment. Genotype NIA-Ufaq exhibited a wide range of adaptation and was ranked efficient-responsive, as it produced higher seed cotton yield under both B regimes. SAU-2 and CIM-506 were highly efficient, and all remaining genotypes were medium-efficient. Genotype Sindh-1 produced low seed cotton yield under B-deficient conditions and was ranked low-efficient. B-efficient cotton genotypes can be grown in B-deficient soils without B application. (author)
Micropropagation of six Paulownia genotypes through tissue culture

Directory of Open Access Journals (Sweden)

Lydia Shtereva

2014-12-01

We investigated the effect of genotype and culture medium on the in vitro germination and development of plantlets from seeds of 6 different Paulownia genotypes (P. tomentosa, the hybrid lines P. tomentosa × P. fortunei (Mega, Ganter and Caroline), P. elongata, and the hybrid line P. elongata × P. fortunei). Nodal and shoot tip explants were used for micropropagation of the Paulownia genotypes by manipulating plant growth regulators. The highest germination percentage for all genotypes was obtained for seeds inoculated on medium supplemented with 50 mg L-1 GA3 (MSG2). On thidiazuron-containing media, explants of the hybrid line P. elongata × P. fortunei exhibited the highest frequency of axillary shoot proliferation, followed by P. tomentosa × P. fortunei. The results are discussed with a view to applying an improved protocol for in vitro seed germination and plantlet formation in several economically valuable Paulownia genotypes.

Genotypic and phenotypic characterization of Chikungunya virus of different genotypes from Malaysia.

Directory of Open Access Journals (Sweden)

I-Ching Sam

BACKGROUND: Mosquito-borne Chikungunya virus (CHIKV) has recently re-emerged globally. The epidemic East/Central/South African (ECSA) strains have spread for the first time to Asia, which previously had only endemic Asian strains. In Malaysia, the ECSA strain caused an extensive nationwide outbreak in 2008, while the Asian strains had caused only limited outbreaks before this. To gain insight into these observed epidemiological differences, we compared genotypic and phenotypic characteristics of CHIKV of the Asian and ECSA genotypes isolated in Malaysia. METHODS AND FINDINGS: CHIKV of the Asian and ECSA genotypes were isolated from patients during outbreaks in Bagan Panchor in 2006 and Johor in 2008. Sequencing of the CHIKV strains revealed 96.8% amino acid similarity, including an unusual 7-residue deletion in the nsP3 protein of the Asian strain. CHIKV replication in cells and Aedes mosquitoes was measured by virus titration. There were no differences in mammalian cell lines. The ECSA strain reached significantly higher titres in Ae. albopictus cells (C6/36). Both CHIKV strains infected Ae. albopictus mosquitoes at a higher rate than Ae. aegypti, but when compared with each other, the ECSA strain had much higher midgut infection and replication, and salivary gland dissemination, while the Asian strain infected Ae. aegypti at higher rates. CONCLUSIONS: The greater ability of the ECSA strain to replicate in Ae. albopictus may explain why it spread far more quickly and extensively in humans in Malaysia than the Asian strain ever did, particularly in rural areas where Ae. albopictus predominates. Intergenotypic genetic differences were found at E1, E2, and nsP3 sites previously reported to be determinants of host adaptability in alphaviruses. Transmission of CHIKV in humans is influenced by virus strain and vector species, which has implications for regions with more than one circulating CHIKV genotype and Aedes species.
Using Upland Rice Root Traits to Identify N Use Efficient Genotypes for Limited Soil Nutrient Conditions

Energy Technology Data Exchange (ETDEWEB)

Traore, K.; Traore, O. [INERA / Station de Farakoba, Bobo-Dioulasso (Burkina Faso); Bado, V. B. [Africa Rice Center (AfricaRice), Saint Louis (Senegal)

2013-11-15

Crop production in the Sahelian countries of Africa is limited by many factors, the most important being the low potential yields of local varieties, low inherent soil fertility and low applications of external inputs (organic and mineral fertilizers). A field experiment was conducted from 2007 to 2008 with the objective of developing and validating screening protocols for plant traits that enhance N acquisition and utilization in upland rice grown in low-N soils, using two hundred (200) upland rice (Oryza sativa L.) genotypes from WAB, NERICA, CNA, CNAX, IRAT and IR lines. An experiment in small pots was carried out in a greenhouse at the Farakoba research center. The pots were filled with a sandy soil, and the upland rice genotypes were grown for three weeks, harvested and studied for their root characteristics (seminal root length, adventitious root number, lateral root length and number, and root hair density). The small-pot method was reliable for root trait characterization at the seedling stage. The genotypes showed large variability in root characteristics, and this variability was larger within the NERICA and WAB lines than within the other lines. The length of the seminal roots varied from 10 to 40 cm, the lateral root number ranged between 3 and 15, and the number of adventitious roots varied between 2 and 7. The selected root traits can be used to identify genotypes with high nutrient- and water-use efficiency. (author)
Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern

Directory of Open Access Journals (Sweden)

Torres Munguía, Juan Armando

2014-06-01

This paper examines sample proportion estimates in the presence of univariate missing categorical data. A database on smoking habits (the 2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with 5% and 15% missingness under each of the MCAR, MAR and MNAR mechanisms. The performance of six methods for addressing missingness was then evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression, and random forests. The results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed were the hot-deck and polytomous regression approaches.
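Two of the six methods compared above are simple enough to sketch directly: mode imputation fills every gap with the most common observed category, while a random hot-deck fills each gap with the value of a randomly chosen donor record, here restricted to donors in the same stratum. The paper's exact hot-deck variant is not described in the abstract; the stratifying variable and the data below are hypothetical.

```python
import random
from collections import Counter

def mode_impute(values):
    """Replace each missing entry (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def hot_deck_impute(values, strata, seed=0):
    """Random hot-deck: fill each missing entry with the value of a randomly
    chosen donor record from the same stratum (falling back to all donors)."""
    rng = random.Random(seed)
    all_donors = [v for v in values if v is not None]
    donors = {}
    for v, s in zip(values, strata):
        if v is not None:
            donors.setdefault(s, []).append(v)
    return [rng.choice(donors.get(s, all_donors)) if v is None else v
            for v, s in zip(values, strata)]

# Hypothetical survey extract: smoking status with missing answers,
# plus an age group used to define hot-deck strata.
smokes = ["yes", "no", None, "no", "yes", None, "no"]
age_group = ["young", "young", "young", "old", "old", "old", "young"]

print(mode_impute(smokes))  # ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']
print(hot_deck_impute(smokes, age_group))
```

Mode imputation always inflates the majority category, which biases proportion estimates; the hot-deck preserves the observed category mix within each stratum, which is consistent with it performing well in this comparison.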
Whole-Genome Sequencing and iPLEX MassARRAY Genotyping Map an EMS-Induced Mutation Affecting Cell Competition in Drosophila melanogaster.

Science.gov (United States)

Lee, Chang-Hyun; Rimesso, Gerard; Reynolds, David M; Cai, Jinlu; Baker, Nicholas E

2016-10-13

Cell competition, the conditional loss of viable genotypes only when surrounded by other cells, is a phenomenon observed in certain genetic mosaic conditions. We conducted a chemical mutagenesis screen to recover new mutations that affect cell competition between wild-type and RpS3 heterozygous cells. Mutations were identified by whole-genome sequencing, using software tools that greatly facilitate distinguishing newly induced mutations from other sources of apparent sequence polymorphism, thereby reducing false-positive and false-negative identification rates. In addition, we used iPLEX MassARRAY to genotype recombinant chromosomes. These approaches permitted mapping of a new mutation affecting cell competition when only a single allele existed, with a phenotype assessed only in genetic mosaics, and without the benefit of complementation with existing mutations, deletions, or duplications. These techniques expand the utility of chemical mutagenesis and whole-genome sequencing for mutant identification. We discuss mutations in the Atm and Xrp1 genes identified in this screen. Copyright © 2016 Lee et al.
Representativeness of Tuberculosis Genotyping Surveillance in the United States, 2009-2010.

Science.gov (United States)

Shak, Emma B; France, Anne Marie; Cowan, Lauren; Starks, Angela M; Grant, Juliana

2015-01-01

Genotyping of Mycobacterium tuberculosis isolates contributes to tuberculosis (TB) control through detection of possible outbreaks. However, 20% of U.S. cases do not have an isolate for testing, and 10% of cases with isolates do not have a genotype reported. TB outbreaks in populations with incomplete genotyping data might be missed by genotyping-based outbreak detection. Therefore, we assessed the representativeness of TB genotyping data by comparing characteristics of cases reported during January 1, 2009-December 31, 2010, that had a genotype result with those cases that did not. Of 22,476 cases, 14,922 (66%) had a genotype result. Cases without genotype results were more likely to be patients <19 years of age, with unknown HIV status, of female sex, U.S.-born, and with no recent history of homelessness or substance abuse. Although cases with a genotype result are largely representative of all reported U.S. TB cases, outbreak detection methods that rely solely on genotyping data may underestimate TB transmission among certain groups.
HPV genotypes in invasive cervical cancer in Danish women

DEFF Research Database (Denmark)

Kirschner, Benny; Junge, Jette; Holl, Katsiaryna

2013-01-01

Human papillomavirus (HPV) genotype distribution in invasive cervical cancers may differ by geographic region. The primary objective of this study was to estimate HPV-genotype distribution in Danish women with a diagnosis of invasive cervical cancer.
Hepatitis C viral load, genotype 3 and interleukin-28B CC genotype predict mortality in HIV and hepatitis C-coinfected individuals

DEFF Research Database (Denmark)

Clausen, Louise Nygaard; Astvad, Karen; Ladelund, Steen

2012-01-01

OBJECTIVE: We hypothesized that hepatitis C virus (HCV) load and genotype may influence all-cause mortality in HIV-HCV-coinfected individuals. DESIGN AND METHODS: Observational prospective cohort study. Mortality rates were compared in a time-updated multivariate Poisson regression analysis....... RESULTS: We included 264 consecutive HIV-HCV-coinfected individuals. During 1143 person-years at risk (PYR), 118 individuals died [overall mortality rate 10 (95% confidence interval; 8, 12)/100 PYR]. In multivariate analysis, a 1 log increase in HCV viral load was associated with a 30% higher mortality......) CC genotype was associated with 54% higher mortality risk [aMRR: 1.54 (0.89, 3.82)] compared to TT genotype. CONCLUSION: High-HCV viral load, HCV genotype 3 and IL28B genotype CC had a significant influence on the risk of all-cause mortality among individuals coinfected with HIV-1. This may have...
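The crude mortality rate quoted above is events divided by person-years at risk, scaled to 100 PYR. One common way to attach a 95% confidence interval is on the log scale, using 1/sqrt(events) as the standard error of the log rate. The study's own interval method is not stated, so this sketch (which gives roughly 8.6 to 12.4) need not match the reported bounds exactly; it uses only the counts from the abstract.

```python
import math

def rate_per_100(events, person_years, z=1.96):
    """Crude rate per 100 person-years with an approximate 95% CI computed
    on the log scale (SE of the log rate = 1/sqrt(events))."""
    rate = 100 * events / person_years
    half_width = z / math.sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

# 118 deaths during 1143 person-years at risk, as reported above.
rate, lo, hi = rate_per_100(118, 1143)
print(f"{rate:.1f} ({lo:.1f}, {hi:.1f}) per 100 PYR")  # 10.3 (8.6, 12.4) per 100 PYR
```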
Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection.

Science.gov (United States)

Louzoun, Yoram; Alter, Idan; Gragert, Loren; Albrecht, Mark; Maiers, Martin

2018-05-01

Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism, which often have a heavy-tailed haplotype frequency distribution, so many rare haplotypes remain unobserved. Statistical methods to improve imputation by extending reference haplotype distributions using linkage disequilibrium patterns that relate allele and haplotype frequencies have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. The absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model where new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.
Cotton genotypes selection through artificial neural networks.

Science.gov (United States)

Júnior, E G Silva; Cardoso, D B O; Reis, M C; Nascimento, A F O; Bortolin, D I; Martins, M R; Sousa, L B

2017-09-27

Breeding programs currently use statistical analysis to help identify superior genotypes at various stages of cultivar development. Unlike these analyses, the computational intelligence approach has been little explored in the genetic improvement of cotton. This study therefore aimed to present artificial neural networks as auxiliary tools in cotton breeding for improved fiber quality. To demonstrate the applicability of this approach, the research used evaluation data from 40 genotypes. To classify the genotypes for fiber quality, the artificial neural networks were trained with replicate data from 20 cotton genotypes evaluated in the 2013/14 and 2014/15 harvests for fiber length, length uniformity, fiber strength, micronaire index, elongation, short fiber index, maturity index, reflectance degree, and fiber quality index. The fiber quality index was estimated as a weighted average of the score (1 to 5) assigned to each HVI characteristic evaluated, with weights set according to industry standards. The artificial neural networks classified the 20 selected genotypes with high accuracy based on the fiber quality index. When fiber length was combined with the short fiber index, fiber maturity, and micronaire index, the networks performed better than with fiber length alone or with the previous combinations. Submitting the mean data of new genotypes to networks trained with replicate data also gave better classification results. These results indicate that artificial neural networks have great potential for use at different stages of a cotton breeding program aimed at improving the fiber quality of future cultivars.
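The fiber quality index described above is a weighted average of per-trait scores on a 1 to 5 scale. The sketch below shows only that arithmetic. The trait names and weights are hypothetical, because the abstract does not give the industry weights used.

```python
def fiber_quality_index(scores: dict, weights: dict) -> float:
    """Weighted mean of per-trait HVI scores, each on a 1 to 5 scale.

    `weights` is a hypothetical mapping from trait name to relative importance.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same traits")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must lie between 1 and 5")
    total_weight = sum(weights.values())
    return sum(scores[t] * weights[t] for t in scores) / total_weight

# Hypothetical scores and weights for three traits.
scores = {"length": 4, "strength": 5, "micronaire": 3}
weights = {"length": 3, "strength": 2, "micronaire": 1}
fqi = fiber_quality_index(scores, weights)  # (12 + 10 + 3) / 6
```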
Characterization of some sunflower genotypes using ISSR markers

International Nuclear Information System (INIS)

Mokrani, L.; Nabulsi, I.; MirAli, N.

2014-01-01

Sunflower (Helianthus annuus L.) is grown mostly as a source of high-quality vegetable oil and is used especially in the food industry. It is generally produced by multinationals and sold as hybrids. Our research, based on two techniques (ISSR and RAPD), is considered the first to address the molecular characterization of sunflower genotypes in Syria. We used 25 ISSR primers and 13 RAPD primers to study 29 sunflower genotypes and two reference controls belonging to the same family (Calendula officinalis L. and Tagetes erecta L.). ISSR results revealed low polymorphism compared with other studies. We also noticed 11 genetically related genotypes whose percent disagreement values (PDV) did not exceed 1%: 7189, 7191, 7184, 7183, 443, 441, Ghab1, Ghab2, Ghab3, Ghab4, Ghab5, Madakh halab, Sarghaya4 and Tarkibi knitra. Sarghaya4 and Tarkibi knitra indeed have the lowest yield and share some morphological characters. By contrast, the genotype Hysum33 has the highest yield and is genetically distant from the other genotypes. All the genotypes could be used in QTL detection, as we did not notice any similarity between them. (author)
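Percent disagreement between two genotypes is commonly computed from binary band scores (1 = band present, 0 = absent) as the share of scored bands at which the two profiles differ. The abstract does not define PDV, so the sketch below assumes this standard definition. The band profiles are hypothetical.

```python
def percent_disagreement(profile_a, profile_b) -> float:
    """Percent of scored bands (1 = present, 0 = absent) at which two genotypes differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    differing = sum(a != b for a, b in zip(profile_a, profile_b))
    return 100.0 * differing / len(profile_a)

# Hypothetical profiles over 100 bands that differ at a single band.
a = [1] * 100
b = [1] * 99 + [0]
pdv = percent_disagreement(a, b)  # 1.0, i.e. at the 1% threshold used above
```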
Large scale association analysis identifies three susceptibility loci for coronary artery disease.

Directory of Open Access Journals (Sweden)

Stephanie Saade

Genome-wide association studies (GWAS) and their replications that have associated DNA variants with myocardial infarction (MI) and/or coronary artery disease (CAD) are predominantly based on populations of European or East Asian descent. Replication of the most significantly associated polymorphisms in multiple populations with distinctive genetic backgrounds and lifestyles is crucial to understanding the pathophysiology of a multifactorial disease like CAD. We used our Lebanese cohort to perform a replication study of nine previously identified CAD/MI susceptibility loci (LTA, CDKN2A-CDKN2B, CELSR2-PSRC1-SORT1, CXCL12, MTHFD1L, WDR12, PCSK9, SH2B3, and SLC22A3) and 88 genes in related phenotypes. The study was conducted on 2,002 patients with detailed demographic and clinical characteristics and cardiac catheterization results. One marker, rs6922269, in MTHFD1L was significantly protective against MI (OR=0.68, p=0.0035), while the variant rs4977574 in CDKN2A-CDKN2B was significantly associated with MI (OR=1.33, p=0.0086). Associations were detected after adjustment for family history of CAD, gender, hypertension, hyperlipidemia, diabetes, and smoking. The parallel study of 88 previously published genes in related phenotypes encompassed 20,225 markers, three quarters of which had imputed genotypes. The study was based on our genome-wide genotype data set, with imputation across the whole genome to HapMap II release 22 using the HapMap CEU population as a reference. Analysis was conducted on both the genotyped and imputed variants in the 88 regions covering selected genes. This approach replicated the HNRNPA3P1-CXCL12 association with CAD and identified new significant associations of CDKAL1, ST6GAL1, and PTPRD with CAD. Our study provides evidence for the importance of the multifactorial aspect of CAD/MI and describes genes predisposing to their etiology.
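The odds ratios reported above were adjusted for covariates, which requires a regression model. The sketch below shows only the simpler crude allelic odds ratio from a 2x2 table of allele counts, with a Wald 95% interval on the log scale. The counts are hypothetical and are not taken from the study.

```python
import math

def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Crude allelic odds ratio with a log-scale Wald confidence interval.

    Arguments are allele counts (not individuals) in cases and controls.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = odds_ratio * math.exp(-z * se_log)
    upper = odds_ratio * math.exp(z * se_log)
    return odds_ratio, lower, upper

# Hypothetical allele counts: 1000 alleles in cases and 1000 in controls.
or_, lower, upper = allelic_odds_ratio(300, 700, 250, 750)  # OR = 9/7
```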
Hepatitis C Virus: Viral Quasispecies and Genotypes

Directory of Open Access Journals (Sweden)

Kyoko Tsukiyama-Kohara

2017-12-01

Hepatitis C virus (HCV) mainly replicates in the cytoplasm, where it easily establishes persistent infection, resulting in chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma. Due to its high mutation rate, HCV forms viral quasispecies, categorized based on the highly variable regions in the envelope protein and nonstructural 5A protein. HCV possesses seven major genotypes, of which genotype 1 is the most prevalent globally. The distribution of HCV genotypes varies by geography, and each genotype has a different sensitivity to interferon treatment. Recently developed direct-acting antivirals (DAAs), which target viral proteases or polymerases, achieve drastically better antiviral effects than previous therapeutics. Although treatment with DAAs has led to the emergence of drug-resistant HCV mutants, the most recently approved DAAs show improved pan-genotypic activity, with a higher barrier to viral resistance.
Prevalence of genotype D in chronic liver disease patients with occult HBV infection in northern region of India

Directory of Open Access Journals (Sweden)

Meher Rizvi

2014-01-01

Background: The etiology of nearly 30% of cases of chronic viral hepatitis remains undetected. Occult HBV infection (OBI) has emerged as an important clinical entity in this scenario. In addition to the prevalence and clinical outcome of OBI, the HBV genotype was determined in patients from the northern region of India. Materials and Methods: A total of 847 patients with chronic liver disease (CLD) were screened for common viral etiologies and other serological markers of HBV. The surface, precore and polymerase genes of HBV were amplified in patients negative for other etiologies. Genotyping and sequencing of the precore region were performed for OBI cases. Results: Twenty-nine (7.61%) cases of OBI were identified, of which 9 had chronic liver disease (CHD), 11 liver cirrhosis (LC) and 9 hepatocellular carcinoma (HCC). Most OBI cases were detected by amplification of the surface gene (26, 89.6%), followed by the precore gene (12, 41.3%). Their liver function tests were significantly more deranged than those of overt HBV cases. IgG anti-HBc was present in 8 (27.6%) OBI cases. A mutation at nt 1896 in the precore region was observed in 8 (32%) of the overt HBV cases. Genotype D was the predominant genotype. Conclusion: OBI in our study was characterized by a predominance of genotype D and a more severe clinical and biochemical profile than overt HBV. IgG anti-HBc positivity could be used as a marker of OBI. We recommend sensitive nested PCR for the diagnosis of OBI, amplifying at least the surface and precore genes.
Neutralizing antibodies in patients with chronic hepatitis C, genotype 1, against a panel of genotype 1 culture viruses

DEFF Research Database (Denmark)

Pedersen, Jannie; Jensen, Tanja B; Carlsen, Thomas H R

2013-01-01

The correlation of neutralizing antibodies to treatment outcome in patients with chronic hepatitis C virus (HCV) infection has not been established. The aim of this study was to determine whether neutralizing antibodies could be used as an outcome predictor in patients with chronic HCV, genotype 1, infection treated with pegylated interferon-α and ribavirin. Thirty-nine patients with chronic hepatitis C, genotype 1a or 1b, with either sustained virologic response (n = 23) or non-sustained virologic response (n = 16) were enrolled. Samples taken prior to treatment were tested for their ability to neutralize 6 different HCV genotype 1 cell culture recombinants (1a: H77/JFH1, TN/JFH1, DH6/JFH1; 1b: J4/JFH1, DH1/JFH1, DH5/JFH1). The results were expressed as the highest dilution yielding 50% neutralization (NAb50-titer). We observed no genotype or subtype specific differences in NAb50-titers between...
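A NAb50 titer, defined here as the highest dilution still yielding 50% neutralization, can be read off a dilution series as shown below. This is a minimal endpoint reading with no curve fitting or interpolation, and the abstract does not say whether the study interpolated between dilutions. The dilution series and neutralization percentages are hypothetical.

```python
def nab50_titer(reciprocal_dilutions, percent_neutralization):
    """Highest reciprocal dilution with >= 50% neutralization, or None if none reach it.

    Simple endpoint reading; no interpolation between dilution steps.
    """
    passing = [d for d, p in zip(reciprocal_dilutions, percent_neutralization) if p >= 50.0]
    return max(passing) if passing else None

# Hypothetical two-fold dilution series for one patient sample.
dilutions = [50, 100, 200, 400, 800, 1600]
neutralization = [95, 88, 71, 52, 30, 12]
titer = nab50_titer(dilutions, neutralization)  # 400
```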
Genotypic diversity of European Phytophthora ramorum isolates based on SSR analysis

Science.gov (United States)

Kris Van Poucke; Annelies Vercauteren; Martine Maes; Sabine Werres; Kurt Heungens

2013-01-01

in Scotland were genotyped using seven microsatellite markers as described by Vercauteren et al. (2010). Thirty multilocus genotypes were identified within the Scottish population, with 51 percent of the isolates belonging to the main European genotype EU1MG1 and 13 unique detected genotypes. Ten of those genotypes were site specific, often represented by...
Development and characterization of a high density SNP genotyping assay for cattle.

Directory of Open Access Journals (Sweden)

Lakshmi K Matukumalli

The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in humans has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with a median interval of 37 kb and a maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNPs are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate-quality draft sequence. Aspects of the approach can be exploited in species that lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.
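The two summary statistics quoted above, the median inter-marker interval and the minor allele frequency, can be computed as follows. This sketch only measures spacing and MAF and does not implement the paper's spacing algorithm. It assumes positions come from a single chromosome and genotypes are coded as 0, 1 or 2 copies of the alternate allele, with None for a missing call. All values are hypothetical.

```python
from statistics import median

def interval_summary(positions):
    """Median and maximum gap (bp) between adjacent markers on one chromosome."""
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return median(gaps), max(gaps)

def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele dosages; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

# Hypothetical marker positions (bp) and genotypes for one SNP.
med_gap, max_gap = interval_summary([1000, 38000, 75000, 110000, 460000])
maf = minor_allele_frequency([0, 1, 2, 1, 0, 0, None])
```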
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method.

Science.gov (United States)

Yang, Jun-He; Cheng, Ching-Hsue; Chan, Chia-Pan

2017-01-01

Reservoirs are important for households and affect the national economy. This paper proposes a time-series forecasting model that first estimates missing values and then applies variable selection to forecast a reservoir's water level. The study collected data from the Shimen Reservoir in Taiwan as well as daily atmospheric data from 2008 to 2015. The two datasets were concatenated, based on the ordering of the data, into an integrated research dataset. The proposed time-series forecasting model has three main foci. First, the study applies five imputation methods to the missing values instead of deleting them directly. Second, key variables are identified via factor analysis, and unimportant variables are then deleted sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the water-level forecasting model, which is compared with the listed methods in terms of forecasting error. The experimental results indicate that the Random Forest forecasting model, applied both with variable selection and with the full set of variables, has better forecasting performance than the listed models. In addition, the experiments show that the proposed variable selection can help the five forecasting methods used here improve their forecasting capability.
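The abstract does not name its five imputation methods. The sketch below shows one common option, linear interpolation between the nearest observed neighbours. It is offered only as an illustration of imputing rather than deleting missing values in a time series. Leading and trailing gaps are left unfilled, and the water-level values are hypothetical.

```python
def interpolate_missing(series):
    """Fill None values by linear interpolation between the nearest observed neighbours.

    Leading and trailing gaps (no observation on one side) are left as None.
    """
    values = list(series)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)
    return values

# Hypothetical daily water levels (m) with a two-day gap and a trailing gap.
levels = [240.0, None, None, 243.0, 244.5, None]
filled = interpolate_missing(levels)
```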
Relationship of status of polymorphic RAPD bands with genotypic ...

African Journals Online (AJOL)

Relationship of status of polymorphic RAPD bands with genotypic adaptation in early finger millet genotypes. S Das, RC Misra, GR Rout, MC Pattanaik, S Aparajita. Abstract. Fifteen early-duration finger millet (Eleusine coracana G) genotypes were characterised molecularly using RAPD markers. Twenty-five ...
21 CFR 862.3360 - Drug metabolizing enzyme genotyping system.

Science.gov (United States)

2010-04-01

... § 862.3360 Drug metabolizing enzyme genotyping system. (a) Identification. A drug metabolizing enzyme genotyping system is a device intended for use in testing deoxyribonucleic acid (DNA...
Genotype by environment interactions and yield stability of stem ...

African Journals Online (AJOL)

In a maize breeding program, potential genotypes are usually evaluated in different environments before desirable ones are selected. Genotype x environment (G x E) interaction is associated with the differential performance of genotypes tested at different locations and in different years, and influences selection and ...
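In the standard two-way additive decomposition, the G x E interaction for genotype i in environment j is the part of its mean that is not explained by the genotype and environment main effects: y_ij minus the genotype mean, minus the environment mean, plus the grand mean. The sketch below computes that residual from a hypothetical table of genotype means. It is not the stability analysis of any particular study listed here.

```python
def gxe_interaction(table):
    """Interaction effects y_ij - gbar_i - ebar_j + grand for a genotype x environment table.

    table[g][e] is the mean yield of genotype g in environment e (balanced, no missing cells).
    """
    n_g, n_e = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_g * n_e)
    g_means = [sum(row) / n_e for row in table]
    e_means = [sum(table[g][e] for g in range(n_g)) / n_g for e in range(n_e)]
    return [[table[g][e] - g_means[g] - e_means[e] + grand for e in range(n_e)]
            for g in range(n_g)]

# Hypothetical yields (t/ha): the two genotypes swap rank between environments.
crossover = gxe_interaction([[5.0, 7.0], [6.0, 4.0]])
# A purely additive table has zero interaction.
additive = gxe_interaction([[1.0, 2.0], [3.0, 4.0]])
```

Non-zero interaction effects with opposite signs, as in the crossover example, correspond to the "differential performance" across locations that complicates selection.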

HIV genotype resistance testing in antiretroviral (ART) exposed Indian children--a need of the hour.

Science.gov (United States)

Shah, Ira; Parikh, Shefali

2013-04-01

Development of drug resistance in HIV-infected children with treatment failure is a major impediment to the selection of appropriate therapy. HIV genotype resistance assays predict drug resistance on the basis of mutations in the viral genome. However, their clinical utility, especially in resource-limited settings, is still a subject of debate. The authors report two cases in which both children experienced treatment failure on various antiretroviral therapy regimens. In both cases, genotype resistance testing (GRT) prompted a radical change from the failure therapy proposed under existing guidelines. GRT was specifically important for selecting a new dual nucleoside reverse transcriptase inhibitor (NRTI) component of the failure regimen by identifying thymidine analogue mutations (TAMs) and the M184V mutation in the HIV genome. These case reports highlight the importance of GRT in children failing multiple antiretroviral regimens and emphasize the need to recognize situations where GRT is absolutely essential to guide appropriate therapy, even in a resource-limited setting.
Comparative salinity responses among tomato genotypes and rootstocks

International Nuclear Information System (INIS)

Oztekin, G.B.; Tuzel, Y.

2011-01-01

Salinity is a major constraint limiting agricultural crop productivity worldwide. However, plant species and cultivars differ greatly in their response to salinity. This greenhouse study was conducted to determine the response of 4 commercial tomato rootstocks, 21 cultivars and 8 candidate varieties to salinity stress. Seeds were germinated in peat, and when the plants reached the fifth true-leaf stage, salt treatment was initiated in all but the control treatment. NaCl was added to the nutrient solution daily in 25 mM increments until a final concentration of 200 mM was reached. On harvest day, genotypes were classified by the severity of leaf symptoms caused by the NaCl treatment. After symptom scoring, the plants were harvested, and leaf number, root length, stem length and stem diameter per plant were measured. The plants were separated into shoots and roots to determine dry matter production. Our results showed that, on average, NaCl stress decreased all parameters, and the rootstocks performed better than the other genotypes. Among all rootstocks, three varieties (2211 and 2275) and ten genotypes (Astona, Astona RN, Caracas, Deniz, Durinta, Export, Gokce, Target, Yeni Talya and 144 HY) were selected as tolerant with slight chlorosis, whereas the genotype Malike was selected as sensitive with severe chlorosis. Candidate varieties 2316 and 1482 were the most sensitive. Plant growth and dry matter production differed among the tested genotypes; however, no correlation was found between plant growth and dry matter production. Rootstock Beaufort gave the highest shoot dry matter, whereas Heman had the highest root dry matter. Newton produced more shoot and root dry matter than the other genotypes. It is concluded that screening genotypes by symptom severity at an early developmental stage, together with their dry matter production, could be used as a tool to indicate genotypic variation in response to salt stress. (author)
Genotyping isolates of the entomopathogenic fungus Beauveria ...

African Journals Online (AJOL)

Multi-locus denaturing gradient gel electrophoresis (DGGE) analysis was developed to investigate the genotypes of Beauveria bassiana sensu lato. ... These results demonstrated that multi-locus DGGE is a potentially useful molecular marker for genotyping, identifying and tracking the fates of experimentally released ...
Evaluating potassium-use-efficiency of five cotton genotypes of Pakistan

International Nuclear Information System (INIS)

Hassan, Z.U.; Kubar, K.A.

2014-01-01

Potassium (K) deficiency in Pakistani soils has recently been reported as the major limiting factor for sustainable cotton production. The present study was conducted to examine how K nutrition affects the growth, biomass production, yield and K-use-efficiency of five cotton genotypes commonly grown in Pakistan: NIBGE-3701 and NIBGE-1524 (Bt-transgenic), and Sadori, Sindh-1 and SAU-2 (non-Bt conventional). All five genotypes were raised at deficient and adequate K levels, i.e. 0 and 60 kg K2O ha-1, respectively. The experiment was performed in plastic pots following a completely randomized factorial design with three replicates. Adequate K nutrition significantly increased various plant growth traits and the yield of all cotton genotypes under study, viz. number of sympodia (21%), number of leaves (34%), leaf dry biomass (30%), shoot dry biomass (31%), number of bolls (50%) and seed cotton yield (92%). Substantial variation was observed among cotton genotypes in K-use-efficiency and K-response-efficiency. Sadori and SAU-2 were identified as the most K-use-efficient cotton genotypes, while Sindh-1 and SAU-2 ranked as the most K-responsive. Interestingly, Sadori did not respond to K nutrition. Moreover, Bt cotton genotypes accumulated more K than non-Bt genotypes. The cotton genotype SAU-2 was identified as an efficient-response genotype, better adapted to both low- and high-K-input sustainable cotton agriculture systems. (author)
Genotypic profile of Listeria monocytogenes isolated in refrigerated chickens in southern Rio Grande do Sul, Brazil

Directory of Open Access Journals (Sweden)

Karla Sequeira Mendonça

2016-01-01

ABSTRACT: Listeria monocytogenes is of notable concern to the food industry due to its ubiquitous nature and ability to grow in adverse conditions. This study aimed to determine the genotypic profile of L. monocytogenes strains isolated from refrigerated chickens marketed in the southern part of Rio Grande do Sul, Brazil. The L. monocytogenes isolates were characterized by serotyping and pulsed-field gel electrophoresis (PFGE). Isolates of three different serotypes (1/2a, 1/2b and 4e) were evaluated by PFGE, and the macrorestriction patterns obtained with the enzymes AscI and ApaI revealed five different pulsotypes. The presence of such varied genotypic profiles demonstrates the prevalence of L. monocytogenes contamination in chicken processing environments, which, combined with ineffective cleaning procedures, allows the survival, adaptation and proliferation of these pathogens not only in the processing environment but also in local grocery stores.
Genotype x environment interaction and stability analysis for yield ...

African Journals Online (AJOL)

Chickpea is the major pulse crop cultivated in Ethiopia. However, its production is constrained due to genotype instability and environmental variability. This research was carried out to examine the magnitude of environmental effect on yield of chickpea genotypes and to investigate the stability and adaptability of genotypes ...
Generation of recombinant European bat lyssavirus type 1 and inter-genotypic compatibility of lyssavirus genotype 1 and 5 antigenome promoters.

Science.gov (United States)

Orbanz, Jeannette; Finke, Stefan

2010-10-01

Bat lyssaviruses (Fam. Rhabdoviridae) represent a source for the infection of terrestrial mammals and the development of rabies disease. Molecular differences in the replication of bat and non-bat lyssaviruses and their contribution to pathogenicity, however, are unknown. One reason for this is the lack of reverse genetics systems for bat-restricted lyssaviruses. To investigate bat lyssavirus replication and host adaptation, we developed a reverse genetics system for European bat lyssavirus type 1 (EBLV-1; genotype 5). This was achieved by co-transfection of HEK-293T cells with a full-length EBLV-1 genome cDNA and expression plasmids for EBLV-1 proteins, resulting in recombinant EBLV-1 (rEBLV-1). Replication of rEBLV-1 was comparable to that of the parental virus, showing that rEBLV-1 is a valid tool to investigate EBLV-1 replication functions. In a first approach, we tested whether the terminal promoter sequences of EBLV-1 are genotype-specific. Although genotype 1 (rabies virus) minigenomes were successfully amplified by EBLV-1 helper virus, in the context of the complete virus only the antigenome promoter (AGP) sequence of EBLV-1 was replaceable, as indicated by comparable replication of rEBLV-1 and the chimeric virus. These analyses demonstrate that the terminal AGPs of genotype 1 and genotype 5 lyssaviruses are compatible with those of the heterologous genotype.
Who cares and how much? The imputed economic contribution to the Canadian healthcare system of middle-aged and older unpaid caregivers providing care to the elderly.

Science.gov (United States)

Hollander, Marcus J; Liu, Guiping; Chappell, Neena L

2009-01-01

Canadians provide significant amounts of unpaid care to elderly family members and friends with long-term health problems. While some information is available on the nature of the tasks unpaid caregivers perform and the amount of time they spend on these tasks, the contribution of unpaid caregivers is often hidden. (It is recognized that some caregiving may be for short periods of time or may entail matters better described as "help" or "assistance," such as providing transportation. However, we use the term caregiving to cover the full range of unpaid care, from basic help to personal care.) Aggregate estimates of the market costs to replace the unpaid care provided are important to governments for policy development, as they provide a means to situate the contributions of unpaid caregivers within Canada's healthcare system. The purpose of this study was to assess the imputed costs of replacing the unpaid care provided by Canadians to the elderly. (The term imputed costs refers to the costs that would be incurred if the care provided by an unpaid caregiver were instead provided by a paid caregiver, on a direct hour-for-hour substitution basis.) The economic value of unpaid care, as understood in this study, is defined as the cost of replacing the services provided by unpaid caregivers at the rates charged by paid care providers.
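Under the hour-for-hour substitution definition above, the imputed cost is the number of unpaid care hours in each task category multiplied by the paid-provider rate for that task, summed across tasks. The task categories, hours and hourly rates below are hypothetical and are not figures from the study.

```python
def imputed_replacement_cost(hours_by_task: dict, hourly_rate_by_task: dict) -> float:
    """Hour-for-hour replacement cost: unpaid hours times the paid-provider hourly rate."""
    return sum(hours * hourly_rate_by_task[task] for task, hours in hours_by_task.items())

# Hypothetical annual hours for one caregiver and hypothetical market rates (CAD/hour).
hours = {"personal care": 200, "transportation": 100}
rates = {"personal care": 20.0, "transportation": 15.0}
cost = imputed_replacement_cost(hours, rates)  # 200*20 + 100*15
```

An aggregate estimate would repeat this calculation for each caregiver (or survey-weighted stratum) and sum the results.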
Genotyping panel for assessing response to cancer chemotherapy

Directory of Open Access Journals (Sweden)

Hampel Heather

2008-06-01

Background: Variants in numerous genes are thought to affect the success or failure of cancer chemotherapy. Interindividual variability can result from genes involved in drug metabolism and transport, drug targets (receptors, enzymes, etc.), and proteins relevant to cell survival (e.g., cell cycle, DNA repair, and apoptosis). The purpose of the current study is to establish a flexible, cost-effective, high-throughput genotyping platform for candidate genes involved in chemoresistance and -sensitivity, and treatment outcomes. Methods: We adopted SNPlex for genotyping 432 single nucleotide polymorphisms (SNPs) in 160 candidate genes implicated in response to anticancer chemotherapy. Results: The genotyping panels were applied to 39 patients with chronic lymphocytic leukemia undergoing flavopiridol chemotherapy and to 90 patients with colorectal cancer. 408 SNPs (94%) produced successful genotyping results. Additional genotyping methods were established for polymorphisms undetectable by SNPlex, including multiplexed SNaPshot for CYP2D6 SNPs and PCR amplification with fluorescently labeled primers for the UGT1A1 promoter (TA)nTAA repeat polymorphism. Conclusion: This genotyping panel is useful for supporting clinical anticancer drug trials to identify polymorphisms that contribute to interindividual variability in drug response. The availability of population genetic data across multiple studies has the potential to yield genetic biomarkers for optimizing anticancer therapy.
Core Gene Expression and Association of Genotypes with Viral ...

African Journals Online (AJOL)

Purpose: To determine the genotypic distribution and ribonucleic acid (RNA) viral load in, and to express the core gene from, hepatitis C virus (HCV)-infected patients in Punjab, Pakistan. Methods: A total of 1690 HCV RNA-positive patients were included in the study. HCV genotype was determined with a type-specific genotyping assay, viral ...
Genotype x Environment Interaction for Tuber Yield, Dry Matter ...

African Journals Online (AJOL)

A study was conducted to determine stability of tuber yield, dry matter content and specific gravity, and the nature and magnitude of genotype x environment (G x E) interaction in elite tetraploid potato genotypes. Eleven potato genotypes including two standard checks were evaluated in the eastern part of Ethiopia at ...
The genotype-phenotype map of an evolving digital organism.

Directory of Open Access Journals (Sweden)

Miguel A Fortuna

2017-02-01

To understand how evolving systems bring forth novel and useful phenotypes, it is essential to understand the relationship between genotypic and phenotypic change. Artificial evolving systems can help us understand whether the genotype-phenotype maps of natural evolving systems are highly unusual, and they may help create evolvable artificial systems. Here we characterize the genotype-phenotype map of digital organisms in Avida, a platform for digital evolution. We consider digital organisms from a vast space of 10^141 genotypes (instruction sequences), which can form 512 different phenotypes. These phenotypes are distinguished by the different Boolean logic functions they can compute, as well as by the complexity of these functions. We observe several properties with parallels in natural systems, such as connected genotype networks and asymmetric phenotypic transitions. The likely common cause is robustness to genotypic change. We describe an intriguing tension between phenotypic complexity and evolvability that may have implications for biological evolution. On the one hand, genotypic change is more likely to yield novel phenotypes in more complex organisms. On the other hand, the total number of novel phenotypes reachable through genotypic change is highest for organisms with simple phenotypes. Artificial evolving systems can help us study aspects of biological evolvability that are not accessible in vastly more complex natural systems. They can also help identify properties, such as robustness, that are required for both human-designed artificial systems and synthetic biological systems to be evolvable.
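The two sizes quoted above follow from simple counting, provided some Avida settings that the abstract does not state are assumed. Genomes are taken to be 100 instructions long, drawn from a 26-instruction set. A phenotype is taken to be the subset of nine Boolean logic tasks an organism performs. Under those assumptions there are 26^100 genotypes (about 10^141) and 2^9 = 512 phenotypes.

```python
import math

# Assumptions (not stated in the abstract): Avida settings used for the count.
INSTRUCTION_SET_SIZE = 26   # distinct instructions per genome position
GENOME_LENGTH = 100         # instructions per genome
N_LOGIC_FUNCTIONS = 9       # Boolean logic tasks; a phenotype = subset performed

# Work in log10 because 26**100 is far too large to print usefully.
log10_genotypes = GENOME_LENGTH * math.log10(INSTRUCTION_SET_SIZE)
n_phenotypes = 2 ** N_LOGIC_FUNCTIONS
print(f"~10^{log10_genotypes:.1f} genotypes, {n_phenotypes} phenotypes")
```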
The genotype-phenotype map of an evolving digital organism.

Science.gov (United States)

Fortuna, Miguel A; Zaman, Luis; Ofria, Charles; Wagner, Andreas

2017-02-01

To understand how evolving systems bring forth novel and useful phenotypes, it is essential to understand the relationship between genotypic and phenotypic change. Artificial evolving systems can help us understand whether the genotype-phenotype maps of natural evolving systems are highly unusual, and it may help create evolvable artificial systems. Here we characterize the genotype-phenotype map of digital organisms in Avida, a platform for digital evolution. We consider digital organisms from a vast space of 10^141 genotypes (instruction sequences), which can form 512 different phenotypes. These phenotypes are distinguished by different Boolean logic functions they can compute, as well as by the complexity of these functions. We observe several properties with parallels in natural systems, such as connected genotype networks and asymmetric phenotypic transitions. The likely common cause is robustness to genotypic change. We describe an intriguing tension between phenotypic complexity and evolvability that may have implications for biological evolution. On the one hand, genotypic change is more likely to yield novel phenotypes in more complex organisms. On the other hand, the total number of novel phenotypes reachable through genotypic change is highest for organisms with simple phenotypes. Artificial evolving systems can help us study aspects of biological evolvability that are not accessible in vastly more complex natural systems. They can also help identify properties, such as robustness, that are required for both human-designed artificial systems and synthetic biological systems to be evolvable.
Characteristics of Streptococcus mutans genotypes and dental caries in children

Science.gov (United States)

Cheon, Kyounga; Moser, Stephen A.; Wiener, Howard W.; Whiddon, Jennifer; Momeni, Stephanie S.; Ruby, John D.; Cutter, Gary R.; Childers, Noel K.

2013-01-01

This longitudinal cohort study evaluated the diversity, commonality, and stability of Streptococcus mutans genotypes associated with dental caries history. Sixty-seven 5- and 6-year-old children considered to be at high caries risk had plaque collected from baseline through 36 months for S. mutans isolation and genotyping with repetitive extragenic palindromic PCR (4,392 total isolates). Decayed, missing, and filled surfaces (dmfs/DMFS) were recorded for each child at baseline. At baseline, 18 distinct genotypes were found among 911 S. mutans isolates from 67 children (diversity), and 13 genotypes were shared by at least 2 children (commonality). The number of genotypes per individual was positively associated with the proportion of decayed surfaces (p-ds) at baseline. Twenty-four of the 39 children available at follow-up visits maintained a predominant genotype throughout the follow-up periods (stability), and stability was negatively associated with p-ds. The observed diversity, commonality, and stability of S. mutans genotypes represent a pattern of dental caries epidemiology in this high-caries-risk community, which suggests that fewer decayed surfaces are significantly associated with lower diversity and with stability of S. mutans genotypes. PMID:23659236
The Transmission Disequilibrium/Heterogeneity Test with Parental-Genotype Reconstruction for Refined Genetic Mapping of Complex Diseases

Directory of Open Access Journals (Sweden)

Jing Han

2012-01-01

Full Text Available In linkage analysis for mapping genetic diseases, the transmission/disequilibrium test (TDT) uses the linkage disequilibrium (LD) between a marker locus and a trait locus for precise genetic mapping while avoiding confounding due to population stratification. The sib-TDT (S-TDT) and combined-TDT (C-TDT) proposed by Spielman and Ewens can combine data from families with and without parental marker genotypes (PMGs). For families with missing PMGs, the reconstruction-combined TDT (RC-TDT) proposed by Knapp may be used to reconstruct missing parental genotypes from the genotypes of their offspring, increasing power and correcting for potential bias. In this paper, we propose a further extension of the RC-TDT, called the reconstruction-combined transmission disequilibrium/heterogeneity (RC-TDH) test, which takes into account identical-by-descent (IBD) sharing information in addition to LD information. It can make effective use of families with missing or incomplete parental marker information. An application of the proposed method to the Genetic Analysis Workshop 14 (GAW14) data sets and extensive simulation studies suggest that this approach may further increase statistical power, which is particularly valuable when LD is unknown and/or when some or all PMGs are unavailable.
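For orientation, the basic TDT on which these extensions build compares how often heterozygous parents transmit each allele of a biallelic marker to affected offspring. The Python sketch below computes that McNemar-type statistic, assuming the transmission counts b and c have already been tallied; it does not reconstruct parental genotypes or use IBD sharing, which is what the RC-TDT and RC-TDH add.

```python
import math

def tdt(b: int, c: int) -> tuple[float, float]:
    """Classic transmission/disequilibrium test for one biallelic marker.

    b: number of times heterozygous parents transmitted allele M1
    c: number of times they transmitted allele M2
    Returns the chi-square statistic (1 df) and its upper-tail p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

print(tdt(60, 40))  # chi2 = 4.0, p ≈ 0.0455
```

Only transmissions from heterozygous parents enter the statistic, which is why families with missing parental genotypes lose information unless those genotypes can be reconstructed.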
SplicePlot: a utility for visualizing splicing quantitative trait loci.

Science.gov (United States)

Wu, Eric; Nance, Tracy; Montgomery, Stephen B

2014-04-01

RNA sequencing has provided unprecedented resolution of alternative splicing and splicing quantitative trait loci (sQTL). However, few tools are available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command-line utility that produces intuitive visualizations of sQTLs and their effects. SplicePlot takes mapped RNA sequencing reads in BAM format and genotype data in VCF format as input and outputs publication-quality Sashimi plots, hive plots and structure plots, enabling better investigation and understanding of the role of genetics in alternative splicing and transcript structure. Source code and detailed documentation are available at http://montgomerylab.stanford.edu/spliceplot/index.html under Resources and at GitHub. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available.
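Grouping samples by their genotype at a variant is the step that lets a plot of this kind show genotype-dependent splicing. The Python sketch below does that for a VCF held in memory, using only the standard library. It is not SplicePlot's code, and the function name `genotype_groups` is hypothetical. It handles phased (`|`) and unphased (`/`) GT values and places no-calls in a `missing` group.

```python
from collections import defaultdict

def genotype_groups(vcf_text: str, variant_id: str) -> dict[str, list[str]]:
    """Map each GT value (e.g. '0/1') at `variant_id` to the samples carrying it."""
    samples: list[str] = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[2] != variant_id:
            continue
        gt_index = fields[8].split(":").index("GT")
        groups: dict[str, list[str]] = defaultdict(list)
        for sample, data in zip(samples, fields[9:]):
            alleles = data.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                label = "missing"
            else:
                # ignore phase so that 1|0 and 0/1 fall in the same group
                label = "/".join(sorted(alleles, key=int))
            groups[label].append(sample)
        return dict(groups)
    raise KeyError(f"variant {variant_id!r} not found")
```

The returned groups can then be used to pool read coverage per genotype before plotting.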
A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2

DEFF Research Database (Denmark)

Song, Honglin; Ramus, Susan J; Tyrer, Jonathan

2009-01-01

,817 cases and 2,353 controls from the UK and approximately 2 million imputed SNPs. We genotyped the 22,790 top ranked SNPs in 4,274 cases and 4,809 controls of European ancestry from Europe, USA and Australia. We identified 12 SNPs at 9p22 associated with disease risk (P ... (rs3814113; P = 2.5 x 10(-17)) was genotyped in a further 2,670 ovarian cancer cases and 4,668 controls, confirming its association (combined data odds ratio (OR) = 0.82, 95% confidence interval (CI) 0.79-0.86, P(trend) = 5.1 x 10(-19)). The association differs by histological subtype, being strongest...
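Odds ratios with 95% confidence intervals, like the one quoted above, can be obtained for a single 2x2 allele-count table with Woolf's logit method. The Python sketch below shows that calculation. The counts in the usage line are made up, and the published estimate comes from a per-allele trend analysis of combined data, not from this simplified single-table formula.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio and Woolf (logit) confidence interval from a 2x2 table.

    a: minor-allele count in cases     b: major-allele count in cases
    c: minor-allele count in controls  d: major-allele count in controls
    z: standard-normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
print(odds_ratio_ci(900, 1100, 1000, 1000))
```

An OR below 1, as reported for rs3814113, means the minor allele is less common in cases than in controls.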
Antixenosis of bean genotypes to Chrysodeixis includens (Lepidoptera: Noctuidae)

Directory of Open Access Journals (Sweden)

Rafaela Morando

2015-06-01

Full Text Available The objective of this work was to evaluate bean genotypes for resistance to soybean looper (Chrysodeixis includens). Initially, free-choice tests were carried out with 59 genotypes, divided into three groups according to leaf color intensity (dark green, light green, and medium green), to evaluate oviposition preference. Subsequently, 12 genotypes with high potential for resistance were selected, as well as two susceptible commercial standards. With these genotypes, new oviposition tests were performed in a greenhouse, together with tests for attractiveness and consumption under laboratory conditions (26±2°C, 65±10% RH, and a 14 h light: 10 h dark photophase). In the no-choice test with adults in the greenhouse, the 'IAC Jabola', Arcelina 1, 'IAC Boreal', 'Flor de Mayo', and 'IAC Formoso' genotypes received the fewest eggs, showing antixenosis-type resistance to oviposition. In the free-choice test with larvae, Arcelina 4, 'BRS Horizonte', 'Pérola', H96A102-1-1-1-52, 'IAC Boreal', 'IAC Harmonia', and 'IAC Formoso' were the least consumed genotypes, which indicates antixenosis to feeding. In the no-choice test, all genotypes (except for 'IAPAR 57') expressed moderate levels of antixenosis to feeding against C. includens larvae.
Genetic similarity of soybean genotypes revealed by seed protein

Directory of Open Access Journals (Sweden)

Nikolić Ana

2005-01-01

Full Text Available More accurate and complete descriptions of genotypes could help determine future breeding strategies and facilitate the introgression of new genotypes into the current soybean genetic pool. The objective of this study was to characterize, using seed proteins as biochemical markers, 20 soybean genotypes from the Maize Research Institute "Zemun Polje" collection that have good agronomic performance, high yield, lodging and drought resistance, and low shattering. Seed proteins were isolated and separated by polyacrylamide (PAA) gel electrophoresis. Based on the presence/absence of protein fractions, Dice and Rogers and Tanimoto similarity coefficients were calculated between pairs of genotypes. The similarity matrix was subjected to hierarchical cluster analysis using the unweighted pair group method with arithmetic mean (UPGMA), and the necessary computations were performed with the NTSYS-pc program. Seed protein analysis confirmed a low level of genetic diversity in soybean. The highest genetic similarity was between genotypes P9272 and Kador. The soybean genotypes were assigned to two larger groups, and both similarity coefficients gave similar results. Because pedigree data were lacking for the analyzed genotypes, correspondence with the marker data could not be determined. In plants with a narrow genetic base, such as soybean, protein markers may not be sufficient for characterizing and studying genetic diversity.
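The similarity and clustering steps described above can be sketched in plain Python on presence/absence (1/0) band profiles. The Dice coefficient is 2a/(2a+b+c) and the Rogers and Tanimoto coefficient is (a+d)/(a+d+2(b+c)), where a counts bands present in both genotypes, d counts bands absent from both, and b and c count bands present in only one. UPGMA (average-linkage) clustering is then run on 1 - similarity. This is an illustrative stand-in for the NTSYS-pc analysis, not a reproduction of it, and the band profiles in the example are invented.

```python
from typing import Sequence

def _counts(x: Sequence[int], y: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (a, b, c, d): shared presences, x-only, y-only, shared absences."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d

def dice(x, y) -> float:
    a, b, c, _ = _counts(x, y)
    denom = 2 * a + b + c
    return 1.0 if denom == 0 else 2 * a / denom  # two empty profiles: treated as identical

def rogers_tanimoto(x, y) -> float:
    a, b, c, d = _counts(x, y)
    return (a + d) / (a + d + 2 * (b + c))

def upgma(labels, dist):
    """Average-linkage clustering; returns nested (left, right, merge_distance) tuples."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # size-weighted average of the two merged clusters' distances
            d[(k, next_id)] = (dik * ni + djk * nj) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]

profiles = {"A": [1, 1, 0, 1], "B": [1, 1, 0, 0], "C": [0, 0, 1, 1]}
names = list(profiles)
dist = [[1 - dice(profiles[p], profiles[q]) for q in names] for p in names]
print(upgma(names, dist))  # A and B merge first; C joins last
```

Replacing `dice` with `rogers_tanimoto` in the distance matrix gives the second clustering for comparison; unlike Dice, it counts shared absences as evidence of similarity.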
Genetic relationship among Musa genotypes revealed by ...

African Journals Online (AJOL)

enoh

2012-03-29

Mar 29, 2012 ... A banana germplasm collection was established containing 44 Musa genotypes collected from various locations in Malaysia. To detect their genetic variation and to rule out duplicates among cultivars, microsatellite markers were used in their analysis. The microsatellite profiles of 44 Musa genotypes of various origins.

Genotyping of vacA alleles of Helicobacter pylori strains recovered ...

African Journals Online (AJOL)

commonly detected genotypes in the meat-based foods, viz, vegetable sandwich and ready to eat fish, were vacA ... Keywords: Helicobacter pylori, VacA genotypes, Genotyping, Food items ..... Microbiology and Quality Control, Islamic Azad.
The Phenotype/Genotype Correlation of Lactase Persistence among Omani Adults

Directory of Open Access Journals (Sweden)

Abdulrahim Al-Abri

2013-09-01

Full Text Available Objective: To examine the correlation of lactase persistence phenotype with genotype in Omani adults. Methods: Lactase persistence phenotype was tested by hydrogen breath test in 52 Omani adults using the Micro H2 analyzer. Results were checked against genotypes obtained by direct DNA sequencing. Results: Forty-one individuals with C/C-13910 and T/T-13915 genotypes had positive breath tests (≥20 ppm), while eight of nine individuals with T/C-13910 or T/G-13915 genotypes had negative breath tests (<20 ppm), and two subjects were non-hydrogen producers. The agreement between phenotype and genotype, measured by the kappa statistic, was very good (0.93). Conclusion: Genotyping both the T/C-13910 and T/G-13915 alleles can be used to assist diagnosis and predict lactose intolerance in the Omani population.
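The kappa value reported here measures agreement between genotype and breath-test phenotype beyond what chance alone would produce. Below is a minimal Python sketch of Cohen's kappa for a square agreement table. The 2x2 counts in the example are hypothetical and are not the study's data.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a k x k table.

    table[i][j] = number of subjects put in category i by method 1
    (e.g. genotype) and category j by method 2 (e.g. breath test).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # agreement expected if the two methods classified independently
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

print(cohens_kappa([[20, 5], [10, 15]]))  # ≈ 0.4 (hypothetical counts)
```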
HPV genotype-specific concordance between EuroArray HPV, Anyplex II HPV28 and Linear Array HPV Genotyping test in Australian cervical samples

Directory of Open Access Journals (Sweden)

Alyssa M. Cornall

2017-12-01

Full Text Available Purpose: To compare the human papillomavirus genotype-specific performance of two genotyping assays, Anyplex II HPV28 (Seegene) and EuroArray HPV (EuroImmun), with Linear Array HPV (Roche). Methods: DNA extracted from clinician-collected cervical brush specimens in PreservCyt medium (Hologic) from 403 women undergoing management for detected cytological abnormalities was tested on the three assays. Genotype-specific agreement was assessed by Cohen's kappa statistic and Fisher's z-test of significance between proportions. Results: Agreement between Linear Array and the other 2 assays was substantial to almost perfect (κ = 0.60 - 1.00) for most genotypes, and was almost perfect (κ = 0.81 - 0.98) for almost all high-risk genotypes. Linear Array overall detected most genotypes more frequently; however, this was only statistically significant for HPV51 (EuroArray; p = 0.0497), HPV52 (Anyplex II; p = 0.039) and HPV61 (Anyplex II; p = 0.047). EuroArray detected significantly more HPV26 (p = 0.002) and Anyplex II detected more HPV42 (p = 0.035) than Linear Array. Each assay performed differently for HPV68 detection: EuroArray and LA were in moderate to substantial agreement with Anyplex II (κ = 0.46 and 0.62, respectively), but were in poor agreement with each other (κ = -0.01). Conclusions: EuroArray and Anyplex II had similar sensitivity to Linear Array for most high-risk genotypes, with slightly lower sensitivity for HPV51 or HPV52. Keywords: Human papillomavirus, Genotyping, Linear Array, Anyplex II, EuroArray, Cervix
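Detection-frequency comparisons of this kind can be sketched with the standard pooled two-proportion z-test, assumed here to correspond to the "z-test of significance between proportions" named above; the abstract does not give the exact formula. Because all three assays were run on the same specimens, a paired test such as McNemar's is another option. The counts below are hypothetical.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled z-test for H0: p1 == p2. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: a genotype detected in 60/100 samples by one assay, 40/100 by another
print(two_proportion_z(60, 100, 40, 100))  # z ≈ 2.83, p ≈ 0.0047
```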
Sex and PRNP genotype determination in preimplantation caprine embryos.

Science.gov (United States)

Guignot, F; Perreau, C; Cavarroc, C; Touzé, J-L; Pougnard, J-L; Dupont, F; Beckers, J-F; Rémy, B; Babilliot, J-M; Bed'Hom, B; Lamorinière, J M; Mermillod, P; Baril, G

2011-08-01

The objective of this study was to test the accuracy of genotype diagnosis after whole genome amplification of DNA extracted from biopsies obtained by trimming goat embryos, and to evaluate the viability of biopsied embryos after vitrification/warming and transfer. Whole genome amplification (WGA) was performed using Multiple Displacement Amplification (MDA). Sex and prion protein (PRNP) genotypes were determined. Sex diagnosis was carried out by PCR amplification of ZFX/ZFY and Y chromosome-specific sequences. PRNP genotype determination was performed on codons 142, 154, 211, 222 and 240. Embryos were collected at day 7 after oestrus and biopsied either immediately after collection (blastocysts and expanded blastocysts) or after 24 h of in vitro culture (compacted morulae). Biopsied embryos were frozen by vitrification. Vitrified whole embryos were kept as controls. DNA from the biopsies was extracted and amplified using MDA. Sex diagnosis was successful for 97.4% of biopsies, and the PRNP genotype was determined in 78.7% of biopsies. After embryo transfer, no significant difference was observed in kidding rate between biopsied and vitrified control embryos, whereas embryo survival rate differed between biopsied and whole vitrified embryos (p = 0.032). At birth, 100% of diagnosed sexes and 98.2% of predetermined codons were correct. Offspring PRNP profiles were in agreement with parental genotypes. Whole genome amplification with an MDA kit, coupled with sex diagnosis and PRNP genotype predetermination, is a very accurate technique for genotyping goat embryos before transfer. These novel results allow us to plan the selection of scrapie-resistant genotypes and kid sex before transfer of cryopreserved embryos. © 2010 Blackwell Verlag GmbH.
Low cost, low tech SNP genotyping tools for resource-limited areas: Plague in Madagascar as a model.

Directory of Open Access Journals (Sweden)

Cedar L Mitchell

2017-12-01

Full Text Available Genetic analysis of pathogenic organisms is a useful tool for linking human cases together and/or to potential environmental sources. The resulting data can also provide information on evolutionary patterns within a targeted species and phenotypic traits. However, the instruments often used to generate genotyping data, such as single nucleotide polymorphisms (SNPs), can be expensive and sometimes require advanced technologies to implement. This places many genotyping tools out of reach for laboratories that do not specialize in genetic studies and/or lack the requisite financial and technological resources. To address this issue, we developed a low cost and low tech genotyping system, termed agarose-MAMA, which combines traditional PCR and agarose gel electrophoresis to target phylogenetically informative SNPs. To demonstrate the utility of this approach for generating genotype data in a resource-constrained area (Madagascar), we designed an agarose-MAMA system targeting previously characterized SNPs within Yersinia pestis, the causative agent of plague. We then used this system to genetically type pathogenic strains of Y. pestis in a Malagasy laboratory not specialized in genetic studies, the Institut Pasteur de Madagascar (IPM). We conducted rigorous assay performance validations to assess potential variation introduced by differing research facilities, reagents, and personnel and found no difference in SNP genotyping results. These agarose-MAMA PCR assays are currently employed as an investigative tool at IPM, providing Malagasy researchers a means to improve the value of their plague epidemiological investigations by linking outbreaks to potential sources through genetic characterization of isolates and to improve understanding of disease ecology that may contribute to a long-term control effort. The success of our study demonstrates that the SNP-based genotyping capacity of laboratories in developing countries can be expanded with manageable
Low cost, low tech SNP genotyping tools for resource-limited areas: Plague in Madagascar as a model.

Science.gov (United States)

Mitchell, Cedar L; Andrianaivoarimanana, Voahangy; Colman, Rebecca E; Busch, Joseph; Hornstra-O'Neill, Heidie; Keim, Paul S; Wagner, David M; Rajerison, Minoarisoa; Birdsell, Dawn N

2017-12-01

Genetic analysis of pathogenic organisms is a useful tool for linking human cases together and/or to potential environmental sources. The resulting data can also provide information on evolutionary patterns within a targeted species and phenotypic traits. However, the instruments often used to generate genotyping data, such as single nucleotide polymorphisms (SNPs), can be expensive and sometimes require advanced technologies to implement. This places many genotyping tools out of reach for laboratories that do not specialize in genetic studies and/or lack the requisite financial and technological resources. To address this issue, we developed a low cost and low tech genotyping system, termed agarose-MAMA, which combines traditional PCR and agarose gel electrophoresis to target phylogenetically informative SNPs. To demonstrate the utility of this approach for generating genotype data in a resource-constrained area (Madagascar), we designed an agarose-MAMA system targeting previously characterized SNPs within Yersinia pestis, the causative agent of plague. We then used this system to genetically type pathogenic strains of Y. pestis in a Malagasy laboratory not specialized in genetic studies, the Institut Pasteur de Madagascar (IPM). We conducted rigorous assay performance validations to assess potential variation introduced by differing research facilities, reagents, and personnel and found no difference in SNP genotyping results. These agarose-MAMA PCR assays are currently employed as an investigative tool at IPM, providing Malagasy researchers a means to improve the value of their plague epidemiological investigations by linking outbreaks to potential sources through genetic characterization of isolates and to improve understanding of disease ecology that may contribute to a long-term control effort. The success of our study demonstrates that the SNP-based genotyping capacity of laboratories in developing countries can be expanded with manageable financial cost for
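MAMA (mismatch amplification mutation assay) primers are generally designed so that the 3'-terminal base matches one SNP allele, while a deliberate extra mismatch at the penultimate position weakens extension from the other allele. The Python sketch below builds such allele-specific forward primers from the sequence immediately 5' of a SNP. The function name, the default length of 20 nt, and the rule of substituting the complementary base at the penultimate position are simplifying assumptions. They are not the agarose-MAMA design rules used in this study, which the abstract does not give.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mama_primer(upstream: str, allele: str, length: int = 20) -> str:
    """Build a hypothetical allele-specific MAMA forward primer.

    upstream: sequence on the primer strand (5'->3') ending immediately
              before the SNP
    allele:   the SNP base this primer should amplify (3'-terminal base)
    """
    upstream, allele = upstream.upper(), allele.upper()
    if len(upstream) < length - 1:
        raise ValueError("not enough flanking sequence for the requested length")
    body = upstream[-(length - 1):]  # the length-1 bases just 5' of the SNP
    # Deliberate penultimate mismatch: any base other than the upstream base
    # mismatches the template; the complement is used only as a simple rule.
    mismatch = COMPLEMENT[body[-1]]
    return body[:-1] + mismatch + allele

flank = "ACGTACGTACGTACGTACGTAC"  # invented flanking sequence
print(mama_primer(flank, "G", length=10))  # CGTACGTAGG
print(mama_primer(flank, "A", length=10))  # CGTACGTAGA
```

The two primers differ only at their 3' end, so on a gel the presence or absence of product from each reaction indicates which allele the template carries.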
Genomic Variants Revealed by Invariably Missing Genotypes in Nelore Cattle.

Directory of Open Access Journals (Sweden)

Joaquim Manoel da Silva

Full Text Available High density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest cost and the most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some of the discarded data consist of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues, but can also be explained by the presence of genomic variants in the vicinity of the assayed SNP that prevent genotyping probes from annealing. The latter case may contain relevant information, because these missing genotypes might be used to identify population-specific genomic variants. To assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS re-sequencing data from 8 sires confirmed the presence of genomic variants within the flanking regions of 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production.
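Flagging markers that fail in every sample, the "invariably missing" genotypes discussed here, is a simple filter over the genotype calls. The Python sketch below assumes calls are stored per marker and that no-calls are coded as `--` or `None`. Both the layout and the coding are assumptions, so `missing` should be adjusted to match the actual export.

```python
from typing import Iterable, Mapping, Optional, Sequence

def invariably_missing(
    calls: Mapping[str, Sequence[Optional[str]]],
    missing: Iterable[Optional[str]] = ("--", None),
) -> list[str]:
    """Return markers whose genotype is missing in every sample."""
    missing_codes = set(missing)
    return [
        marker
        for marker, genotypes in calls.items()
        # skip markers with no samples at all; all() would be vacuously True
        if genotypes and all(g in missing_codes for g in genotypes)
    ]

calls = {
    "m1": ["AA", "AB", "--"],  # occasional failure: ordinary QC loss
    "m2": ["--", "--", "--"],  # fails everywhere: candidate nearby variant
}
print(invariably_missing(calls))  # ['m2']
```

Markers flagged this way are candidates for follow-up against re-sequencing data, as done with the 8 sires in this study.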
Implications of genotypic differences in the generation of a urinary metabolomics radiation signature

International Nuclear Information System (INIS)

Laiakis, Evagelia C.; Pannkuk, Evan L.; Diaz-Rubio, Maria Elena; Wang, Yi-Wen; Mak, Tytus D.; Simbulan-Rosenthal, Cynthia M.; Brenner, David J.; Fornace, Albert J.

2016-01-01

Highlights: • Metabolomics can rapidly assess the collective small molecule content in biofluids. • Previous studies focused on the urine of wild type mice; this work extends them to radiosensitive mice. • Parp1−/− mice were used as a model radiosensitive mouse line. • Time-dependent perturbations are evident in different metabolic pathways. • Metabolomics can be used to dissect the effects of genotoxic agents on metabolism. - Abstract: The increased threat of radiological terrorism and accidental nuclear exposures, together with the increased use of radiation-based medical procedures, has made it necessary to develop minimally invasive methods for the rapid identification of exposed individuals. Genetically predisposed radiosensitive individuals make up a significant fraction of the population and require specialized attention and treatment after such events. Metabolomics, the assessment of the collective small molecule content in a given biofluid or tissue, has proven effective in the rapid identification of radiation biomarkers and metabolic perturbations. To investigate how the genotypic background may alter the ionizing radiation (IR) signature, we used liquid chromatography coupled with mass spectrometry (LC–MS) to analyze urine from IR-exposed Parp1−/− mice as a model radiosensitive genotype; urine from wild type (WT) mice has been thoroughly investigated in previous studies from our laboratory. Samples were collected at days one and three after irradiation, time points that are important for the early and efficient triage of exposed individuals. Time-dependent perturbations in metabolites were observed in the tricarboxylic acid (TCA) pathway. Other differentially excreted metabolites included amino acids and metabolites associated with dysregulation of energy metabolism pathways. Time-dependent apoptotic pathway activation between WT and mutant mice following IR exposure may explain the altered
Implications of genotypic differences in the generation of a urinary metabolomics radiation signature

Energy Technology Data Exchange (ETDEWEB)

Laiakis, Evagelia C., E-mail: ecl28@georgetown.edu [Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington DC (United States); Pannkuk, Evan L. [Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington DC (United States); Diaz-Rubio, Maria Elena [Pediatrics, Division of Developmental Nutrition, University of Arkansas for Medical Sciences, Little Rock, AR (United States); Wang, Yi-Wen [Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington DC (United States); Mak, Tytus D. [Mass Spectrometry Data Center, National Institute of Standards and Technology (NIST), Gaithersburg MD (United States); Simbulan-Rosenthal, Cynthia M. [Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington DC (United States); Brenner, David J. [Columbia University, New York, NY (United States); Fornace, Albert J. [Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington DC (United States); Lombardi Comprehensive Cancer Center, Georgetown University, Washington DC (United States); Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 22254 (Saudi Arabia)

2016-06-15

Highlights: • Metabolomics can rapidly assess the collective small molecule content in biofluids. • Previous studies focused on the urine of wild type mice; this work extends them to radiosensitive mice. • Parp1−/− mice were used as a model radiosensitive mouse line. • Time-dependent perturbations are evident in different metabolic pathways. • Metabolomics can be used to dissect the effects of genotoxic agents on metabolism. - Abstract: The increased threat of radiological terrorism and accidental nuclear exposures, together with the increased use of radiation-based medical procedures, has made it necessary to develop minimally invasive methods for the rapid identification of exposed individuals. Genetically predisposed radiosensitive individuals make up a significant fraction of the population and require specialized attention and treatment after such events. Metabolomics, the assessment of the collective small molecule content in a given biofluid or tissue, has proven effective in the rapid identification of radiation biomarkers and metabolic perturbations. To investigate how the genotypic background may alter the ionizing radiation (IR) signature, we used liquid chromatography coupled with mass spectrometry (LC–MS) to analyze urine from IR-exposed Parp1−/− mice as a model radiosensitive genotype; urine from wild type (WT) mice has been thoroughly investigated in previous studies from our laboratory. Samples were collected at days one and three after irradiation, time points that are important for the early and efficient triage of exposed individuals. Time-dependent perturbations in metabolites were observed in the tricarboxylic acid (TCA) pathway. Other differentially excreted metabolites included amino acids and metabolites associated with dysregulation of energy metabolism pathways. Time-dependent apoptotic pathway activation between WT and mutant mice following IR exposure may explain the
Detection of HCV genotypes using molecular and radio-isotopic methods

International Nuclear Information System (INIS)

Ahmad, N.; Baig, S.M.; Shah, W.A.; Khattak, K.F.; Khan, B.; Qureshi, J.A.

2004-01-01

Hepatitis C virus (HCV) accounts for most cases of acute and chronic non-A, non-B liver disease. Persistent HCV infection may lead to liver cirrhosis and hepatocellular carcinoma. Six major HCV genotypes have been recognized. Infection with different genotypes results in different clinical pictures and responses to antiviral therapy. In the area of Faisalabad (Punjab province of Pakistan), the prevalence and molecular epidemiology of Hepatitis C virus infection had never been investigated before. In this study, we attempted to determine the prevalence, distribution and clinical significance of HCV infection in 1100 patients with suspected liver disease by nested reverse transcriptase polymerase chain reaction (RT-PCR) over a period of four years. HCV genotypes of isolates were determined in 337 subjects by dot-blot hybridization with genotype-specific radiolabeled probes. The proportions of patients with HCV genotypes 1, 2, 3 and 4 were 37.38%, 1.86%, 16.16% and 0.29%, respectively. Mixed infection with more than one HCV genotype was detected in 120 (35.6%) patients, whereas 31 (9.1%) samples remained unclassified. This study revealed the changing epidemiology of hepatitis C virus genotypes 1 and 3 in these patients. Infection with multiple HCV genotypes in the same patient may be of great clinical and pathological importance and interest. (author)
HPV genotype-specific concordance between EuroArray HPV, Anyplex II HPV28 and Linear Array HPV Genotyping test in Australian cervical samples.

Science.gov (United States)

Cornall, Alyssa M; Poljak, Marin; Garland, Suzanne M; Phillips, Samuel; Machalek, Dorothy A; Tan, Jeffrey H; Quinn, Michael A; Tabrizi, Sepehr N

2017-12-01

To compare the human papillomavirus genotype-specific performance of two genotyping assays, Anyplex II HPV28 (Seegene) and EuroArray HPV (EuroImmun), with Linear Array HPV (Roche). DNA extracted from clinician-collected cervical brush specimens in PreservCyt medium (Hologic) from 403 women undergoing management for detected cytological abnormalities was tested on the three assays. Genotype-specific agreement was assessed by Cohen's kappa statistic and Fisher's z-test of significance between proportions. Agreement between Linear Array and the other 2 assays was substantial to almost perfect (κ = 0.60 - 1.00) for most genotypes, and was almost perfect (κ = 0.81 - 0.98) for almost all high-risk genotypes. Linear Array overall detected most genotypes more frequently; however, this was only statistically significant for HPV51 (EuroArray; p = 0.0497), HPV52 (Anyplex II; p = 0.039) and HPV61 (Anyplex II; p = 0.047). EuroArray detected significantly more HPV26 (p = 0.002) and Anyplex II detected more HPV42 (p = 0.035) than Linear Array. Each assay performed differently for HPV68 detection: EuroArray and LA were in moderate to substantial agreement with Anyplex II (κ = 0.46 and 0.62, respectively), but were in poor agreement with each other (κ = -0.01). EuroArray and Anyplex II had similar sensitivity to Linear Array for most high-risk genotypes, with slightly lower sensitivity for HPV51 or HPV52. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Genotyping-By-Sequencing for Plant Genetic Diversity Analysis: A Lab Guide for SNP Genotyping

Directory of Open Access Journals (Sweden)

Gregory W. Peterson

2014-10-01

Full Text Available Genotyping-by-sequencing (GBS) has recently emerged as a promising genomic approach for exploring plant genetic diversity on a genome-wide scale. However, many uncertainties and challenges remain in the application of GBS, particularly in non-model species. Here, we present a GBS protocol we developed and use for plant genetic diversity analysis. It uses two restriction enzymes to reduce genome complexity, applies Illumina multiplexing indexes for barcoding, and has a custom bioinformatics pipeline for genotyping. This genetic diversity-focused GBS (gd-GBS) protocol can serve as an easy-to-follow lab guide to assist a researcher through every step of a GBS application, with five main components: sample preparation, library assembly, sequencing, SNP calling and diversity analysis. Specifically, in this presentation, we provide a brief overview of the GBS approach, describe the gd-GBS procedures, illustrate them with an application to analyze genetic diversity in 20 flax (Linum usitatissimum L.) accessions, and discuss related issues in GBS application. By following these lab bench procedures and using the custom bioinformatics pipeline, one could generate genome-wide SNP genotype data for a conventional genetic diversity analysis of a non-model plant species.
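The effect of two-enzyme complexity reduction can be previewed in silico by cutting a reference sequence with both enzymes and keeping fragments that have one end from each enzyme and fall within a sequenceable size range. The Python sketch below does this on the top strand only. The enzyme pair (PstI, CTGCA^G, and MspI, C^CGG), the size window, and all names are assumptions for illustration, since the abstract does not name the enzymes used in gd-GBS.

```python
import re

# Hypothetical enzyme pair: recognition site and cut offset on the top strand.
ENZYMES = {
    "PstI": ("CTGCAG", 5),  # CTGCA^G
    "MspI": ("CCGG", 1),    # C^CGG
}

def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates for every (possibly overlapping) site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def double_digest(seq: str, enz_a: str = "PstI", enz_b: str = "MspI",
                  min_len: int = 50, max_len: int = 500) -> list[tuple[int, int, str, str]]:
    """Fragments with one end from each enzyme and length in [min_len, max_len].

    Returns (start, end, left_enzyme, right_enzyme) tuples; the pieces before
    the first cut and after the last cut are ignored.
    """
    seq = seq.upper()
    cuts = []
    for name in (enz_a, enz_b):
        site, offset = ENZYMES[name]
        cuts.extend((pos, name) for pos in cut_positions(seq, site, offset))
    cuts.sort()
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end, left, right))
    return fragments

demo = "A" * 10 + "CTGCAG" + "A" * 100 + "CCGG" + "A" * 10  # invented sequence
print(double_digest(demo))  # [(15, 117, 'PstI', 'MspI')]
```

Counting the retained fragments across a genome gives a rough idea of how many loci a given enzyme pair and size selection would sample.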
Histomorphological changes in hepatitis C non-responders with respect to viral genotypes

International Nuclear Information System (INIS)

Adnan, U.; Mirza, T.; Naz, E.; Aziz, S.

2013-01-01

Objective: To evaluate the distinct histopathological changes of chronic hepatitis C (CHC) non-responders in association with viral genotypes. Methods: This cross-sectional study was conducted at the histopathology section of the Dow Diagnostic Research and Reference Laboratory, Dow University of Health Sciences, in collaboration with Sarwar Zuberi Liver Centre, Civil Hospital, Karachi, from September 2009 to August 2011. Seventy-five non-responders (end-treatment-response [ETR] positive patients) from a consecutive series of viral-RNA positive CHC patients with known genotypes were selected. Their genotypes and pertinent clinical history were recorded. They underwent liver biopsies, which were assessed for grade, stage, steatosis, stainable iron and characteristic histological lesions. Results: The majority of patients (63, 84%) had genotype 3, while 12 (16%) had genotype 1. Genotype 1 patients had significantly higher scores of inflammation (p<0.03) and fibrosis (p<0.04) than genotype 3 patients. Steatosis was present in all genotype 3 patients, with significantly higher scores (p<0.001) than in genotype 1 patients. Stainable iron scores were generally low in this study; however, stainable iron was more commonly seen in genotype 3. Characteristic histological lesions were noteworthy in both groups, irrespective of genotype. Conclusion: In this series, the predominant genotype was 3. However, genotype 1 patients were more prone to the aggressive nature of the disease, with significantly higher scores of inflammation and fibrosis. Steatosis was characteristically observed in the genotype 3 group. Stainable iron could not be attributed as a cause of non-response. (author)
An R package "VariABEL" for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity

Directory of Open Access Journals (Sweden)

Struchalin Maksim V

2012-01-01

Full Text Available Abstract Background: Hundreds of new loci have been discovered by genome-wide association studies of human traits. These studies mostly focused on associations between a single locus and a trait. Interactions between genes, and between genes and environmental factors, are of interest because they can improve our understanding of the genetic background underlying complex traits. Genome-wide testing of complex genetic models is a computationally demanding task. Moreover, testing such models leads to multiple-comparison problems that reduce the probability of new findings. Given that the genetic model underlying a complex trait can include hundreds of genes and environmental factors, testing these models in genome-wide association studies presents substantial difficulties. We, and Pare and colleagues (2010), developed a method that overcomes these difficulties. The method is based on the fact that loci involved in interactions can show genotypic variance heterogeneity of a trait. Genome-wide testing of such heterogeneity can be a fast scanning approach that points to interacting genetic variants. Results: In this work we present a new method, SVLM, allowing variance heterogeneity analysis of imputed genetic variation. The type I error and power of this test are investigated and contrasted with those of Levene's test. We also present an R package, VariABEL, implementing existing and newly developed tests. Conclusions: Variance heterogeneity analysis is a promising method for the detection of potentially interacting loci. The new method and software package developed in this work will facilitate such analysis in a genome-wide context.
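Levene's test, the comparison method named above, checks whether trait variance differs between genotype groups, which is the signal the variance-heterogeneity approach uses to flag possible interactions. Below is a pure-Python sketch of Levene's W statistic with mean centering. It is not the SVLM method or VariABEL code. Under the null hypothesis, W follows an F distribution with k - 1 and N - k degrees of freedom, and p-values are left to a statistics library such as `scipy.stats.levene`.

```python
from statistics import mean

def levene_w(groups: list[list[float]]) -> float:
    """Levene's W statistic (mean-centred) for trait values grouped by genotype."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    # Absolute deviation of each observation from its own group mean
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(y - m) for y in g])
    z_group = [mean(zi) for zi in z]
    z_all = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_all) ** 2 for zi, zm in zip(z, z_group))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Trait values for two genotype groups: same mean spread pattern vs wider spread
print(levene_w([[1, 2, 3], [1, 3, 5]]))  # ≈ 0.8
```

In a genome-wide scan this would be computed once per variant, with the groups formed by the genotypes at that variant.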
Characterization of cowpea genotype resistance to Callosobruchus maculatus

Directory of Open Access Journals (Sweden)

Maria de Jesus Passos de Castro

2013-09-01

The objective of this work was to characterize the resistance of 50 cowpea (Vigna unguiculata) genotypes to Callosobruchus maculatus. A completely randomized design with five replicates per treatment (genotype) was used. No-choice tests were performed on the 50 cowpea genotypes to evaluate oviposition preference and weevil development. The genotypes IT85 F-2687, MN05-841 B-49, MNC99-508-1, MNC99-510-8, TVu 1593, Canapuzinho-1-2 and Sanzi Sambili show non-preference-type resistance (oviposition and feeding). IT81 D-1045 Ereto and IT81 D-1045 Enramador exhibit antibiosis against C. maculatus and descend from resistant parents, which makes them potentially useful in future crosses aimed at obtaining cowpea varieties with higher levels of resistance.
Evaluation of Soybean and Cowpea Genotypes for Phosphorus Use Efficiency

Energy Technology Data Exchange (ETDEWEB)

Kumaga, F. K.; Ofori, K.; Adiku, S. K.; Kugblenu, Y. O.; Asante, W.; Seidu, H. [College of Agriculture and Consumer Sciences, University of Ghana, Legon, Accra (Ghana)]; Adu-Gyamfi, J. J. [Soil and Water Management and Crop Nutrition Laboratory, International Atomic Energy Agency, Vienna (Austria)]

2013-11-15

An initial screening of 152 soybean and 50 cowpea genotypes was conducted at the early growth stage to evaluate root traits associated with phosphorus (P) efficiency. Fifty soybean genotypes were subsequently selected and evaluated on a tropical low-P soil (Lixisol) for growth and yield under low and adequate P availability. Plants were sampled at 12 and 30 days after sowing and at maturity. Six cowpea genotypes were also selected and evaluated in pots filled with Alfisol under low, moderate and high P availability; these plants were sampled at 40 days and assessed for shoot yield and nodulation under low P availability. A Phosphorus Efficiency Index (PEI) derived from Principal Component Analysis (PCA) was used to determine the P efficiency of the soybean and cowpea genotypes. Root traits of soybean and cowpea varied widely at the early growth stage, and allometric analysis showed a significant correlation between root and shoot parameters at this stage. The study provided an opportunity to compare the root traits of newly developed cowpea genotypes (early maturing, medium maturing, dual-purpose and Striga-resistant lines) with those of older released cultivars. Root length differed significantly among the groups: in general, dual-purpose, Striga-resistant and medium/early maturing genotypes had the longest roots, while the older varieties had the shortest total root length. Field and pot results also showed differential growth of soybean and cowpea under low P availability. Further, PCA indicated that the soybean genotypes could be grouped into three distinct P efficiency categories. When the retained principal component and the relative weight of each genotype were combined with yield potential under high P, four categories of responsiveness to P were obtained. Cowpea genotypes were grouped into three P efficiency categories and two categories of responsiveness to P. The study also found genetic
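The abstract does not say exactly how the PCA-based Phosphorus Efficiency Index was computed. One common construction, assumed here rather than taken from the paper, scores each genotype on the first principal component of its standardized traits. The Python sketch below does that with power iteration on the trait correlation matrix. `standardize`, `first_pc` and `pc1_scores` are hypothetical names, and the trait values used to exercise it are made up. Grouping the scores into efficiency categories, as the authors did, is not shown.

```python
from statistics import mean, pstdev


def standardize(columns):
    """Z-score each trait column (population SD); a constant trait would divide by zero."""
    out = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        out.append([(x - m) / s for x in col])
    return out


def first_pc(columns, iters=500):
    """Leading eigenvector of the trait correlation matrix via power iteration.

    `columns` is a list of traits, each a list of values, one per genotype.
    """
    z = standardize(columns)
    p, n = len(z), len(z[0])
    # Correlation matrix of standardized traits (population form)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    # Unequal start vector so it is unlikely to be orthogonal to the leading eigenvector
    v = [1.0 + 0.1 * i for i in range(p)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, z


def pc1_scores(columns):
    """Score each genotype on PC1 (higher = larger standardized trait values)."""
    v, z = first_pc(columns)
    if sum(v) < 0:  # eigenvector sign is arbitrary; make loadings mostly positive
        v = [-x for x in v]
    n = len(z[0])
    return [sum(v[i] * z[i][r] for i in range(len(v))) for r in range(n)]
```

With root length and shoot mass as the two columns, genotypes would be ranked by `pc1_scores([root_length, shoot_mass])`. Cut-offs on that ranking would then give efficiency classes.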
Hepatitis B virus Genotypes in West Azarbayjan Province, Northwest Iran

Directory of Open Access Journals (Sweden)

Mohammad Hasan Khadem Ansari

2017-12-01

CONCLUSIONS: The results indicate that genotype D is the main HBV genotype in West Azarbayjan province. The presence of this genotype is consistent with the low rate of acute liver disease caused by chronic hepatitis B infection, liver cirrhosis and hepatocellular carcinoma.
Representativeness of Tuberculosis Genotyping Surveillance in the United States, 2009–2010

Science.gov (United States)

Shak, Emma B.; Cowan, Lauren; Starks, Angela M.; Grant, Juliana

2015-01-01

Genotyping of Mycobacterium tuberculosis isolates contributes to tuberculosis (TB) control through detection of possible outbreaks. However, 20% of U.S. cases do not have an isolate for testing, and 10% of cases with isolates do not have a genotype reported. TB outbreaks in populations with incomplete genotyping data might be missed by genotyping-based outbreak detection. Therefore, we assessed the representativeness of TB genotyping data by comparing characteristics of cases reported during January 1, 2009–December 31, 2010, that had a genotype result with those cases that did not. Of 22,476 cases, 14,922 (66%) had a genotype result. Cases without genotype results were more likely to be patients <19 years of age, with unknown HIV status, of female sex, U.S.-born, and with no recent history of homelessness or substance abuse. Although cases with a genotype result are largely representative of all reported U.S. TB cases, outbreak detection methods that rely solely on genotyping data may underestimate TB transmission among certain groups. PMID:26556930
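The study compares cases with a genotype result against cases without one. The Python sketch below shows the arithmetic behind that kind of comparison: the coverage fraction from the reported totals, a Pearson chi-square statistic (no continuity correction) and an odds ratio for one characteristic in a 2x2 table. The abstract does not state which tests the authors used, so these choices are assumptions. The helper names are hypothetical, and any 2x2 counts passed in are illustrative, not study data.

```python
def coverage(n_with_result, n_total):
    """Fraction of reported cases that have a genotype result."""
    return n_with_result / n_total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    a: no genotype result, characteristic present
    b: no genotype result, characteristic absent
    c: genotype result,    characteristic present
    d: genotype result,    characteristic absent
    Compare against a chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den


def odds_ratio(a, b, c, d):
    """Odds of the characteristic among cases without vs. with a genotype result."""
    return (a * d) / (b * c)


print(f"{coverage(14922, 22476):.1%}")  # share of 2009-2010 cases with a genotype result
```

An odds ratio above 1 for a characteristic such as age under 19 years corresponds to the abstract's "more likely" finding for cases without a genotype result.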
Differential survival among sSOD-1* genotypes in Chinook Salmon

Science.gov (United States)

Hayes, Michael C.; Reisenbichler, Reginald R.; Rubin, Stephen P.; Wetzel, Lisa A.; Marshall, Anne R.

2011-01-01

Differential survival and growth were tested in Chinook salmon Oncorhynchus tshawytscha expressing two common alleles, *–100 and *–260, at the superoxide dismutase locus (sSOD-1*). These tests were necessary to support separate studies in which the two alleles were used as genetic marks under the assumption of mark neutrality. Heterozygous adults were used to produce progeny with –100/–100, –100/–260 and –260/–260 genotypes that were reared in two natural streams and two hatcheries in the states of Washington and Oregon. The hatchery fish were also evaluated as returning adults. In general, the genotype ratios of juveniles reared at hatcheries were consistent with high survival and little or no differential survival in the hatchery. Adult returns at one hatchery differed significantly from the expected proportions, and the survival of the –260/–260 genotype was 0.56–0.89 times that of the –100/–100 genotype over four year-classes. Adult returns at a second hatchery (one year-class) were similar but not statistically significant: the survival of the –260/–260 genotype relative to the –100/–100 genotype was 0.76. The heterozygote group performed at an intermediate level at both hatcheries. Significant differences in growth were rarely observed among hatchery fish (one year-class of juveniles and one age-class of adult males) but were consistent with better performance of the –100/–100 genotype. Results from two groups of juveniles reared in streams (one year-class from each stream) suggested few differences in growth, but the observed genotype ratios differed significantly from the expected ratios in one stream. Those differences were consistent with the adult data: survival of the –260/–260 genotype was 76% of that of the –100/–100 genotype. These results, which indicate nonneutrality among sSOD-1* genotypes, caused us to modify our related studies and suggest caution in the interpretation of results and analyses in
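Because the progeny came from heterozygous parents, the baseline expectation is the Mendelian 1:2:1 ratio of –100/–100 : –100/–260 : –260/–260. The Python sketch below computes a chi-square goodness-of-fit statistic against that ratio. It also computes a relative-survival estimate that divides each genotype's observed count by its expected count and then compares that ratio to the reference genotype's ratio. The abstract does not give the authors' exact estimator, so this formula is an assumption. The function names are hypothetical, and the counts used to exercise it are invented.

```python
# Expected proportions for -100/-100, -100/-260, -260/-260 from het x het crosses
MENDELIAN_121 = (0.25, 0.5, 0.25)


def expected_counts(total, ratios=MENDELIAN_121):
    """Expected genotype counts for a sample of `total` fish."""
    return [total * r for r in ratios]


def chi_square_gof(observed, ratios=MENDELIAN_121):
    """Pearson goodness-of-fit statistic; df = number of genotypes - 1 (here 2)."""
    exp = expected_counts(sum(observed), ratios)
    return sum((o - e) ** 2 / e for o, e in zip(observed, exp))


def relative_survival(observed, idx, ref=0, ratios=MENDELIAN_121):
    """Survival of genotype `idx` relative to genotype `ref`.

    Each genotype's survival is proxied by observed / expected count,
    assuming all genotypes started in the expected ratio.
    """
    exp = expected_counts(sum(observed), ratios)
    return (observed[idx] / exp[idx]) / (observed[ref] / exp[ref])
```

As an illustration with invented counts, returns of 30, 50 and 20 fish give a statistic of 2.0 on 2 degrees of freedom. The same returns give a –260/–260 survival of about 0.67 relative to –100/–100.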
Antioxidant Defense Mechanisms of Salinity Tolerance in Rice Genotypes

Directory of Open Access Journals (Sweden)

Mohammad Golam Kibria

2017-05-01

To elucidate the role of antioxidant responses in the salinity tolerance of rice genotypes under salt stress, experiments were conducted with four rice varieties: the salt-sensitive BRRI dhan 28 and three salt-tolerant varieties, BRRI dhan 47, BINA dhan 8 and BINA dhan 10. Thirty-day-old rice seedlings were transplanted into pots. At the active tillering stage (35 d after transplanting), plants were exposed to different salinity levels (0, 20, 40 and 60 mmol/L NaCl). Salt stress significantly reduced growth in all rice genotypes. Growth reduction was greater in the salt-sensitive genotype than in the salt-tolerant ones, and BINA dhan 10 showed greater salt tolerance in all measured physiological parameters. The reduction in shoot and root biomass was smallest in BINA dhan 10. Chlorophyll content decreased significantly under salt stress in all varieties except BINA dhan 10. Proline content increased significantly in the salt-tolerant rice genotypes with increasing salt concentration, and the highest proline content was obtained from BINA dhan 10 under salt stress. Catalase and ascorbate peroxidase activities decreased significantly in the salt-sensitive genotype but increased significantly in the salt-tolerant ones with increasing salt concentration. However, salt stress significantly decreased guaiacol peroxidase activity in all rice genotypes, irrespective of salt tolerance. The K+/Na+ ratio also decreased significantly in the shoots and roots of all rice genotypes. The salt-tolerant genotype BINA dhan 10 maintained higher chlorophyll and proline contents, as well as higher catalase and ascorbate peroxidase activities, under salt stress, which might be the mechanism underlying its salt tolerance.
