WorldWideScience

Sample records for single imputation methods

  1. Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans.

    Science.gov (United States)

    Xavier, A; Muir, William M; Rainey, Katy M

    2016-02-02

    Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of downstream analyses will be. Nevertheless, next-generation genotyping technologies often provide genotypic information despite large proportions of missing data. The procedures used to impute such genetic data therefore greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were as follows: multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition. We propose an imputation method based on multivariate mixed models using pedigree information. Our methods comparison indicates that heritability of traits can be affected by the imputation method. Genotypes with missing values imputed with methods that make use of genealogic information can favor genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper. We concluded that hidden Markov models and random forest imputation are more suitable for studies that aim to analyze highly heritable traits, while pedigree-based methods can best be used to analyze traits with low heritability. Despite the notable contribution to heritability, advantages in genomic…

  2. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    Science.gov (United States)

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  3. Multiple imputation in a longitudinal cohort study: a case study of sensitivity to imputation methods.

    Science.gov (United States)

    Romaniuk, Helena; Patton, George C; Carlin, John B

    2014-11-01

    Multiple imputation has entered mainstream practice for the analysis of incomplete data. We have used it extensively in a large Australian longitudinal cohort study, the Victorian Adolescent Health Cohort Study (1992-2008). Although we have endeavored to follow best practices, there is little published advice on this, and we have not previously examined the extent to which variations in our approach might lead to different results. Here, we examined sensitivity of analytical results to imputation decisions, investigating choice of imputation method, inclusion of auxiliary variables, omission of cases with excessive missing data, and approaches for imputing highly skewed continuous distributions that are analyzed as dichotomous variables. Overall, we found that decisions made about imputation approach had a discernible but rarely dramatic impact on some types of estimates. For model-based estimates of association, the choice of imputation method and decisions made to build the imputation model had little effect on results, whereas estimates of overall prevalence and prevalence stratified by subgroup were more sensitive to imputation method and settings. Multiple imputation by chained equations gave more plausible results than multivariate normal imputation for prevalence estimates but appeared to be more susceptible to numerical instability related to a highly skewed variable. © The Author 2014. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Estimation of missing rainfall data using spatial interpolation and imputation methods

    Science.gov (United States)

    Radi, Noor Fadhilah Ahmad; Zakaria, Roslinazairimah; Azman, Muhammad Az-zuhri

    2015-02-01

    This study aims to estimate missing rainfall data under three different percentages of missingness, namely 5%, 10% and 20%, in order to represent various cases of missing data. In practice, spatial interpolation methods are usually the first choice for estimating missing data. These methods include the normal ratio (NR), arithmetic average (AA), coefficient of correlation (CC) and inverse distance (ID) weighting methods. The methods consider the distance between the target and the neighbouring stations as well as the correlations between them. An alternative approach to handling missing data is imputation, the process of replacing missing data with substituted values. A once-common approach is single imputation, which allows parameter estimation. However, single imputation ignores the variability of the estimates, which leads to underestimation of standard errors and confidence intervals. To overcome this underestimation problem, multiple imputation is used, where each missing value is replaced with a distribution of imputations that reflects the uncertainty about the missing data. In this study, spatial interpolation methods and multiple imputation are compared for estimating missing rainfall data. The performance of the estimation methods is assessed using the similarity index (S-index), mean absolute error (MAE) and coefficient of correlation (R).
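
    As an illustration of two of the interpolation formulas named above, the following is a minimal Python sketch of the inverse distance (ID) and normal ratio (NR) estimators; the station values, distances and annual means are hypothetical, and the weighting exponent of 2 is a common but not universal choice.

      import numpy as np

      def inverse_distance(dists, values, power=2.0):
          """ID weighting: nearer stations receive larger weights."""
          w = 1.0 / np.asarray(dists, dtype=float) ** power
          return float(np.sum(w * values) / np.sum(w))

      def normal_ratio(target_annual_mean, neighbour_annual_means, values):
          """NR method: scale each neighbour by the ratio of annual means."""
          ratios = target_annual_mean / np.asarray(neighbour_annual_means, dtype=float)
          return float(np.mean(ratios * values))

      # Hypothetical day on which the target station's record is missing.
      obs = np.array([12.0, 8.5, 10.2])     # rainfall (mm) at three neighbours
      dists = np.array([4.0, 9.0, 15.0])    # distances (km) to the target station
      print(inverse_distance(dists, obs))   # closer stations dominate the estimate
      print(normal_ratio(1500.0, np.array([1400.0, 1650.0, 1550.0]), obs))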

  5. A comparison of multiple imputation methods for incomplete longitudinal binary data.

    Science.gov (United States)

    Yamaguchi, Yusuke; Misumi, Toshihiro; Maruo, Kazushi

    2017-09-08

    Longitudinal binary data are commonly encountered in clinical trials. Multiple imputation is an approach to obtaining valid estimates of treatment effects under the assumption of a missing-at-random mechanism. Although there are a variety of multiple imputation methods for longitudinal binary data, few studies have reported on the relative performance of these methods. Moreover, when focusing on the treatment effect throughout a period, an endpoint often used in clinical evaluations of specific disease areas, no definite investigations comparing the methods have been available. We conducted an extensive simulation study to examine the comparative performance of six multiple imputation methods available in the SAS MI procedure for longitudinal binary data, where two endpoints, responder rates at a specified time point and throughout a period, were assessed. The simulation study suggested that results from the naive approaches of single imputation with non-responders and complete case analysis could be very sensitive to missing data. The multiple imputation methods using a monotone method and a full conditional specification with a logistic regression imputation model were recommended for obtaining unbiased and robust estimates of the treatment effect. The methods are illustrated with data from mental health research.

  6. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

    Directory of Open Access Journals (Sweden)

    Stuart Heather

    2006-12-01

    Background: Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depression Scale (SDS). Methods: 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The sum of the responses is calculated, and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Additionally, missing at random and missing not at random simulations were completed. Six imputation methods were then considered: (1) multiple imputation, (2) single regression, (3) individual mean, (4) overall mean, (5) participant's preceding response, and (6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated. Results: When 10% of values are missing, all the imputation methods except random selection produce Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produces the most valid imputed values with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percent of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression method produced Kappas in the 'substantial agreement' range…
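
    To make this evaluation design concrete, here is a minimal Python sketch under assumed rather than actual study parameters (1,000 simulated respondents with uniform item responses): it deletes 10% of item scores completely at random, applies individual-mean imputation, and measures agreement of the resulting case classification with the Kappa statistic.

      import numpy as np
      from sklearn.metrics import cohen_kappa_score

      rng = np.random.default_rng(0)
      n, items, cutoff = 1000, 20, 40                 # SDS-like 20-item scale, cutoff 40
      data = rng.integers(1, 5, size=(n, items)).astype(float)  # responses 1..4
      true_case = data.sum(axis=1) > cutoff

      incomplete = data.copy()
      incomplete[rng.random(data.shape) < 0.10] = np.nan        # 10% MCAR deletion

      # Individual (person) mean imputation: fill each person's missing items
      # with the mean of that person's observed items.
      row_means = np.nanmean(incomplete, axis=1, keepdims=True)
      imputed = np.where(np.isnan(incomplete), row_means, incomplete)
      imp_case = imputed.sum(axis=1) > cutoff

      print(cohen_kappa_score(true_case, imp_case))   # >0.80 indicates 'near perfect'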

  7. Methods and Strategies to Impute Missing Genotypes for Improving Genomic Prediction

    DEFF Research Database (Denmark)

    Ma, Peipei

    Genomic prediction has been widely used in dairy cattle breeding. Genotype imputation is a key procedure for efficiently utilizing marker data from different chips and obtaining high-density marker data at minimal cost. This thesis investigated methods and strategies for genotype imputation for improving genomic prediction. The results indicate that IMPUTE2 and Beagle are accurate imputation methods, while FImpute is a good alternative for routine imputation with large data sets. Genotypes of non-genotyped animals can be accurately imputed if they have genotyped progenies. A combined reference…

  8. A toolkit in SAS for the evaluation of multiple imputation methods

    NARCIS (Netherlands)

    Brand, Jaap P.L.; van Buuren, Stef; Groothuis-Oudshoorn, Karin; Gelsema, Edzard S.

    This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of bias and efficiency of the imputation process. We propose a new method for…

  9. A toolkit in SAS for the evaluation of multiple imputation methods

    NARCIS (Netherlands)

    Brand, J.P.L.; Buuren, S. van; Groothuis-Oudshoorn, K.; Gelsema, E.S.

    2003-01-01

    This paper outlines a strategy to validate multiple imputation methods. Rubin's criteria for proper multiple imputation are the point of departure. We describe a simulation method that yields insight into various aspects of bias and efficiency of the imputation process. We propose a new method for…

  10. The multiple imputation method: a case study involving secondary data analysis.

    Science.gov (United States)

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    This paper illustrates, with the example of a secondary data analysis study, the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The data came from the 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiply imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiply imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend that nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
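
    After each of the m imputed datasets is analysed separately, the point estimates are combined with Rubin's rules. A minimal sketch follows; the five coefficient values stand in for hypothetical wage-gap estimates and are not taken from the study.

      import numpy as np

      def pool_rubin(estimates, variances):
          """Pool m estimates and their within-imputation variances (Rubin's rules)."""
          estimates = np.asarray(estimates, dtype=float)
          variances = np.asarray(variances, dtype=float)
          m = len(estimates)
          qbar = estimates.mean()               # pooled point estimate
          ubar = variances.mean()               # within-imputation variance
          b = estimates.var(ddof=1)             # between-imputation variance
          t = ubar + (1 + 1 / m) * b            # total variance
          return qbar, float(np.sqrt(t))        # estimate and pooled standard error

      # Hypothetical: a log-wage coefficient estimated on five imputed datasets.
      est = [0.042, 0.047, 0.039, 0.044, 0.045]
      var = [0.0004, 0.0004, 0.0005, 0.0004, 0.0004]
      print(pool_rubin(est, var))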

  11. Comparison of different imputation methods from low- to high-density panels using Chinese Holstein cattle.

    Science.gov (United States)

    Weng, Z; Zhang, Z; Zhang, Q; Fu, W; He, S; Ding, X

    2013-05-01

    Imputation of high-density genotypes from low- or medium-density platforms is a promising way to enhance the efficiency of whole-genome selection programs at low cost. In this study, we compared the efficiency of three widely used imputation algorithms (fastPHASE, BEAGLE and findhap) using Chinese Holstein cattle with Illumina BovineSNP50 genotypes. A total of 2108 cattle were randomly divided into a reference population and a test population to evaluate the influence of the reference population size. Three bovine chromosomes, BTA1, 16 and 28, were used to represent large, medium and small chromosome size, respectively. We simulated different scenarios by randomly masking 20%, 40%, 80% and 95% of single-nucleotide polymorphisms (SNPs) on each chromosome in the test population to mimic different SNP density panels. Illumina Bovine3K and Illumina BovineLD (6909 SNPs) information was also used. We found that the three methods showed comparable accuracy when the proportion of masked SNPs was low. However, the difference became larger when more SNPs were masked. BEAGLE performed the best and was the most robust, with imputation accuracies >90% in almost all situations. fastPHASE was affected by the proportion of masked SNPs, especially when the masked SNP rate was high. findhap ran the fastest, whereas its accuracies were lower than those of BEAGLE but higher than those of fastPHASE. In addition, enlarging the reference population improved the imputation accuracy for BEAGLE and findhap, but did not affect fastPHASE. Considering imputation accuracy and computational requirements, BEAGLE has been found to be more reliable for imputing genotypes from low- to high-density genotyping platforms.

  12. Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

    Science.gov (United States)

    Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

    2016-04-01

    Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships, and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have recently been tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE), at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations. We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix…
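
    A minimal sketch of the kNN variant with additional predictors is given below, on synthetic trait and climate data; the trait dimensions, the 30% missingness level and the use of scikit-learn's KNNImputer are illustrative assumptions, not the study's implementation.

      import numpy as np
      from sklearn.impute import KNNImputer
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(1)
      n = 200
      climate = rng.normal(size=(n, 2))         # e.g. temperature, precipitation
      traits = climate @ rng.normal(size=(2, 5)) + rng.normal(scale=0.5, size=(n, 5))

      incomplete = traits.copy()
      incomplete[rng.random(traits.shape) < 0.30] = np.nan   # 30% missing trait values

      # Appending standardised environmental predictors lets neighbours be chosen
      # on climate as well as on the observed traits.
      X = np.hstack([incomplete, StandardScaler().fit_transform(climate)])
      imputed = KNNImputer(n_neighbors=10).fit_transform(X)[:, :5]

      err = np.abs(imputed - traits)[np.isnan(incomplete)]
      print(err.mean())                          # mean absolute error on deleted cells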

  13. Application of imputation methods to genomic selection in Chinese Holstein cattle

    Directory of Open Access Journals (Sweden)

    Weng Ziqing

    2012-02-01

    Missing genotypes are a common feature of high-density SNP datasets obtained using SNP chip technology, and this is likely to decrease the accuracy of genomic selection. This problem can be circumvented by imputing the missing genotypes with estimated genotypes. When implementing imputation, the criteria used for SNP data quality control and whether to perform imputation before or after data quality control need to be considered. In this paper, we compared six strategies of imputation and quality control using different imputation methods, different quality control criteria and by changing the order of imputation and quality control, against a real dataset of milk production traits in Chinese Holstein cattle. The results demonstrated that, no matter what imputation method and quality control criteria were used, strategies with imputation before quality control performed better than strategies with imputation after quality control in terms of accuracy of genomic selection. The different imputation methods and quality control criteria did not significantly influence the accuracy of genomic selection. We concluded that performing imputation before quality control could increase the accuracy of genomic selection, especially when the rate of missing genotypes is high and the reference population is small.

  14. IMPUTATION METHODS AND APPROACHES: AN ANALYSIS OF PROTEIN SOURCES IN THE MEXICAN DIET

    Directory of Open Access Journals (Sweden)

    Jose Antonio Lopez

    2014-04-01

    Several imputation approaches using a large sample and different levels of censoring are compared and contrasted following a multiple imputation methodology. The study not only discusses these imputation approaches, but also quantifies differences in price variability before and after price imputation, evaluates the performance of each method, and estimates and compares parameters and elasticities from a complete demand system. The study's findings reveal that small variability among the mean prices from the various imputation approaches may result in relatively larger variability among the underlying parameter estimates of interest and the ultimately desired measures. This suggests that selection bias may be avoided or reduced by validating the imputation approaches and choosing the imputation method based on an analysis of the ultimately desired measures.

  15. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    Science.gov (United States)

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for predicting unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods or deploy algorithms dedicated to inferring missing genotypes. In this research, the performance of eight machine learning methods (Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost) was compared in terms of imputation accuracy, computation time and the factors affecting imputation accuracy. The methods were applied to real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. Random Forest and Support Vector Machine were more accurate than the other machine learning methods, while TotalBoost performed slightly worse than the others. The running times differed between methods; ELM was consistently the fastest algorithm. As the sample size increased, RBF required long imputation times. The methods tested in this research can be an alternative for imputation of un-typed SNPs when the rate of missing data is low. However, it is recommended that other machine learning methods also be evaluated for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.
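
    As a minimal sketch of the general idea (not the paper's trio-specific algorithms), the snippet below imputes one hypothetical un-typed SNP from the remaining markers with a random forest classifier; the genotype matrix and the induced linkage between adjacent SNPs are simulated.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(2)
      n, m, target = 300, 20, 10
      geno = rng.integers(0, 3, size=(n, m))    # SNP genotypes coded 0/1/2
      # Induce simple linkage so the target SNP is predictable from its neighbour.
      geno[:, target] = np.clip(geno[:, target - 1] + rng.integers(-1, 2, size=n), 0, 2)

      missing = rng.random(n) < 0.20            # 20% of individuals un-typed at target
      flank = np.delete(geno, target, axis=1)   # all other SNPs as predictors

      rf = RandomForestClassifier(n_estimators=200, random_state=0)
      rf.fit(flank[~missing], geno[~missing, target])
      imputed = rf.predict(flank[missing])
      print((imputed == geno[missing, target]).mean())   # imputation accuracy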

  16. Missing value imputation in multi-environment trials: Reconsidering the Krzanowski method

    Directory of Open Access Journals (Sweden)

    Sergio Arciniegas-Alarcón

    2016-07-01

    We propose a new methodology for multiple imputation when faced with missing data in multi-environment trials with genotype-by-environment interaction, based on the imputation system developed by Krzanowski that uses the singular value decomposition (SVD) of a matrix. Several different iterative variants are described; differential weights can also be included in each variant to represent the influence of different components of the SVD in the imputation process. The methods are compared through a simulation study based on three real data matrices that have values deleted randomly at different percentages, using as a measure of overall accuracy a combination of the variance between imputations and their mean square deviations relative to the deleted values. The best results are shown by two of the iterative schemes that use weights belonging to the interval [0.75, 1]. These schemes provide imputations that have higher quality when compared with other multiple imputation methods based on the Krzanowski method.
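
    A minimal sketch of an iterative SVD imputation of this flavour follows: missing cells are seeded with column means and then repeatedly replaced by their low-rank reconstruction until convergence. This is a simplified variant; Krzanowski's exact scheme and the paper's weighted iterative versions are not reproduced here.

      import numpy as np

      def svd_impute(X, n_components=2, tol=1e-6, max_iter=500):
          """Fill missing cells, then iterate low-rank SVD reconstructions."""
          X = X.astype(float)
          miss = np.isnan(X)
          filled = np.where(miss, np.nanmean(X, axis=0), X)   # column-mean start
          for _ in range(max_iter):
              U, s, Vt = np.linalg.svd(filled, full_matrices=False)
              approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
              delta = np.max(np.abs(filled[miss] - approx[miss]))
              filled[miss] = approx[miss]
              if delta < tol:
                  break
          return filled

      # Hypothetical genotype-by-environment matrix with 10% of cells deleted.
      rng = np.random.default_rng(3)
      G = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))   # rank-3 structure
      X = G.copy()
      X[rng.random(G.shape) < 0.10] = np.nan
      print(np.abs(svd_impute(X, 3) - G)[np.isnan(X)].max())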

  17. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer

    Directory of Open Access Journals (Sweden)

    Rosa Aghdam

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes were selected at random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways, using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/.

  18. The Ability of Different Imputation Methods to Preserve the Significant Genes and Pathways in Cancer.

    Science.gov (United States)

    Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz

    2017-12-01

    Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes were selected at random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways, using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.

  19. Simple imputation methods versus direct likelihood analysis for missing item scores in multilevel educational data.

    Science.gov (United States)

    Kadengye, Damazo T; Cools, Wilfried; Ceulemans, Eva; Van den Noortgate, Wim

    2012-06-01

    Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a non-imputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were either missing completely at random or missing at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches based on the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item mean substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.
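
    Of the simple methods named above, two-way imputation has a compact closed form: a missing score for person p on item i is replaced by (person mean + item mean - grand mean). A minimal sketch on synthetic dichotomous item scores follows; the matrix size and missingness rate are illustrative assumptions.

      import numpy as np

      def two_way_impute(X):
          """Two-way imputation: person mean + item mean - grand mean."""
          miss = np.isnan(X)
          pm = np.nanmean(X, axis=1, keepdims=True)   # person means
          im = np.nanmean(X, axis=0, keepdims=True)   # item means
          gm = np.nanmean(X)                          # grand mean
          filled = X.copy()
          filled[miss] = (pm + im - gm)[miss]
          return filled

      rng = np.random.default_rng(4)
      scores = rng.integers(0, 2, size=(50, 10)).astype(float)  # 0/1 item scores
      scores[rng.random(scores.shape) < 0.15] = np.nan          # 15% missing
      print(two_way_impute(scores)[:2])   # imputed values may be non-integer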

  20. Multi-generational imputation of single nucleotide polymorphism marker genotypes and accuracy of genomic selection.

    Science.gov (United States)

    Toghiani, S; Aggrey, S E; Rekaya, R

    2016-07-01

    Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs) can be estimated that are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation; after 25 generations, the accuracy was only 7% lower than in the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, the decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval…

  1. A Method for Imputing Response Options for Missing Data on Multiple-Choice Assessments

    Science.gov (United States)

    Wolkowitz, Amanda A.; Skorupski, William P.

    2013-01-01

    When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…

  2. When data goes missing: methods for missing score imputation in biometric fusion

    Science.gov (United States)

    Ding, Yaohui; Ross, Arun

    2010-04-01

    While fusion can be accomplished at multiple levels in a multibiometric system, score level fusion is commonly used as it offers a good trade-off between fusion complexity and data availability. However, missing scores affect the implementation of several biometric fusion rules. While there are several techniques for handling missing data, the imputation scheme - which replaces missing values with predicted values - is preferred since this scheme can be followed by a standard fusion scheme designed for complete data. This paper compares the performance of three imputation methods: Imputation via Maximum Likelihood Estimation (MLE), Multiple Imputation (MI) and Random Draw Imputation through Gaussian Mixture Model estimation (RD GMM). A novel method called Hot-deck GMM is also introduced and exhibits markedly better performance than the other methods because of its ability to preserve the local structure of the score distribution. Experiments on the MSU dataset indicate the robustness of the schemes in handling missing scores at various missing data rates.

  3. Missing data imputation using statistical and machine learning methods in a real breast cancer problem.

    Science.gov (United States)

    Jerez, José M; Molina, Ignacio; García-Laencina, Pedro J; Alba, Emilio; Ribelles, Nuria; Martín, Miguel; Franco, Leonardo

    2010-10-01

    Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. This work evaluates the performance of several statistical and machine learning imputation methods that were used to predict recurrence in patients in an extensive real breast cancer data set. Imputation methods based on statistical techniques, e.g., mean, hot-deck and multiple imputation, and machine learning techniques, e.g., multi-layer perceptron (MLP), self-organising maps (SOM) and k-nearest neighbour (KNN), were applied to data collected through the "El Álamo-I" project, and the results were then compared to those obtained from the listwise deletion (LD) imputation method. The database includes demographic, therapeutic and recurrence-survival information from 3679 women with operable invasive breast cancer diagnosed in 32 different hospitals belonging to the Spanish Breast Cancer Research Group (GEICAM). The accuracies of predictions on early cancer relapse were measured using artificial neural networks (ANNs), in which different ANNs were estimated using the data sets with imputed missing values. The imputation methods based on machine learning algorithms outperformed imputation statistical methods in the prediction of patient outcome. Friedman's test revealed a significant difference (p=0.0091) in the observed area under the ROC curve (AUC) values, and the pairwise comparison test showed that the AUCs for MLP, KNN and SOM were significantly higher (p=0.0053, p=0.0048 and p=0.0071, respectively) than the AUC from the LD-based prognosis model. The methods based on machine learning techniques were the most suited for the imputation of missing values and led to a significant enhancement of prognosis accuracy compared to imputation methods based on statistical procedures. Copyright © 2010 Elsevier B.V. All rights reserved.
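
    A compressed sketch of this kind of comparison is given below, with a synthetic dataset, scikit-learn imputers, and logistic regression standing in for the study's neural-network prognosis models; all settings are illustrative.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.impute import SimpleImputer, KNNImputer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(5)
      X, y = make_classification(n_samples=600, n_features=10, random_state=0)
      X_miss = X.copy()
      X_miss[rng.random(X.shape) < 0.20] = np.nan     # 20% of values deleted

      for name, imp in [("mean", SimpleImputer(strategy="mean")),
                        ("kNN", KNNImputer(n_neighbors=5))]:
          Xi = imp.fit_transform(X_miss)
          auc = cross_val_score(LogisticRegression(max_iter=1000), Xi, y,
                                scoring="roc_auc", cv=5).mean()
          print(name, round(auc, 3))                  # compare AUC across imputers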

  4. Multiple- vs Non- or Single-Imputation based Fuzzy Clustering for Incomplete Longitudinal Behavioral Intervention Data.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua

    2016-06-01

    Disentangling patients' behavioral variations is a critical step toward better understanding an intervention's effects on individual outcomes. Missing data commonly exist in longitudinal behavioral intervention studies. Multiple imputation (MI) has been well studied for missing data analyses in the statistical field; however, it has not yet been scrutinized for clustering or unsupervised learning, which are important techniques for explaining the heterogeneity of treatment effects. Built upon previous work on MI fuzzy clustering, this paper theoretically, empirically and numerically demonstrates how an MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation based clustering approaches, advancing our understanding of the utility and strength of MI-based fuzzy clustering for processing incomplete longitudinal behavioral intervention data.

  5. Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Guldbrandtsen, Bernt; Sahana, Goutam

    2014-01-01

    …autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel. Results: A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE… with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only led to a minor decrease in the imputation…

  6. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

    Science.gov (United States)

    Liu, Yuzhe; Gopalakrishnan, Vanathi

    2017-03-01

    Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.

  7. A comparison of existing methods for multiple imputation in individual participant data meta-analysis.

    Science.gov (United States)

    Kunkel, Deborah; Kaizar, Eloise E

    2017-09-30

    Multiple imputation is a popular method for addressing missing data, but its implementation is difficult when data have a multilevel structure and one or more variables are systematically missing. This systematic missing data pattern may commonly occur in meta-analysis of individual participant data, where some variables are never observed in some studies, but are present in other hierarchical data settings. In these cases, valid imputation must account for both relationships between variables and correlation within studies. Proposed methods for multilevel imputation include specifying a full joint model and multiple imputation with chained equations (MICE). While MICE is attractive for its ease of implementation, there is little existing work describing conditions under which this is a valid alternative to specifying the full joint model. We present results showing that for multilevel normal models, MICE is rarely exactly equivalent to joint model imputation. Through a simulation study and an example using data from a traumatic brain injury study, we found that in spite of theoretical differences, MICE imputations often produce results similar to those obtained using the joint model. We also assess the influence of prior distributions in MICE imputation methods and find that when missingness is high, prior choices in MICE models tend to affect estimation of across-study variability more than compatibility of conditional likelihoods. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Imputation methods for filling missing data in urban air pollution data for Malaysia

    Directory of Open Access Journals (Sweden)

    Nur Afiqah Zakaria

    2018-06-01

    The air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing data. Missing observations usually occur due to machine failure, routine maintenance and human error. In this study, the hourly monitoring data of CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods (Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour). The air pollutant observations were simulated at four percentages of missing data, i.e. 5%, 10%, 15% and 20%. Performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. From the results of the performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in the missing values in the air pollutant data.
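
    For reference, the four goodness-of-fit measures are straightforward to compute on a held-out set of artificially deleted values; the observed and imputed vectors below are made up for illustration.

      import numpy as np

      def mae(obs, est):
          return float(np.mean(np.abs(est - obs)))

      def rmse(obs, est):
          return float(np.sqrt(np.mean((est - obs) ** 2)))

      def r2(obs, est):
          return float(np.corrcoef(obs, est)[0, 1] ** 2)   # coefficient of determination

      def index_of_agreement(obs, est):
          """Willmott's index of agreement, in [0, 1]; 1 is perfect agreement."""
          num = np.sum((est - obs) ** 2)
          den = np.sum((np.abs(est - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
          return float(1.0 - num / den)

      obs = np.array([10.0, 12.0, 9.0, 14.0, 11.0])   # held-out true PM10 values
      est = np.array([11.0, 11.5, 9.5, 13.0, 12.0])   # their imputed counterparts
      print(mae(obs, est), rmse(obs, est), r2(obs, est), index_of_agreement(obs, est))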

  9. Comparison of three boosting methods in parent-offspring trios for genotype imputation using simulation study

    Directory of Open Access Journals (Sweden)

    Abbas Mikhchi

    2016-01-01

    Background: Genotype imputation is an important process for predicting unknown genotypes, in which a reference population with dense genotypes is used to predict missing genotypes for both human and animal genetic variation at a low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker. Methods: In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared from lower-density SNP panels (5K) to a high-density (10K) SNP panel using three different boosting methods, namely TotalBoost (TB), LogitBoost (LB) and AdaBoost (AB). The methods were employed on simulated data to impute the un-typed SNPs in parent-offspring trios. Four different datasets, G1 (100 trios with 5K SNPs), G2 (100 trios with 10K SNPs), G3 (500 trios with 5K SNPs), and G4 (500 trios with 10K SNPs), were simulated. In all four datasets, parents were genotyped completely, and offspring were genotyped with a lower-density panel. Results: Comparison of the three methods showed that LB outperformed AB and TB in imputation accuracy. Computation times differed between methods; AB was the fastest algorithm. Higher SNP densities increased the accuracy of imputation, and larger numbers of trios (i.e. 500) improved the performance of LB and TB. Conclusions: All three methods do well in terms of imputation accuracy, and the dense chip is recommended for imputation of parent-offspring trios.

  10. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

    Science.gov (United States)

    Yozgatligil, Ceylan; Aslan, Sipan; Iyigun, Cem; Batmaz, Inci

    2013-04-01

    This study aims to compare several imputation methods to complete the missing values of spatio-temporal meteorological time series. To this end, six imputation methods are assessed with respect to various criteria, including accuracy, robustness, precision, and efficiency, for artificially created missing data in monthly total precipitation and mean temperature series obtained from the Turkish State Meteorological Service. Of these methods, simple arithmetic average, normal ratio (NR), and NR weighted with correlations comprise the simple ones, whereas the multilayer perceptron type neural network and the multiple imputation strategy based on expectation-maximization with Markov chain Monte Carlo (EM-MCMC) are computationally intensive ones. In addition, we propose a modification of the EM-MCMC method. Besides using a conventional accuracy measure based on squared errors, we also suggest the correlation dimension (CD) technique of nonlinear dynamic time series analysis, which takes spatio-temporal dependencies into account, for evaluating imputation performance. Based on the detailed graphical and quantitative analysis, it can be said that although the computational methods, particularly the EM-MCMC method, are computationally inefficient, they seem favorable for imputation of meteorological time series with respect to different missingness periods, considering both measures and both series studied. To conclude, using the EM-MCMC algorithm for imputing missing values before conducting any statistical analyses of meteorological data will definitely decrease the amount of uncertainty and give more robust results. Moreover, the CD measure can be suggested for the performance evaluation of missing data imputation, particularly with computational methods, since it gives more precise results in meteorological time series.

  11. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    Science.gov (United States)

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  12. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    Science.gov (United States)

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  13. Comparison of different methods for imputing genome-wide marker genotypes in Swedish and Finnish Red Cattle

    DEFF Research Database (Denmark)

    Ma, Peipei; Brøndum, Rasmus Froberg; Qin, Zahng

    2013-01-01

    This study investigated the imputation accuracy of different methods, considering both the minor allele frequency and the relatedness between individuals in the reference and test data sets. Two data sets from the combined population of Swedish and Finnish Red Cattle were used to test the influence… coefficient was lower when the minor allele frequency was lower. The results indicate that Beagle and IMPUTE2 provide the most robust and accurate imputation accuracies, but considering computing time and memory usage, FImpute is another alternative method…

  14. Influence of Imputation and EM Methods on Factor Analysis When Item Nonresponse in Questionnaire Data Is Nonignorable.

    Science.gov (United States)

    Bernaards, Coen A.; Sijtsma, Klaas

    2000-01-01

    Using simulation, the authors studied the influence of each of 12 imputation methods and 2 methods using the EM algorithm on the results of maximum likelihood factor analysis, as compared with results from the complete-data factor analysis (no missing scores). Discusses why the EM methods recovered complete-data factor loadings better than the imputation methods. (SLD)

  15. Statistical Analysis of a Class: Monte Carlo and Multiple Imputation Spreadsheet Methods for Estimation and Extrapolation

    Science.gov (United States)

    Fish, Laurel J.; Halcoussis, Dennis; Phillips, G. Michael

    2017-01-01

    The Monte Carlo method and related multiple imputation methods are traditionally used in math, physics and science to estimate and analyze data and are now becoming standard tools in analyzing business and financial problems. However, few sources explain the application of the Monte Carlo method for individuals and business professionals who are…

  16. Bias and Precision of the "Multiple Imputation, Then Deletion" Method for Dealing With Missing Outcome Data.

    Science.gov (United States)

    Sullivan, Thomas R; Salter, Amy B; Ryan, Philip; Lee, Katherine J

    2015-09-15

    Multiple imputation (MI) is increasingly being used to handle missing data in epidemiologic research. When data on both the exposure and the outcome are missing, an alternative to standard MI is the "multiple imputation, then deletion" (MID) method, which involves deleting imputed outcomes prior to analysis. While MID has been shown to provide efficiency gains over standard MI when the analysis and imputation models are the same, the performance of MID in the presence of auxiliary variables for the incomplete outcome is not well understood. Using simulated data, we evaluated the performance of standard MI and MID in regression settings where data were missing on both the outcome and the exposure and where an auxiliary variable associated with the incomplete outcome was included in the imputation model. When the auxiliary variable was unrelated to missingness in the outcome, both standard MI and MID produced negligible bias when estimating regression parameters, with standard MI being more efficient in most settings. However, when the auxiliary variable was also associated with missingness in the outcome, MID produced alarmingly biased parameter estimates. On the basis of these results, we recommend that researchers use standard MI rather than MID in the presence of auxiliary variables associated with an incomplete outcome. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
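
    A minimal sketch of the MID procedure itself (not the paper's simulation design) follows; the data-generating model, missingness rates, and the use of scikit-learn's IterativeImputer as the imputation engine are assumptions for illustration.

      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(6)
      n = 500
      x = rng.normal(size=n)                      # exposure
      y = 0.5 * x + rng.normal(size=n)            # outcome
      aux = y + rng.normal(scale=0.5, size=n)     # auxiliary variable, correlated with y

      data = np.column_stack([x, y, aux])
      y_miss = rng.random(n) < 0.30               # outcomes missing (here: at random)
      data[y_miss, 1] = np.nan
      data[rng.random(n) < 0.20, 0] = np.nan      # exposures missing too

      coefs = []
      for m in range(5):                          # m = 5 imputations
          completed = IterativeImputer(sample_posterior=True,
                                       random_state=m).fit_transform(data)
          keep = ~y_miss                          # MID: delete imputed outcomes
          fit = LinearRegression().fit(completed[keep, :1], completed[keep, 1])
          coefs.append(fit.coef_[0])
      print(np.mean(coefs))                       # pooled slope estimate (true: 0.5)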

  17. Imputation Accuracy from Low to Moderate Density Single Nucleotide Polymorphism Chips in a Thai Multibreed Dairy Cattle Population

    Directory of Open Access Journals (Sweden)

    Danai Jattawa

    2016-04-01

    The objective of this study was to investigate the accuracy of imputation from low-density (LDC) to moderate-density SNP chips (MDC) in a Thai Holstein-Other multibreed dairy cattle population. Dairy cattle with complete pedigree information (n = 1,244) from 145 dairy farms were genotyped with GeneSeek GGP20K (n = 570), GGP26K (n = 540) and GGP80K (n = 134) chips. After checking for single nucleotide polymorphism (SNP) quality, 17,779 SNP markers in common between the GGP20K, GGP26K, and GGP80K were used to represent MDC. Animals were divided into two groups, a reference group (n = 912) and a test group (n = 332). The SNP markers chosen for the test group were those located in positions corresponding to GeneSeek GGP9K (n = 7,652). The LDC to MDC genotype imputation was carried out using three different software packages, namely Beagle 3.3 (population-based algorithm), FImpute 2.2 (combined family- and population-based algorithms) and Findhap 4 (combined family- and population-based algorithms). Imputation accuracies within and across chromosomes were calculated as ratios of correctly imputed SNP markers to overall imputed SNP markers. Imputation accuracy for the three software packages ranged from 76.79% to 93.94%. FImpute had higher imputation accuracy (93.94%) than Findhap (84.64%) and Beagle (76.79%). Imputation accuracies were similar and consistent across chromosomes for FImpute, but not for Findhap and Beagle. Most chromosomes that showed either high (73%) or low (80%) imputation accuracies were the same chromosomes that had above- and below-average linkage disequilibrium (LD; defined here as the correlation between pairs of adjacent SNPs within chromosomes less than or equal to 1 Mb apart). Results indicated that FImpute was more suitable than Findhap and Beagle for genotype imputation in this Thai multibreed population. Perhaps additional increments in imputation accuracy could be achieved by increasing the completeness of pedigree information.

  18. Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

    Directory of Open Access Journals (Sweden)

    Seaman Shaun R

    2012-04-01

    Background: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X². In 'passive imputation' a value X* is imputed for X and then X² is imputed as (X*)². A recent proposal is to treat X² as 'just another variable' (JAV) and impute X and X² under multivariate normality. Methods: We use simulation to investigate the performance of three methods that can easily be implemented in standard software: (1) linear regression of X on Y to impute X, then passive imputation of X²; (2) the same regression but with predictive mean matching (PMM); and (3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. Results: JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. Conclusions: Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.

  19. A comparison of methods for creating multiple imputations of nominal variables

    NARCIS (Netherlands)

    Lang, K.M.; Wu, Wei

    2017-01-01

    Many variables that are analyzed by social scientists are nominal in nature. When missing data occur on these variables, optimal recovery of the analysis model's parameters is a challenging endeavor. One of the most popular methods to deal with missing nominal data is multiple imputation (MI).

  20. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  1. Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods.

    Science.gov (United States)

    Seaman, Shaun R; Bartlett, Jonathan W; White, Ian R

    2012-04-10

    Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X2. In 'passive imputation' a value X* is imputed for X and then X2 is imputed as (X*)2. A recent proposal is to treat X2 as 'just another variable' (JAV) and impute X and X2 under multivariate normality. We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X2; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.

  2. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods.

    Science.gov (United States)

    Ratcliffe, B; El-Dien, O G; Klápště, J; Porth, I; Chen, C; Jaquish, B; El-Kassaby, Y A

    2015-12-01

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA) and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r = 0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better, PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.

  3. An Imputation Method for Missing Traffic Data Based on FCM Optimized by PSO-SVR

    Directory of Open Access Journals (Sweden)

    Qiang Shang

    2018-01-01

    Missing traffic data are inevitable due to detector failure or communication failure. Most existing imputation methods estimate missing traffic values using as much spatial-temporal information as possible, but this overlooks an important fact: the spatial-temporal information associated with missing traffic data is itself often incomplete or unavailable. Moreover, most existing methods have been verified on freeway traffic data, and their applicability to urban road data needs further verification. In this paper, a hybrid method for missing traffic data imputation is proposed using fuzzy C-means (FCM) clustering optimized by a combination of the particle swarm optimization (PSO) algorithm and support vector regression (SVR). In this method, FCM is the basic algorithm and its parameters are optimized. Firstly, the patterns of missing traffic data are analyzed and a matrix-based representation of missing traffic data is given. Then, traffic data from an urban expressway and an urban arterial road are used to analyze the spatial-temporal correlation of the traffic data, which determines the inputs of the proposed method. Finally, a numerical experiment is designed from three perspectives to test the performance of the proposed method. The experimental results demonstrate that the novel method not only has high imputation precision, but also exhibits good robustness.

  4. The rise of multiple imputation: a review of the reporting and implementation of the method in medical research.

    Science.gov (United States)

    Hayati Rezvan, Panteha; Lee, Katherine J; Simpson, Julie A

    2015-04-07

    Missing data are common in medical research, which can lead to a loss in statistical power and potentially biased results if not handled appropriately. Multiple imputation (MI) is a statistical method, widely adopted in practice, for dealing with missing data. Many academic journals now emphasise the importance of reporting information regarding missing data, and proposed guidelines for documenting the application of MI have been published. This review evaluated the reporting of missing data, the application of MI including the details provided regarding the imputation model, and the frequency of sensitivity analyses within the MI framework in medical research articles. A systematic review of articles published in the Lancet and New England Journal of Medicine between January 2008 and December 2013 in which MI was implemented was carried out. We identified 103 papers that used MI, with the number of papers increasing from 11 in 2008 to 26 in 2013. Nearly half of the papers specified the proportion of complete cases or the proportion with missing data by each variable. In the majority of the articles (86%) the imputed variables were specified. Of the 38 papers (37%) that stated the method of imputation, 20 used chained equations, 8 used multivariate normal imputation, and 10 used alternative methods. Very few articles (9%) detailed how they handled non-normally distributed variables during imputation. Thirty-nine papers (38%) stated the variables included in the imputation model. Less than half of the papers (46%) reported the number of imputations, and only two papers compared the distribution of imputed and observed data. Sixty-six papers presented the results from MI as a secondary analysis. Only three articles carried out a sensitivity analysis following MI to assess departures from the missing at random assumption, with details of the sensitivity analyses only provided by one article. This review outlined deficiencies in the documenting of missing data and the application of MI in medical research articles.

  5. Use of multiple imputation method to improve estimation of missing baseline serum creatinine in acute kidney injury research.

    Science.gov (United States)

    Siew, Edward D; Peterson, Josh F; Eden, Svetlana K; Moons, Karel G; Ikizler, T Alp; Matheny, Michael E

    2013-01-01

    Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m(2) (eGFR 75). From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict likelihood of missing BCr. Propensity scoring identified 6502 patients with highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a "missing" data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI were compared with that of eGFR 75. All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; P<0.001). Misclassification of AKI severity stage was also lower with multiple imputation (full multiple imputation + serum creatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods.

  6. Methods for Imputing Missing Efficacy Data in Clinical Trials of Biologic Psoriasis Therapies: Implications for Interpretations of Trial Results.

    Science.gov (United States)

    Langley, Richard G B; Reich, Kristian; Papavassilis, Charis; Fox, Todd; Gong, Yankun; Güttner, Achim

    2017-08-01

    BACKGROUND: An issue in long-term clinical trials of biologics in psoriasis is how to handle missing efficacy data. This methodological challenge may not be well understood by clinicians, yet it can have a significant effect on the interpretation of clinical trials. OBJECTIVE: Evaluate the effects of different data imputation methods on apparent secukinumab response rates. METHODS: Post hoc analyses were conducted on efficacy data from 2 phase III, multicenter, randomized, double-blind trials (FIXTURE and ERASURE) of secukinumab in moderate to severe plaque psoriasis. Per study protocols, missing data were imputed using strict non-response imputation (NRI), a highly conservative method that assumes non-response for all missing data. Alternative imputation methods (observed data, last observation carried forward [LOCF], modified NRI, and multiple imputation [MI]) were applied in this analysis and the resultant response rates compared. RESULTS: Response rates obtained with each imputation method diverged increasingly over 52 weeks of follow-up. Strict NRI response estimates were consistently lower than those using the other methods. At week 52, Psoriasis Area and Severity Index (PASI) 90 rates for secukinumab 300 mg based on strict NRI were 9.2% (FIXTURE) and 8.7% (ERASURE) lower than estimates obtained using the least conservative method (observed data). Estimates obtained through LOCF and modified NRI were closest to those produced by MI, currently regarded as the most methodologically sophisticated approach available. CONCLUSION: Awareness of differences in assumptions and limitations among imputation methods is necessary for well-informed interpretation of trial data. J Drugs Dermatol. 2017;16(8):734-742.
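
    The computational difference between the simpler rules is small; below is a hedged R sketch on a hypothetical responder matrix (data and names invented for illustration, not taken from the trials).

        # resp: logical matrix, rows = patients, columns = visits; NA = missed visit.
        set.seed(7)
        resp <- matrix(sample(c(TRUE, FALSE, NA), 300, replace = TRUE,
                              prob = c(0.5, 0.3, 0.2)), nrow = 50)
        last <- resp[, ncol(resp)]

        # Observed data: response rate among patients assessed at the final visit.
        rate_observed <- mean(last, na.rm = TRUE)

        # Strict non-responder imputation (NRI): missing counts as non-response.
        rate_nri <- mean(ifelse(is.na(last), FALSE, last))

        # LOCF: carry each patient's last non-missing status forward.
        locf <- apply(resp, 1, function(x) {
          obs <- x[!is.na(x)]
          if (length(obs) == 0) FALSE else tail(obs, 1)
        })
        rate_locf <- mean(locf)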

  7. Candidate gene analysis using imputed genotypes: cell cycle single-nucleotide polymorphisms and ovarian cancer risk

    DEFF Research Database (Denmark)

    Goode, Ellen L; Fridley, Brooke L; Vierkant, Robert A

    2009-01-01

    , CDK4, RB1, CDKN2D, and CCNE1) and one gene region (CDKN2A-CDKN2B). Because of the semi-overlapping nature of the 123 assayed tagging SNPs, we performed multiple imputation based on fastPHASE using data from White non-Hispanic study participants and participants in the international HapMap Consortium...... and National Institute of Environmental Health Sciences SNPs Program. Logistic regression assuming a log-additive model was done on combined and imputed data. We observed strengthened signals in imputation-based analyses at several SNPs, particularly CDKN2A-CDKN2B rs3731239; CCND1 rs602652, rs3212879, rs649392...

  8. Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs

    Science.gov (United States)

    2012-01-01

    Background We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step vs. two-step (phasing and imputation) approaches; (c) the effect of different INFO score thresholds on imputation performance and (d) imputation performance in common vs. rare markers. Methods The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rates between the imputed and observed genotypes were used as a measure of imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy. Results The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4% when using the HapMap phase II combined panel), but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced similar concordance rates to the HapMap phase II panel (98.4% for both datasets, using the two-step strategy). However, the 1000 Genomes reference sample increased substantially the proportion of non-missing genotypes (94.7% vs. 90.1%). Conclusions The admixed sample from Mexico City has primarily Native American (62%) and European (33%) contributions. Genotype concordances were higher than 98.4% using all the imputation strategies, in spite of the fact that no Native American samples are present in the HapMap and 1000 Genomes reference panels. The best balance of imputation accuracy and efficiency was obtained with the 1000 Genomes panel. Rare variants were not captured effectively by any of the strategies evaluated.

  9. Use of Multiple Imputation Method to Improve Estimation of Missing Baseline Serum Creatinine in Acute Kidney Injury Research

    Science.gov (United States)

    Peterson, Josh F.; Eden, Svetlana K.; Moons, Karel G.; Ikizler, T. Alp; Matheny, Michael E.

    2013-01-01

    Summary Background and objectives Baseline creatinine (BCr) is frequently missing in AKI studies. Common surrogate estimates can misclassify AKI and adversely affect the study of related outcomes. This study examined whether multiple imputation improved accuracy of estimating missing BCr beyond current recommendations to apply assumed estimated GFR (eGFR) of 75 ml/min per 1.73 m2 (eGFR 75). Design, setting, participants, & measurements From 41,114 unique adult admissions (13,003 with and 28,111 without BCr data) at Vanderbilt University Hospital between 2006 and 2008, a propensity score model was developed to predict likelihood of missing BCr. Propensity scoring identified 6502 patients with highest likelihood of missing BCr among 13,003 patients with known BCr to simulate a “missing” data scenario while preserving actual reference BCr. Within this cohort (n=6502), the ability of various multiple-imputation approaches to estimate BCr and classify AKI were compared with that of eGFR 75. Results All multiple-imputation methods except the basic one more closely approximated actual BCr than did eGFR 75. Total AKI misclassification was lower with multiple imputation (full multiple imputation + serum creatinine) (9.0%) than with eGFR 75 (12.3%; P<0.001). Misclassification of AKI severity stage was also lower with multiple imputation (full multiple imputation + serum creatinine) (15.3%) versus eGFR 75 (40.5%; P<0.001). Multiple imputation improved specificity and positive predictive value for detecting AKI at the expense of modestly decreasing sensitivity relative to eGFR 75. Conclusions Multiple imputation can improve accuracy in estimating missing BCr and reduce misclassification of AKI beyond currently proposed methods. PMID:23037980

  10. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    Directory of Open Access Journals (Sweden)

    Nawar Shara

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.

  11. Should multiple imputation be the method of choice for handling missing data in randomized trials?

    Science.gov (United States)

    Sullivan, Thomas R; White, Ian R; Salter, Amy B; Ryan, Philip; Lee, Katherine J

    2016-01-01

    The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle missing data. However in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using data simulation we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both missing outcome and missing baseline data, with missing outcome data induced under missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to missing outcome data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation is carried out separately by randomized group.
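
    A minimal sketch of the recommended strategy in R with the mice package; the data frame trial and the variables arm, outcome and baseline are assumed for illustration.

        library(mice)

        # Impute separately within each randomized arm, then recombine.
        imp_by_arm <- lapply(split(trial, trial$arm), function(d)
          mice(d, m = 10, printFlag = FALSE, seed = 99))

        # Stack the i-th completed dataset from each arm, for i = 1..10.
        completed <- lapply(1:10, function(i)
          do.call(rbind, lapply(imp_by_arm, complete, action = i)))

        # Fit the analysis model in each completed dataset, pool via Rubin's rules.
        fits <- lapply(completed, function(d) lm(outcome ~ arm + baseline, data = d))
        summary(pool(as.mira(fits)))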

  12. A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java

    Science.gov (United States)

    Susianto, Y.; Notodiputro, K. A.; Kurnia, A.; Wijayanto, H.

    2017-03-01

    Missing values in repeated measurements have attracted concern from researchers in recent years. For many years, the standard statistical methods for repeated measurements were developed assuming complete data, and they cannot produce good estimates if the data suffer substantially from missing values. To overcome this problem, imputation methods can be used. This paper discusses three imputation methods, namely the Yates method, the expectation-maximization (EM) algorithm, and the Markov Chain Monte Carlo (MCMC) method. These methods were used to estimate the missing values of per capita expenditure data at the sub-district level in Central Java. The performance of these imputation methods was evaluated by comparing the mean square error (MSE) and mean absolute error (MAE) of the resulting estimates using linear mixed models. The MSE and MAE produced by the Yates method were lower than those resulting from both the EM algorithm and the MCMC method. Therefore, the Yates method is recommended for imputing the missing values of per capita expenditure at the sub-district level.

  13. Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation.

    Science.gov (United States)

    Enders, Craig K; Mistler, Stephen A; Keller, Brian T

    2016-06-01

    Although missing data methods have advanced in recent years, methodologists have devoted less attention to multilevel data structures where observations at level-1 are nested within higher-order organizational units at level-2 (e.g., individuals within neighborhoods; repeated measures nested within individuals; students nested within classrooms). Joint modeling and chained equations imputation are the principal imputation frameworks for single-level data, and both have multilevel counterparts. These approaches differ algorithmically and in their functionality; both are appropriate for simple random intercept analyses with normally distributed data, but they differ beyond that. The purpose of this paper is to describe multilevel imputation strategies and evaluate their performance in a variety of common analysis models. Using multiple imputation theory and computer simulations, we derive 4 major conclusions: (a) joint modeling and chained equations imputation are appropriate for random intercept analyses; (b) the joint model is superior for analyses that posit different within- and between-cluster associations (e.g., a multilevel regression model that includes a level-1 predictor and its cluster means, a multilevel structural equation model with different path values at level-1 and level-2); (c) chained equations imputation provides a dramatic improvement over joint modeling in random slope analyses; and (d) a latent variable formulation for categorical variables is quite effective. We use a real data analysis to demonstrate multilevel imputation, and we suggest a number of avenues for future research. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
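
    On the chained-equations side, here is a hedged sketch of a two-level imputation in R with mice (not the authors' code); the data frame d, cluster identifier school, incomplete outcome y and predictor x are assumed for illustration.

        library(mice)

        # In mice's predictor matrix, -2 marks the cluster (class) variable and
        # 2 marks a predictor given fixed and random effects in the 2l model.
        pred <- make.predictorMatrix(d)
        pred["y", ] <- 0
        pred["y", "school"] <- -2
        pred["y", "x"] <- 2

        meth <- make.method(d)
        meth["y"] <- "2l.norm"  # two-level normal imputation model

        imp <- mice(d, method = meth, predictorMatrix = pred, printFlag = FALSE)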

  14. Tuning multiple imputation by predictive mean matching and local residual draws.

    Science.gov (United States)

    Morris, Tim P; White, Ian R; Royston, Patrick

    2014-06-05

    Multiple imputation is a commonly used method for handling incomplete covariates as it can provide valid inference when data are missing at random. This depends on being able to correctly specify the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor's residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified. We review development of PMM and LRD and outline the various forms available, and aim to clarify some choices about how and when they should be used. We compare performance to fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified. In using PMM or LRD we strongly caution against using a single donor, the default value in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Among the current MI software there are several poor implementations. PMM and LRD may have a role for imputing covariates (i) which are not strongly associated with outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should spend efforts on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.
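
    In the R package mice, whose default univariate method is PMM, the donor pool size can be widened as the authors advocate; a brief sketch on the package's built-in nhanes data:

        library(mice)

        # PMM drawing from a pool of 10 donors (mice's own default is 5, not 1).
        imp <- mice(nhanes, method = "pmm", donors = 10, seed = 2, printFlag = FALSE)
        fit <- with(imp, lm(chl ~ bmi))
        summary(pool(fit))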

  15. Application of a novel hybrid method for spatiotemporal data imputation: A case study of the Minqin County groundwater level

    Science.gov (United States)

    Zhang, Zhongrong; Yang, Xuan; Li, Hao; Li, Weide; Yan, Haowen; Shi, Fei

    2017-10-01

    Techniques for data analysis have developed considerably in past years; however, missing data still represent a ubiquitous problem in many scientific fields. In particular, dealing with missing spatiotemporal data presents an enormous challenge. Nonetheless, in recent years a considerable amount of research has focused on spatiotemporal problems, making spatiotemporal missing data imputation methods increasingly indispensable. In this paper, a novel spatiotemporal hybrid method is proposed to verify and impute spatiotemporal missing values. This new method, termed SOM-FLSSVM, flexibly combines three advanced techniques: self-organizing feature map (SOM) clustering, the fruit fly optimization algorithm (FOA) and the least squares support vector machine (LSSVM). We employ a cross-validation (CV) procedure and an FOA swarm intelligence optimization strategy that search the available parameters and determine the optimal imputation model. Spatiotemporal groundwater data for Minqin County, China, were selected to test the reliability and imputation ability of SOM-FLSSVM. We carried out a validation experiment and compared three well-studied models with SOM-FLSSVM using missing data ratios from 0.1 to 0.8 in the same data set. The results demonstrate that the new hybrid method performs well in terms of both robustness and accuracy for spatiotemporal missing data.

  16. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis.

    Science.gov (United States)

    Eekhout, Iris; van de Wiel, Mark A; Heymans, Martijn W

    2017-08-22

    Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available: for example, pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always attain optimal power levels. We argue that the median of the p-values from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods of testing a categorical variable for significance after multiple imputation in terms of applicability and power. In a large simulation study, we demonstrated the control of the type I error and the power levels of different pooling methods for categorical variables. The simulation showed that for non-significant categorical covariates the type I error is controlled, and that the statistical power of the median pooling rule was at least equal to that of current multiple-parameter tests. An empirical data example showed similar results. It can therefore be concluded that using the median of the p-values from the imputed data analyses is an attractive and easy-to-use alternative method for significance testing of categorical variables.
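
    A sketch of the median-p rule in R, assuming a mids object imp from mice and an analysis with outcome y, categorical covariate group and adjustment variable age (all names illustrative):

        library(mice)

        pvals <- sapply(seq_len(imp$m), function(i) {
          d <- complete(imp, i)
          full <- glm(y ~ group + age, data = d, family = binomial)
          red  <- glm(y ~ age, data = d, family = binomial)
          anova(red, full, test = "LRT")[2, "Pr(>Chi)"]  # overall LR test of 'group'
        })
        median(pvals)  # median-p pooling rule for the categorical covariate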

  17. A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

    Directory of Open Access Journals (Sweden)

    Jun-He Yang

    2017-01-01

    Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model that first estimates missing values and then applies variable selection to forecast a reservoir's water level. The study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015; the two datasets were merged by date into an integrated research dataset. The proposed forecasting model has three main steps. First, five imputation methods are applied to estimate missing values, in contrast to the listing method, which directly deletes records with missing values. Second, key variables are identified via factor analysis, and unimportant variables are then removed sequentially via a variable selection method. Finally, a Random Forest is used to build the forecasting model of the reservoir's water level, which is compared with the listing method in terms of forecasting error. The experimental results indicate that the Random Forest forecasting model combined with variable selection achieves better forecasting performance than the listing model, and that the proposed variable selection improves the forecasting capability of the methods used here.

  18. Predictive mean matching imputation of semicontinuous variables

    NARCIS (Netherlands)

    Vink, G.; Frank, L.E.; Pannekoek, J.; Buuren, S. van

    2014-01-01

    Multiple imputation methods properly account for the uncertainty of missing data. One of those methods for creating multiple imputations is predictive mean matching (PMM), a general purpose method. Little is known about the performance of PMM in imputing non-normal semicontinuous data (skewed data).

  19. Multiple imputation with multivariate imputation by chained equation (MICE) package.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-01-01

    Multiple imputation (MI) is an advanced technique for handling missing values. It is superior to single imputation in that it takes into account the uncertainty in missing value imputation. However, MI is underutilized in the medical literature due to lack of familiarity and computational challenges. This article provides a step-by-step approach to performing MI using the R multivariate imputation by chained equations (MICE) package. The procedure first imputes m complete datasets by calling the mice() function. Then statistical analyses, such as univariate analyses and regression models, can be performed within each dataset by calling the with() function, which sets the environment for statistical analysis. Lastly, the results obtained from each analysis are combined by using the pool() function.
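
    The workflow maps onto three calls; a minimal runnable sketch using the nhanes example data shipped with mice:

        library(mice)

        imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)  # m imputed datasets
        fits <- with(imp, lm(chl ~ bmi + age))   # analysis repeated in each dataset
        pooled <- pool(fits)                     # combine results via Rubin's rules
        summary(pooled)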

  20. Significant variation between SNP-based HLA imputations in diverse populations: the last mile is the hardest.

    Science.gov (United States)

    Pappas, D J; Lizee, A; Paunic, V; Beutner, K R; Motyer, A; Vukcevic, D; Leslie, S; Biesiada, J; Meller, J; Taylor, K D; Zheng, X; Zhao, L P; Gourraud, P-A; Hollenbach, J A; Mack, S J; Maiers, M

    2017-04-25

    Four single nucleotide polymorphism (SNP)-based human leukocyte antigen (HLA) imputation methods (e-HLA, HIBAG, HLA*IMP:02 and MAGPrediction) were trained using 1000 Genomes SNP and HLA genotypes and assessed for their ability to accurately impute molecular HLA-A, -B, -C and -DRB1 genotypes in the Human Genome Diversity Project cell panel. Imputation concordance was high (>89%) across all methods for both HLA-A and HLA-C, but HLA-B and HLA-DRB1 proved generally difficult to impute. Overall, <27.8% of subjects were correctly imputed for all HLA loci by any method. Concordance across all loci was not enhanced via the application of confidence thresholds; reliance on confidence scores across methods only led to noticeable improvement (+3.2%) for HLA-DRB1. As the HLA complex is highly relevant to the study of human health and disease, a standardized assessment of SNP-based HLA imputation methods is crucial for advancing genomic research. Considerable room remains for the improvement of HLA-B and especially HLA-DRB1 imputation methods, and no imputation method is as accurate as molecular genotyping. The application of large, ancestrally diverse HLA and SNP reference data sets and multiple imputation methods has the potential to make SNP-based HLA imputation methods a tractable option for determining HLA genotypes. The Pharmacogenomics Journal advance online publication, 25 April 2017; doi:10.1038/tpj.2017.7.

  1. Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods

    Energy Technology Data Exchange (ETDEWEB)

    Riggi, S., E-mail: sriggi@oact.inaf.it [INAF - Osservatorio Astrofisico di Catania (Italy); Riggi, D. [Keras Strategy - Milano (Italy); Riggi, F. [Dipartimento di Fisica e Astronomia - Università di Catania (Italy); INFN, Sezione di Catania (Italy)

    2015-04-21

    Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for instance due to detector inefficiencies. Algorithms which provide a way to impute missing information have been developed over the past years. Among the various approaches, we focused on normal mixture models in comparison with standard mean imputation and multiple imputation methods. Further, to account for the intrinsic asymmetry of the energy loss data, we considered skew-normal mixture models and provided a closed-form implementation in the Expectation-Maximization (EM) algorithm framework to handle missing patterns. The method has been applied to a test case where the energy losses of pions, kaons and protons in a six-layer silicon detector are considered as input neurons to a neural network. Results are given in terms of reconstruction efficiency and purity of the various species in different momentum bins.

  2. Multiple imputation strategies for zero-inflated cost data in economic evaluations : which method works best?

    NARCIS (Netherlands)

    MacNeil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G; van Hout, Hein; de Rooij, Sophia E; Heymans, Martijn W; Bosmans, Judith E

    2016-01-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing

  3. Multiple imputation strategies for zero-inflated cost data in economic evaluations: which method works best?

    NARCIS (Netherlands)

    Macneil Vroomen, Janet; Eekhout, Iris; Dijkgraaf, Marcel G.; van Hout, Hein; de Rooij, Sophia E.; Heymans, Martijn W.; Bosmans, Judith E.

    2016-01-01

    Cost and effect data often have missing data because economic evaluations are frequently added onto clinical studies where cost data are rarely the primary outcome. The objective of this article was to investigate which multiple imputation strategy is most appropriate to use for missing

  4. Improving accuracy of rare variant imputation with a two-step imputation approach

    DEFF Research Database (Denmark)

    Kreiner-Møller, Eskil; Medina-Gomez, Carolina; Uitterlinden, André G

    2015-01-01

    not being comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) ...). Using a two-step approach ..., the concordance rate between calls of imputed and true genotypes was found to be significantly higher for heterozygotes (P ...). The two-step approach in our setting improves imputation quality compared with traditional direct imputation ... noteworthy ...

  5. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    Science.gov (United States)

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogeneous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification.

  6. Imputation with the R Package VIM

    Directory of Open Access Journals (Sweden)

    Alexander Kowarik

    2016-10-01

    The package VIM (Templ, Alfons, Kowarik, and Prantner 2016) is developed to explore and analyze the structure of missing values in data using visualization methods, to impute these missing values with the built-in imputation methods, and to verify the imputation process using visualization tools, as well as to produce high-quality graphics for publications. This article focuses on the different imputation techniques available in the package. Four different imputation methods are currently implemented in VIM, namely hot-deck imputation, k-nearest neighbor imputation, regression imputation and iterative robust model-based imputation (Templ, Kowarik, and Filzmoser 2011). All of these methods are implemented in a flexible manner with many options for customization. Furthermore, practical examples are provided in this article to highlight the use of the implemented methods on real-world applications. In addition, the graphical user interface of VIM has been re-implemented from scratch, resulting in the package VIMGUI (Schopfhauser, Templ, Alfons, Kowarik, and Prantner 2016), to enable users without extensive R skills to access these imputation and visualization methods.
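
    A short sketch of two of these methods applied to VIM's built-in sleep data (which contains missing values); the arguments shown are illustrative defaults.

        library(VIM)

        data(sleep, package = "VIM")
        aggr(sleep)  # visualize the structure of the missing values

        imp_knn <- kNN(sleep, k = 5)  # k-nearest neighbor imputation
        imp_hot <- hotdeck(sleep)     # sequential hot-deck imputation
        # Both return the completed data plus logical columns flagging imputed cells.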

  7. Graphical and numerical diagnostic tools to assess suitability of multiple imputations and imputation models.

    Science.gov (United States)

    Bondarenko, Irina; Raghunathan, Trivellore

    2016-07-30

    Multiple imputation has become a popular approach for analyzing incomplete data. Many software packages are available to multiply impute the missing values and to analyze the resulting completed data sets. However, diagnostic tools to check the validity of the imputations are limited, and the majority of the currently available methods need considerable knowledge of the imputation model. In many practical settings, however, the imputer and the analyst may be different individuals or from different organizations, and the analyst's model may or may not be congenial to the model used by the imputer. This article develops and evaluates a set of graphical and numerical diagnostic tools for two practical purposes: (i) for an analyst to determine whether the imputations are reasonable under his/her model assumptions without actually knowing the imputation model assumptions; and (ii) for an imputer to fine-tune the imputation model by checking the key characteristics of the observed and imputed values. The tools are based on numerical and graphical comparisons of the distributions of the observed and imputed values conditional on the propensity of response. The methodology is illustrated using simulated data sets created under a variety of scenarios. The examples focus on continuous and binary variables, but the principles can be used to extend methods for other types of variables. Copyright © 2016 John Wiley & Sons, Ltd.
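
    The paper's own diagnostic suite is not reproduced here, but the underlying idea, comparing observed against imputed distributions, can be illustrated with the plotting methods of the R package mice on its nhanes example data:

        library(mice)

        imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
        densityplot(imp)           # observed vs. imputed densities per variable
        stripplot(imp, pch = 20)   # observed and imputed points per imputation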

  8. Multiple imputation of multiple multi-item scales when a full imputation model is infeasible.

    Science.gov (United States)

    Plumpton, Catrin O; Morris, Tim; Hughes, Dyfrig A; White, Ian R

    2016-01-26

    Missing data in a large scale survey presents major challenges. We focus on performing multiple imputation by chained equations when data contain multiple incomplete multi-item scales. Recent authors have proposed imputing such data at the level of the individual item, but this can lead to infeasibly large imputation models. We use data gathered from a large multinational survey, where analysis uses separate logistic regression models in each of nine country-specific data sets. In these data, applying multiple imputation by chained equations to the individual scale items is computationally infeasible. We propose an adaptation of multiple imputation by chained equations which imputes the individual scale items but reduces the number of variables in the imputation models by replacing most scale items with scale summary scores. We evaluate the feasibility of the proposed approach and compare it with a complete case analysis. We perform a simulation study to compare the proposed method with alternative approaches: we do this in a simplified setting to allow comparison with the full imputation model. For the case study, the proposed approach reduces the size of the prediction models from 134 predictors to a maximum of 72 and makes multiple imputation by chained equations computationally feasible. Distributions of imputed data are seen to be consistent with observed data. Results from the regression analysis with multiple imputation are similar to, but more precise than, results for complete case analysis; for the same regression models a 39% reduction in the standard error is observed. The simulation shows that our proposed method can perform comparably against the alternatives. By substantially reducing imputation model sizes, our adaptation makes multiple imputation feasible for large scale survey data with multiple multi-item scales. For the data considered, analysis of the multiply imputed data shows greater power and efficiency than complete case analysis.

  9. A method for imputing the impact of health problems on at-work performance and productivity from available health data.

    Science.gov (United States)

    Lerner, Debra; Chang, Hong; Rogers, William H; Benson, Carmela; Schein, Jeffrey; Allaire, Saralynn

    2009-05-01

    To develop a method for imputing the work performance and productivity impact of illness and treatment from available data. Using data from four studies of musculoskeletal disorders (eg, osteoarthritis) and pain, we modeled the relationships between scores from the Work Limitations Questionnaire (WLQ), a validated measure of health-related limitations in work performance and productivity, and a series of validated health measures (eg, a pain scale). The 15 health and 5 WLQ variables were significantly associated in 115 of 116 study-specific models (P < 0.05). Fifteen commonly collected health variables may be used to predict WLQ impact (increase or decrease) for samples with musculoskeletal pain and physical impairments to help fill information gaps.

  10. Combining multiple imputation and meta-analysis with individual participant data.

    Science.gov (United States)

    Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M

    2013-11-20

    Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
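
    A hedged sketch of the recommended order of operations in R, assuming studies is a list of per-study data frames with outcome y, exposure x and a sporadically missing covariate z (mice for imputation, metafor for the inverse-variance weighted meta-analysis):

        library(mice)
        library(metafor)

        # Impute and pool with Rubin's rules within each study...
        pool_one_study <- function(df) {
          imp <- mice(df, m = 20, printFlag = FALSE, seed = 42)
          fit <- with(imp, lm(y ~ x + z))
          est <- summary(pool(fit))
          est[est$term == "x", c("estimate", "std.error")]
        }
        res <- do.call(rbind, lapply(studies, pool_one_study))

        # ...then combine the study-level estimates by inverse-variance meta-analysis.
        rma(yi = res$estimate, sei = res$std.error, method = "REML")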

  11. Combining multiple imputation and meta-analysis with individual participant data

    Science.gov (United States)

    Burgess, Stephen; White, Ian R; Resche-Rigon, Matthieu; Wood, Angela M

    2013-01-01

    Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within-study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse-variance weighted meta-analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between-study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse-variance weighted meta-analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta-analysis, rather than meta-analyzing each of the multiple imputations and then combining the meta-analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. PMID:23703895

  12. Multiple imputation and its application

    CERN Document Server

    Carpenter, James

    2013-01-01

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms, and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues ...

  13. Data driven estimation of imputation error-a strategy for imputation with a reject option

    DEFF Research Database (Denmark)

    Bak, Nikolaj; Hansen, Lars Kai

    2016-01-01

    indiscriminately. We note that the effects of imputation can be strongly dependent on what is missing. To help make decisions about which records should be imputed, we propose to use a machine learning approach to estimate the imputation error for each case with missing data. The method is thought...... to be a practical approach to help users using imputation after the informed choice to impute the missing data has been made. To do this all patterns of missing values are simulated in all complete cases, enabling calculation of the "true error" in each of these new cases. The error is then estimated for each case...... with missing values by weighting the "true errors" by similarity. The method can also be used to test the performance of different imputation methods. A universal numerical threshold of acceptable error cannot be set since this will differ according to the data, research question, and analysis method...

  14. A comparison of model-based imputation methods for handling missing predictor values in a linear regression model: A simulation study

    Science.gov (United States)

    Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah

    2017-08-01

    In regression analysis, missing covariate data are a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods, such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI), are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of the missing data concept, to assist researchers in selecting appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated from an underlying multivariate normal distribution, and the dependent variable was generated as a combination of explanatory variables. Missing values in a covariate were simulated using a missing at random (MAR) mechanism, with four levels of missingness (10%, 20%, 30% and 40%) imposed. The ML and MI techniques available within SAS software were investigated. A linear regression model was fitted, and the model performance measures (MSE and R-squared) were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness is less than 30%. Both methods were unable to handle levels of missingness above 30%.

  15. Meta-analysis of test accuracy studies using imputation for partial reporting of multiple thresholds.

    Science.gov (United States)

    Ensor, J; Deeks, J J; Martin, E C; Riley, R D

    2018-03-01

    For tests reporting continuous results, primary studies usually provide test performance at multiple but often different thresholds. This creates missing data when performing a meta-analysis at each threshold. A standard meta-analysis (no imputation [NI]) ignores such missing data. A single imputation (SI) approach was recently proposed to recover missing threshold results. Here, we propose a new method that performs multiple imputation of the missing threshold results using discrete combinations (MIDC). The new MIDC method imputes missing threshold results by randomly selecting from the set of all possible discrete combinations which lie between the results for 2 known bounding thresholds. Imputed and observed results are then synthesised at each threshold. This is repeated multiple times, and the multiple pooled results at each threshold are combined using Rubin's rules to give final estimates. We compared the NI, SI, and MIDC approaches via simulation. Both imputation methods outperform the NI method in simulations. There was generally little difference between the SI and MIDC methods, but the latter was noticeably better at estimating the between-study variances and generally gave better coverage, due to slightly larger standard errors of pooled estimates. Given selective reporting of thresholds, the imputation methods also reduced bias in the summary receiver operating characteristic curve. Simulations demonstrate that the imputation methods rely on an equal threshold spacing assumption. A real example is presented. The SI and, in particular, MIDC methods can be used to examine the impact of missing threshold results in meta-analysis of test accuracy studies. © 2017 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.

  16. AN EFFECTIVE TECHNIQUE OF MULTIPLE IMPUTATION IN NONPARAMETRIC QUANTILE REGRESSION

    OpenAIRE

    Yanan Hu; Qianqian Zhu; Maozai Tian

    2014-01-01

    In this study, we consider the nonparametric quantile regression model with covariates Missing at Random (MAR). Multiple imputation is becoming an increasingly popular approach for analyzing missing data, but its combination with quantile regression is not well developed. We propose an effective and accurate two-stage multiple imputation method for the model based on quantile regression, which consists of initial imputation in the first stage and multiple imputation in the second stage.

  17. Multiple Imputation of Multilevel Missing Data-Rigor versus Simplicity

    Science.gov (United States)

    Drechsler, Jörg

    2015-01-01

    Multiple imputation is widely accepted as the method of choice to address item-nonresponse in surveys. However, research on imputation strategies for the hierarchical structures that are typically found in the data in educational contexts is still limited. While a multilevel imputation model should be preferred from a theoretical point of view if…

  18. Alternative Multiple Imputation Inference for Mean and Covariance Structure Modeling

    Science.gov (United States)

    Lee, Taehun; Cai, Li

    2012-01-01

    Model-based multiple imputation has become an indispensable method in the educational and behavioral sciences. Mean and covariance structure models are often fitted to multiply imputed data sets. However, the presence of multiple random imputations complicates model fit testing, which is an important aspect of mean and covariance structure…

  19. Outcome-sensitive multiple imputation: a simulation study.

    Science.gov (United States)

    Kontopantelis, Evangelos; White, Ian R; Sperrin, Matthew; Buchan, Iain

    2017-01-09

    Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether the outcome itself should be imputed. Similarly, no clear recommendations exist on: the utility of incorporating a secondary outcome, if available, in the imputation model; the level of protection offered when data are missing not-at-random; and the implications of the dataset size and missingness levels. We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (completely at random; at random; not at random); varying extents of missingness (20-80% missing data); and different sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and seven multiple imputation methods, which deleted cases with missing outcome before imputation, after imputation or not at all; included or did not include the outcome in the imputation models; and included or did not include a secondary outcome in the imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario. Overall, there was very little to separate multiple imputation methods which included the outcome in the imputation model. Even when missingness was quite extensive, all multiple imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. The dataset size and the extent of missingness affected performance, as expected. Multiple imputation methods protected less well against missingness not at random, but did offer some protection. As long as the outcome is included in the imputation model, there are very small performance differences between the possible multiple imputation approaches: no outcome imputation, imputation or
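
    The central recommendation, include the outcome in the imputation model, can be illustrated with a toy example. In the hedged sketch below (synthetic data, not the study's code), a covariate is made missing at random given the outcome; imputing it with the outcome in the model recovers the regression slope, while ignoring the outcome, which in this one-covariate setting collapses to mean imputation, visibly attenuates it.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)                      # true slope: 2.0
x_obs = x.copy()
x_obs[rng.random(n) < 1 / (1 + np.exp(-y))] = np.nan  # MAR: driven by observed y

# (a) Outcome in the imputation model: impute x from y with posterior draws.
filled = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(
    np.column_stack([x_obs, y]))
slope_with = LinearRegression().fit(filled[:, [0]], y).coef_[0]

# (b) Outcome ignored: with no other covariates this reduces to mean imputation.
x_mean = SimpleImputer(strategy="mean").fit_transform(x_obs.reshape(-1, 1))
slope_without = LinearRegression().fit(x_mean, y).coef_[0]

# Expect roughly 2.0 with the outcome included, a clearly attenuated slope without.
print(f"with outcome: {slope_with:.2f}   without outcome: {slope_without:.2f}")
```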

  20. Mapping wildland fuels and forest structure for land management: a comparison of nearest neighbor imputation and other methods

    Science.gov (United States)

    Kenneth B. Pierce; Janet L. Ohmann; Michael C. Wimberly; Matthew J. Gregory; Jeremy S. Fried

    2009-01-01

    Land managers need consistent information about the geographic distribution of wildland fuels and forest structure over large areas to evaluate fire risk and plan fuel treatments. We compared spatial predictions for 12 fuel and forest structure variables across three regions in the western United States using gradient nearest neighbor (GNN) imputation, linear models (...

  1. Model checking in multiple imputation: an overview and case study

    Directory of Open Access Journals (Sweden)

    Cattram D. Nguyen

    2017-08-01

    Full Text Available Abstract Background Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models. Analysis In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children. Conclusions As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.

  2. Model checking in multiple imputation: an overview and case study.

    Science.gov (United States)

    Nguyen, Cattram D; Carlin, John B; Lee, Katherine J

    2017-01-01

    Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models. In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children. As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.

  3. A multiple imputation method based on weighted quantile regression models for longitudinal censored biomarker data with missing values at early visits.

    Science.gov (United States)

    Lee, MinJae; Rahbar, Mohammad H; Brown, Matthew; Gensler, Lianne; Weisman, Michael; Diekman, Laura; Reveille, John D

    2018-01-11

    In patient-based studies, biomarker data are often subject to left censoring due to detection limits, or to incomplete sample or data collection. In the context of longitudinal regression analysis, inappropriate handling of these issues can lead to biased parameter estimates. We developed a specific multiple imputation (MI) strategy based on weighted censored quantile regression (CQR) that accounts not only for censoring but also for missing data at early visits when longitudinal biomarker data are modeled as a covariate. We assessed the performance of the developed imputation approach through simulation studies, considering various scenarios of covariance structures for the longitudinal data and levels of censoring. We also illustrated the application of the proposed method to data from the Prospective Study of Outcomes in Ankylosing spondylitis (AS) (PSOAS) to address censored or missing C-reactive protein (CRP) levels at early visits for a group of patients. Our findings from simulation studies indicated that the proposed method performs better than other MI methods, having higher relative efficiency. We also found that our approach is not sensitive to the choice of covariance structure, in contrast to other methods that assume normality of the biomarker data. The analysis of the PSOAS data using CRP levels imputed by our method suggested that higher CRP is significantly associated with radiographic damage, whereas the other methods did not yield a significant association. MI based on weighted CQR offers a more valid statistical approach for evaluating a biomarker of disease in the presence of both censoring and missing data at early visits.

  4. Multiple imputation in the presence of high-dimensional data.

    Science.gov (United States)

    Zhao, Yize; Long, Qi

    2016-10-01

    Missing data are frequently encountered in biomedical, epidemiologic and social research. It is well known that a naive analysis without adequate handling of missing data may lead to bias and/or loss of efficiency. Partly due to its ease of use, multiple imputation has become increasingly popular in practice for handling missing data. However, it is unclear what the best strategy is for conducting multiple imputation in the presence of high-dimensional data. To answer this question, we investigate several approaches that use regularized regression and Bayesian lasso regression to impute missing values in the presence of high-dimensional data. We compare the performance of these methods through numerical studies, in which we also evaluate the impact of the dimension of the data, the size of the true active set for imputation, and the strength of correlation. Our numerical studies show that in the presence of high-dimensional data the standard multiple imputation approach performs poorly, and that the imputation approach using Bayesian lasso regression achieves, in most cases, better performance than the other imputation methods, including the standard imputation approach using the correctly specified imputation model. Our results suggest that Bayesian lasso regression and its extensions are better suited for multiple imputation in the presence of high-dimensional data than the other regression methods. © The Author(s) 2013.

  5. Nonlinear multiple imputation for continuous covariate within semiparametric Cox model: application to HIV data in Senegal.

    Science.gov (United States)

    Mbougua, Jules Brice Tchatchueng; Laurent, Christian; Ndoye, Ibra; Delaporte, Eric; Gwet, Henri; Molinari, Nicolas

    2013-11-20

    Multiple imputation is commonly used to impute missing covariates in the semiparametric Cox regression setting. It fills in each missing value with plausible values via a Gibbs sampling procedure, specifying an imputation model for each incomplete variable. This imputation method is implemented in several software packages that offer imputation models suited to the type of variable being imputed, but all of these imputation models assume that covariate effects are linear. However, this assumption often fails in practice, as covariates can have nonlinear effects. Such a linearity assumption can lead to misleading conclusions, because the imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous, time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess the performance of this method, we conducted a simulation study in which we compared multiple imputation using a Bayesian spline imputation model with multiple imputation using a Bayesian linear imputation model in a survival analysis setting. We evaluated the proposed method on the motivating data set, collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well in estimating hazard ratios compared with the linear imputation methods when data are missing completely at random or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.

  6. Posterior predictive checking of multiple imputation models.

    Science.gov (United States)

    Nguyen, Cattram D; Lee, Katherine J; Carlin, John B

    2015-07-01

    Multiple imputation is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking imputation models, a critical step in model fitting. Posterior predictive checking (PPC) has been recommended as an imputation diagnostic. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data appears typical of results obtained from the replicates produced by the model. A proposed diagnostic measure is the posterior predictive "p-value", an extreme value of which (i.e., a value close to 0 or 1) suggests a misfit between the model and the data. The aim of this study was to evaluate the performance of the posterior predictive p-value as an imputation diagnostic. Using simulation methods, we deliberately misspecified imputation models to determine whether posterior predictive p-values were effective in identifying these problems. When estimating the regression parameter of interest, we found that more extreme p-values were associated with poorer imputation model performance, although the results highlighted that traditional thresholds for classical p-values do not apply in this context. A shortcoming of the PPC method was its reduced ability to detect misspecified models with increasing amounts of missing data. Despite the limitations of posterior predictive p-values, they appear to have a valuable place in the imputer's toolkit. In addition to automated checking using p-values, we recommend imputers perform graphical checks and examine other summaries of the test quantity distribution. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
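
    The core of the procedure is easy to demonstrate on a toy model. The sketch below is a deliberate simplification (a normal imputation model, the sample mean as test quantity, and parameter uncertainty ignored): it completes the data under the imputation model, simulates fully model-generated replicates, and reports the proportion of replicates whose statistic exceeds that of the completed data, i.e. a posterior predictive p-value.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(5, 2, 200)
y[rng.random(200) < 0.3] = np.nan                      # 30% missing
obs = y[~np.isnan(y)]
mu, sd, n_mis = obs.mean(), obs.std(ddof=1), int(np.isnan(y).sum())

n_rep, stat_completed, stat_replicated = 1000, [], []
for _ in range(n_rep):
    imputed = rng.normal(mu, sd, n_mis)                # draw imputations
    completed = np.concatenate([obs, imputed])
    replicated = rng.normal(mu, sd, completed.size)    # fully model-generated copy
    stat_completed.append(completed.mean())
    stat_replicated.append(replicated.mean())

# Posterior predictive p-value: values near 0 or 1 flag model-data misfit.
ppp = np.mean(np.array(stat_replicated) >= np.array(stat_completed))
print("posterior predictive p-value:", round(float(ppp), 2))
```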

  7. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    Science.gov (United States)

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  8. Comparison of HLA allelic imputation programs.

    Directory of Open Access Journals (Sweden)

    Jason H Karnes

    Full Text Available Imputation of human leukocyte antigen (HLA) alleles from SNP-level data is attractive due to the importance of HLA alleles in human disease, the widespread availability of genome-wide association study (GWAS) data, and the expertise required for HLA sequencing. However, comprehensive evaluations of HLA imputation programs are limited. We compared HLA imputation results of HIBAG, SNP2HLA, and HLA*IMP:02 to sequenced HLA alleles in 3,265 samples from BioVU, a de-identified electronic health record database coupled to a DNA biorepository. We performed four-digit HLA sequencing for HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1 using long-read 454 FLX sequencing. All samples were genotyped using both the Illumina HumanExome BeadChip platform and a GWAS platform. Call rates and concordance rates were compared by platform, frequency of allele, and race/ethnicity. Overall concordance rates were similar between programs in European Americans (EA) (0.975 [SNP2HLA]; 0.939 [HLA*IMP:02]; 0.976 [HIBAG]). SNP2HLA provided a significant advantage in terms of call rate and the number of alleles imputed. Concordance rates were lower overall for African Americans (AAs). These observations were consistent when accuracy was compared across HLA loci. All imputation programs performed similarly for low frequency HLA alleles. Higher concordance rates were observed when HLA alleles were imputed from GWAS platforms versus the HumanExome BeadChip, suggesting that high genomic coverage is preferred as input for HLA allelic imputation. These findings provide guidance on the best use of HLA imputation methods and elucidate their limitations.

  9. Missing data and multiple imputation in clinical epidemiological research.

    Science.gov (United States)

    Pedersen, Alma B; Mikkelsen, Ellen M; Cronin-Fenton, Deirdre; Kristensen, Nickolaj R; Pham, Tra My; Pedersen, Lars; Petersen, Irene

    2017-01-01

    Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.

  10. Multiple imputation: dealing with missing data.

    Science.gov (United States)

    de Goeij, Moniek C M; van Diepen, Merel; Jager, Kitty J; Tripepi, Giovanni; Zoccali, Carmine; Dekker, Friedo W

    2013-10-01

    In many fields, including the field of nephrology, missing data are unfortunately an unavoidable problem in clinical/epidemiological research. The most common methods for dealing with missing data are complete case analysis (excluding patients with missing data), mean substitution (replacing missing values of a variable with the average of known values for that variable), and last observation carried forward. However, these methods have severe drawbacks, potentially resulting in biased estimates and/or standard errors. In recent years, a new method has arisen for dealing with missing data called multiple imputation. This method predicts missing values based on other data present for the same patient. This procedure is repeated several times, resulting in multiple imputed data sets. Thereafter, estimates and standard errors are calculated in each imputation set and pooled into one overall estimate and standard error. The main advantage of this method is that missing-data uncertainty is taken into account. Another advantage is that multiple imputation gives unbiased results when data are missing at random, which is the most common type of missing data in clinical practice, whereas conventional methods do not. However, multiple imputation has scarcely been used in the medical literature. We therefore encourage authors to use it in the future when possible.

  11. Missing data and multiple imputation in clinical epidemiological research

    Directory of Open Access Journals (Sweden)

    Pedersen AB

    2017-03-01

    Full Text Available Alma B Pedersen,1 Ellen M Mikkelsen,1 Deirdre Cronin-Fenton,1 Nickolaj R Kristensen,1 Tra My Pham,2 Lars Pedersen,1 Irene Petersen1,2 1Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus N, Denmark; 2Department of Primary Care and Population Health, University College London, London, UK Abstract: Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data in terms of the outcome of interest and prognosis in general. Missing data are often categorized into the following three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can constitute considerable challenges in the analyses and interpretation of results and can potentially weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analyses, missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method to deal with missing data, which accounts for the uncertainty associated with missing data. Multiple imputation is implemented in most statistical software under the MAR assumption and provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data. Keywords: missing data, observational study, multiple imputation, MAR, MCAR, MNAR

  12. Public Undertakings and Imputability

    DEFF Research Database (Denmark)

    Ølykke, Grith Skovgaard

    2013-01-01

    exercised by the State, imputability to the State, and the State’s fulfilment of the Market Economy Investor Principle. Furthermore, it is examined whether, in the absence of imputability, public undertakings’ market behaviour is subject to the Market Economy Investor Principle, and it is concluded... that this is not the case. Lastly, it is discussed whether other legal instruments, namely competition law, public procurement law, or the Transparency Directive, regulate public undertakings’ market behaviour. It is found that those rules are not sufficient to mend the gap created by the imputability requirement. Legal... In this article, the issue of imputability to the State of public undertakings’ decision-making is analysed and discussed in the context of the DSBFirst case. DSBFirst is owned by the independent public undertaking DSB and the private undertaking FirstGroup plc and won the contracts in the 2008...

  13. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
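
    The mechanics of the approach can be caricatured in a few lines. The sketch below is our own toy construction, not the authors' semiparametric estimator: each randomly censored covariate value is imputed with a draw from the empirical distribution of observed values beyond the censoring point, a logistic model is refitted on each completed data set, and the coefficients are averaged. A Kaplan-Meier or Cox-based estimate of the covariate distribution, as the paper proposes, would replace the naive empirical draw.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 2000
x = rng.exponential(1.0, n)                 # covariate subject to random censoring
cens = rng.exponential(2.0, n)              # censoring times
obs = np.minimum(x, cens)                   # what we actually record
uncensored = x <= cens
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x - 1.0))))  # outcome, true slope 0.8

pool = np.sort(obs[uncensored])             # empirical distribution of observed values
m, slopes = 20, []
for _ in range(m):
    x_imp = obs.copy()
    for i in np.where(~uncensored)[0]:
        tail = pool[pool > obs[i]]          # values beyond the censoring point
        if tail.size:
            x_imp[i] = rng.choice(tail)     # naive stand-in for a KM-based draw
    fit = sm.Logit(y, sm.add_constant(x_imp)).fit(disp=0)
    slopes.append(fit.params[1])
print("average imputed-data slope:", round(float(np.mean(slopes)), 2))
```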

  14. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Science.gov (United States)

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
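
    The random forest procedure and its two tuning parameters can be mimicked with standard tools. The sketch below is our illustration, not the paper's Rasch-model study: a forest is used inside a chained-equations loop, in the spirit of missForest, with the number of trees (n_estimators) and predictors tried per split (max_features) as the tuning knobs; the data are invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)
x3 = x1 ** 2 + rng.normal(scale=0.5, size=n)           # nonlinear relationship
X = np.column_stack([x1, x2, x3])
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan             # ~20% missing everywhere

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, max_features="sqrt",
                                    random_state=0),
    max_iter=5, random_state=0)
X_filled = imputer.fit_transform(X_miss)

mask = np.isnan(X_miss)                                # evaluate on the hidden cells
rmse = np.sqrt(np.mean((X_filled[mask] - X[mask]) ** 2))
print("RMSE on the truly missing cells:", round(float(rmse), 3))
```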

  15. Prostate cancer: net survival and cause-specific survival rates after multiple imputation

    OpenAIRE

    Morisot, Adeline; Bessaoud, Faïza; Landais, Paul; Rébillard, Xavier; Trétarre, Brigitte; Daurès, Jean-Pierre

    2015-01-01

    Background Estimations of survival rates are diverse and the choice of the appropriate method depends on the context. Given the increasing interest in multiple imputation methods, we explored the interest of a multiple imputation approach in the estimation of cause-specific survival, when a subset of causes of death was observed. Methods By using data from the European Randomized Study of Screening for Prostate Cancer (ERSPC), 20 multiply imputed datasets were created and analyzed with a Multivariate Imput...

  16. The Relative Impacts of Design Effects and Multiple Imputation on Variance Estimates: A Case Study with the 2008 National Ambulatory Medical Care Survey

    Directory of Open Access Journals (Sweden)

    Lewis Taylor

    2014-03-01

    Full Text Available The National Ambulatory Medical Care Survey collects data on office-based physician care from a nationally representative, multistage sampling scheme where the ultimate unit of analysis is a patient-doctor encounter. Patient race, a commonly analyzed demographic, has been subject to a steadily increasing item nonresponse rate. In 1999, race was missing for 17 percent of cases; by 2008, that figure had risen to 33 percent. Over this entire period, single imputation has been the compensation method employed. Recent research at the National Center for Health Statistics evaluated multiply imputing race to better represent the missing-data uncertainty. Given item nonresponse rates of 30 percent or greater, we were surprised to find many estimates’ ratios of multiple-imputation to single-imputation estimated standard errors close to 1. A likely explanation is that the design effects attributable to the complex sample design largely outweigh any increase in variance attributable to missing-data uncertainty.

  17. Missing value imputation for epistatic MAPs

    LENUS (Irish Health Repository)

    Ryan, Colm

    2010-04-20

    Abstract Background Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (Nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data. Results We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially
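
    The flavour of nearest-neighbor imputation adapted to symmetric pairwise data can be seen in a small sketch. The function below is a toy of our own, not the authors' implementation: it imputes each missing entry of a symmetric score matrix from the k rows most correlated with its row (correlations computed after a rough mean fill used only for that purpose), then re-symmetrizes the result.

```python
import numpy as np

def knn_impute_symmetric(S, k=3):
    """Toy k-nearest-neighbor imputation for a symmetric pairwise score matrix."""
    S = S.copy()
    n = S.shape[0]
    filled = np.where(np.isnan(S), np.nanmean(S), S)   # rough fill for correlations
    C = np.corrcoef(filled)
    np.fill_diagonal(C, -np.inf)                       # a row is not its own neighbor
    for i in range(n):
        for j in range(n):
            if np.isnan(S[i, j]):
                neighbors = np.argsort(C[i])[::-1]     # most similar rows first
                vals = [S[r, j] for r in neighbors[:k] if not np.isnan(S[r, j])]
                if vals:
                    S[i, j] = np.mean(vals)
    return (S + S.T) / 2                               # restore symmetry

scores = np.array([[0.0, 1.2, np.nan, -0.5],
                   [1.2, 0.0, 0.8, np.nan],
                   [np.nan, 0.8, 0.0, 0.3],
                   [-0.5, np.nan, 0.3, 0.0]])
print(knn_impute_symmetric(scores))
```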

  18. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background The use of missing-genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China samples from the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  19. Estimating the accuracy of geographical imputation

    Directory of Open Access Journals (Sweden)

    Boscoe Francis P

    2008-01-01

    Full Text Available Abstract Background To reduce the number of non-geocoded cases, researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed using either a postal centroid or a geographical imputation method that assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date, no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation. Methods Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address. Results Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density. Conclusion Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate
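
    The essence of demographic geo-imputation is a weighted random assignment. In the hedged sketch below (tract names and population counts are invented), a case known only to a postal code is assigned a census tract with probability proportional to the population of the case's race/ethnicity group in each tract; a centroid rule would ignore the demographics entirely.

```python
import numpy as np

rng = np.random.default_rng(2008)

# Population of each race/ethnicity group by census tract within one postal code.
tracts = ["tract_A", "tract_B", "tract_C"]
pop = {"white":    np.array([4000, 1500,  500]),
       "black":    np.array([ 300, 2500, 1200]),
       "hispanic": np.array([ 700,  900, 2600])}

def impute_tract(race):
    """Assign a tract with probability proportional to the group's population."""
    weights = pop[race] / pop[race].sum()
    return rng.choice(tracts, p=weights)

# A case of known race but unknown tract is assigned probabilistically;
# a centroid-based rule would always return the same tract regardless of race.
print(impute_tract("hispanic"))
```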

  20. Multiple Imputation in Three or More Stages.

    Science.gov (United States)

    McGinniss, J; Harel, O

    2016-09-01

    Missing values present challenges in the analysis of data across many areas of research. Handling incomplete data incorrectly can lead to bias, over-confident intervals, and inaccurate inferences. One principled method of handling incomplete data is multiple imputation. This article considers incomplete data in which values are missing for three or more qualitatively different reasons and applies a modified multiple imputation framework in the analysis of that data. Included are a proof of the methodology used for three-stage multiple imputation with its limiting distribution, an extension to more than three types of missing values, an extension to the ignorability assumption with proof, and simulations demonstrating that the estimator is unbiased and efficient under the ignorability assumption.

  1. Multiple imputation for non-response when estimating HIV prevalence using survey data.

    Science.gov (United States)

    Chinomona, Amos; Mwambi, Henry

    2015-10-16

    Missing data are a common feature in many areas of research, especially those involving survey data in the biological, health and social sciences. Most analyses of survey data take a complete-case approach, that is, list-wise deletion of all cases with missing values, assuming that the values are missing completely at random (MCAR). Methods based on substituting the missing values with single values, such as the last value carried forward, the mean, or regression predictions (single imputation), are also used. These methods often result in biased estimates, loss of statistical information, and loss of the distributional relationships between variables. In addition, the strong MCAR assumption is not tenable in most practical instances. Since missing data are a major problem in HIV research, the current research seeks to illustrate and highlight the strength of the multiple imputation procedure as a method of handling missing data, which comes from its ability to draw multiple values for the missing observations from plausible predictive distributions. This is particularly important in HIV research in sub-Saharan Africa, where accurate collection of (complete) data is still a challenge. Furthermore, multiple imputation accounts for the uncertainty introduced by the very process of imputing values for the missing observations. In particular, national and subgroup estimates of HIV prevalence in Zimbabwe were computed using multiply imputed data sets from the 2010-11 Zimbabwe Demographic and Health Surveys (2010-11 ZDHS) data. A survey logistic regression model for HIV prevalence and demographic and socio-economic variables was used as the substantive analysis model. The results for both the complete-case analysis and the multiple imputation analysis are presented and discussed. Across different subgroups of the population, the crude estimates of HIV prevalence are generally not identical but their

  2. How to Improve Postgenomic Knowledge Discovery Using Imputation

    Directory of Open Access Journals (Sweden)

    2009-02-01

    Full Text Available While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms, including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods, with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures.

  3. How to Improve Postgenomic Knowledge Discovery Using Imputation

    Directory of Open Access Journals (Sweden)

    Coppel Ross

    2009-01-01

    Full Text Available While microarrays make it feasible to rapidly investigate many complex biological problems, their multistep fabrication has the proclivity for error at every stage. The standard tactic has been to either ignore or regard erroneous gene readings as missing values, though this assumption can exert a major influence upon postgenomic knowledge discovery methods like gene selection and gene regulatory network (GRN) reconstruction. This has been the catalyst for a raft of new flexible imputation algorithms, including local least square impute and the recent heuristic collateral missing value imputation, which exploit the biological transactional behaviour of functionally correlated genes to afford accurate missing value estimation. This paper examines the influence of missing value imputation techniques upon postgenomic knowledge inference methods, with results for various algorithms consistently corroborating that instead of ignoring missing values, recycling microarray data by flexible and robust imputation can provide substantial performance benefits for subsequent downstream procedures.

  4. Multiple Imputation of Missing Composite Outcomes in Longitudinal Data.

    Science.gov (United States)

    O'Keeffe, Aidan G; Farewell, Daniel M; Tom, Brian D M; Farewell, Vernon T

    2016-01-01

    In longitudinal randomised trials and observational studies within a medical context, a composite outcome-which is a function of several individual patient-specific outcomes-may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has been seldom discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative as assumptions concerning both the missingness structure within the data and the imputation models are different from the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that for each, separate imputation offers some improvement on the direct imputation of a composite outcome.
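
    The direct-versus-separate comparison is easy to mimic with generic tools. The sketch below uses synthetic data and a chained-equations imputer; the paper's likelihood-based and linear-increments imputation models are not reproduced. A two-component sum is imputed either directly or component-wise, and the error on the missing cases is compared.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(42)
n = 1000
a = rng.normal(size=n)                                  # component 1, fully observed
b = 0.5 * a + rng.normal(scale=0.8, size=n)             # component 2, partly missing
miss = rng.random(n) < 0.25
b_obs = np.where(miss, np.nan, b)
true_composite = a + b

# Direct: impute the composite itself (missing whenever a component is missing).
comp_obs = np.where(miss, np.nan, true_composite)
direct = IterativeImputer(random_state=0).fit_transform(
    np.column_stack([comp_obs, a]))[:, 0]

# Separate: impute the component, then recompute the composite.
b_imp = IterativeImputer(random_state=0).fit_transform(
    np.column_stack([b_obs, a]))[:, 0]
separate = a + b_imp

for name, est in [("direct", direct), ("separate", separate)]:
    err = np.sqrt(np.mean((est[miss] - true_composite[miss]) ** 2))
    print(name, "RMSE on missing cases:", round(float(err), 3))
```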

  5. Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results

    Science.gov (United States)

    van Ginkel, Joost R.; van der Ark, L. Andries; Sijtsma, Klaas

    2007-01-01

    The performance of five simple multiple imputation methods for dealing with missing data were compared. In addition, random imputation and multivariate normal imputation were used as lower and upper benchmark, respectively. Test data were simulated and item scores were deleted such that they were either missing completely at random, missing at…

  6. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  7. Cost reduction for web-based data imputation

    KAUST Repository

    Li, Zhixu

    2014-01-01

    Web-based Data Imputation enables the completion of incomplete data sets by retrieving absent field values from the Web. In particular, complete fields can be used as keywords in imputation queries for absent fields. However, due to the ambiguity of these keywords and the complexity of data on the Web, different queries may retrieve different answers for the same absent field value. To decide the most probable right answer for each absent field value, existing methods issue quite a few imputation queries for each absent value and then vote to decide the most probable right answer. As a result, a large number of imputation queries must be issued to fill all absent values in an incomplete data set, which brings a large overhead. In this paper, we work on reducing the cost of Web-based Data Imputation in two respects. First, we propose a query execution scheme which can secure the most probable right answer to an absent field value by issuing as few imputation queries as possible. Second, we recognize and prune queries that will probably fail to return any answers a priori. Our extensive experimental evaluation shows that our proposed techniques substantially reduce the cost of Web-based Imputation without hurting its high imputation accuracy. © 2014 Springer International Publishing Switzerland.
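
    One way to read the first cost-saving idea is as voting with early termination: stop issuing queries once the leading answer can no longer be overtaken by the queries still unissued. The sketch below is our own illustration of that idea, not the paper's algorithm, and the canned query list is a stand-in for real Web lookups.

```python
from collections import Counter

def impute_with_early_stop(queries, run_query):
    """Vote over query answers, stopping once the lead is unassailable."""
    votes = Counter()
    for issued, q in enumerate(queries, start=1):
        answer = run_query(q)                  # e.g. a Web lookup; may return None
        if answer is not None:
            votes[answer] += 1
        remaining = len(queries) - issued
        if votes:
            (top, top_n), *rest = votes.most_common(2) + [(None, 0)]
            runner_up = rest[0][1] if rest else 0
            if top_n > runner_up + remaining:  # no remaining query can change the winner
                return top, issued
    return (votes.most_common(1)[0][0] if votes else None), len(queries)

# Toy usage: five candidate queries, three of which agree on the answer.
canned = {"q1": "Paris", "q2": "Paris", "q3": "Lyon", "q4": "Paris", "q5": None}
value, cost = impute_with_early_stop(list(canned), canned.get)
print(value, "after", cost, "queries")         # stops before issuing all five
```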

  8. Use of imputed population-based cancer registry data as a method of accounting for missing information: application to estrogen receptor status for breast cancer.

    Science.gov (United States)

    Howlader, Nadia; Noone, Anne-Michelle; Yu, Mandi; Cronin, Kathleen A

    2012-08-15

    The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program provides a rich source of data stratified according to tumor biomarkers that play an important role in cancer surveillance research. These data are useful for analyzing trends in cancer incidence and survival. These tumor markers, however, are often prone to missing observations. To address the problem of missing data, the authors employed sequential regression multivariate imputation for breast cancer variables, with a particular focus on estrogen receptor status, using data from 13 SEER registries covering the period 1992-2007. In this paper, they present an approach to accounting for missing information through the creation of imputed data sets that can be analyzed using existing software (e.g., SEER*Stat) developed for analyzing cancer registry data. Bias in age-adjusted trends in female breast cancer incidence is shown graphically before and after imputation of estrogen receptor status, stratified by age and race. The imputed data set will be made available in SEER*Stat (http://seer.cancer.gov/analysis/index.html) to facilitate accurate estimation of breast cancer incidence trends. To ensure that the imputed data set is used correctly, the authors provide detailed, step-by-step instructions for conducting analyses. This is the first time that a nationally representative, population-based cancer registry data set has been imputed and made available to researchers for conducting a variety of analyses of breast cancer incidence trends.

  9. Multiple imputation for handling missing outcome data when estimating the relative risk.

    Science.gov (United States)

    Sullivan, Thomas R; Lee, Katherine J; Ryan, Philip; Salter, Amy B

    2017-09-06

    Multiple imputation is a popular approach to handling missing data in medical research, yet little is known about its applicability for estimating the relative risk. Standard methods for imputing incomplete binary outcomes involve logistic regression or an assumption of multivariate normality, whereas relative risks are typically estimated using log binomial models. It is unclear whether misspecification of the imputation model in this setting could lead to biased parameter estimates. Using simulated data, we evaluated the performance of multiple imputation for handling missing data prior to estimating adjusted relative risks from a correctly specified multivariable log binomial model. We considered an arbitrary pattern of missing data in both outcome and exposure variables, with missing data induced under missing at random mechanisms. Focusing on standard model-based methods of multiple imputation, missing data were imputed using multivariate normal imputation or fully conditional specification with a logistic imputation model for the outcome. Multivariate normal imputation performed poorly in the simulation study, consistently producing estimates of the relative risk that were biased towards the null. Despite outperforming multivariate normal imputation, fully conditional specification also produced somewhat biased estimates, with greater bias observed for higher outcome prevalences and larger relative risks. Deleting imputed outcomes from analysis datasets did not improve the performance of fully conditional specification. Both multivariate normal imputation and fully conditional specification produced biased estimates of the relative risk, presumably since both use a misspecified imputation model. Based on simulation results, we recommend researchers use fully conditional specification rather than multivariate normal imputation and retain imputed outcomes in the analysis when estimating relative risks. However fully conditional specification is not without its
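
    The recommended workflow can be sketched for the simplest case of a single incomplete binary outcome. The illustration below uses synthetic data and is a far narrower setting than the paper's simulations: the outcome is imputed with a logistic imputation model in the style of fully conditional specification (with approximate posterior draws of the imputation parameters), the imputed outcomes are retained, and the log relative risk is pooled across completed data sets.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 4000
exposure = rng.binomial(1, 0.5, n)
risk = np.where(exposure == 1, 0.30, 0.15)              # true relative risk = 2.0
y = rng.binomial(1, risk)
y_obs = y.astype(float)
y_obs[rng.random(n) < 0.3] = np.nan                     # 30% of outcomes missing

observed = ~np.isnan(y_obs)
imp_model = sm.Logit(y_obs[observed], sm.add_constant(exposure[observed])).fit(disp=0)

m, log_rrs = 20, []
for _ in range(m):
    # Draw parameters from their approximate posterior, then draw outcomes.
    beta = rng.multivariate_normal(imp_model.params, imp_model.cov_params())
    p = 1 / (1 + np.exp(-(sm.add_constant(exposure) @ beta)))
    y_imp = np.where(observed, y_obs, rng.binomial(1, p))  # keep imputed outcomes
    r1 = y_imp[exposure == 1].mean()
    r0 = y_imp[exposure == 0].mean()
    log_rrs.append(np.log(r1 / r0))
print("pooled RR:", round(float(np.exp(np.mean(log_rrs))), 2))  # close to 2.0 here
```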

  10. Application of an imputation method for geospatial inventory of forest structural attributes across multiple spatial scales in the Lake States, U.S.A

    Science.gov (United States)

    Deo, Ram K.

    Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly, and as a consequence spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service were integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake States. Targeting small-area applications of state-of-the-art remote sensing, LiDAR (light detection and ranging) data were integrated with field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way. The outputs of the RF-kNN imputation were compared with independent validation

  11. [Jurisdiction and imputability].

    Science.gov (United States)

    Tapiador Sanjuán, M J

    2004-12-01

    The validity, efficacy and responsibility of acts depend on the intelligence and will of the acting subject; therefore, when these faculties are reduced or impaired, the acts may be declared invalid and their author not responsible for them. Some neurological pathologies may cause permanent physical and/or psychic deficiencies that prevent subjects from acting on their own. For these cases, the law establishes a state of incapacity, in order to protect the disabled person and supplement the reduced capacity, guaranteeing his or her rights and security. Incapacity is determined by a court judgment declaring the lack of capacity for self-management; the judgment establishes the extent and limits of the incapacity, the degree of which is proportional to the degree of insight. Similarly, a subject suffering from a pathological condition that invalidates his or her will and intelligence will be considered not responsible and not imputable, since the capacity for culpability is lacking. The Penal Code establishes the criteria that determine imputability or its absence, as well as the modifying circumstances.

  12. Multiple imputation analysis of case-cohort studies.

    Science.gov (United States)

    Marti, Helena; Chavance, Michel

    2011-06-15

    The usual methods for analyzing case-cohort studies rely on sometimes not fully efficient weighted estimators. Multiple imputation might be a good alternative because it uses all the data available and approximates the maximum partial likelihood estimator. This method is based on the generation of several plausible complete data sets, taking into account uncertainty about missing values. When the imputation model is correctly defined, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), using the case status among the explanatory variables. To validate the approach, we analyzed case-cohort studies first with completely simulated data and then with case-cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37 per cent for phase-1 variables and from 5 to 19 per cent for the phase-2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators, but it was also slightly biased. The analyses of case-cohort data sampled from complete cohorts showed that even when no strong predictor of the phase-2 variable was available, the multiple imputation was unbiased, as precise as the weighted estimator for the phase-2 variable, and slightly more precise than the weighted estimators for the phase-1 variables. However, the multiple imputation estimator was found to be biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case-cohort data. Practically, we suggest building the analysis model using only the case-cohort data and weighted

  13. Multiple Imputation of Predictor Variables Using Generalized Additive Models

    NARCIS (Netherlands)

    de Jong, Roel; van Buuren, Stef; Spiess, Martin

    2016-01-01

    The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The

  14. A comparison of selected parametric and non-parametric imputation methods for estimating forest biomass and basal area

    Science.gov (United States)

    Donald Gagliasso; Susan Hummel; Hailemariam. Temesgen

    2014-01-01

    Various methods have been used to estimate the amount of above ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers might end up with incorrect biomass estimate maps, which could lead them to make poorer decisions in their future...

  15. Multiple imputation of cognitive performance as a repeatedly measured outcome.

    Science.gov (United States)

    Rawlings, Andreea Monica; Sang, Yingying; Sharrett, Albert Richey; Coresh, Josef; Griswold, Michael; Kucharska-Newton, Anna Maria; Palta, Priya; Wruck, Lisa Miller; Gross, Alden Lawrence; Deal, Jennifer Anne; Power, Melinda Carolyn; Bandeen-Roche, Karen Jean

    2017-01-01

    Longitudinal studies of cognitive performance are sensitive to dropout, as participants experiencing cognitive deficits are less likely to attend study visits, which may bias estimated associations between exposures of interest and cognitive decline. Multiple imputation is a powerful tool for handling missing data; however, its use for missing cognitive outcome measures in longitudinal analyses remains limited. We use multiple imputation by chained equations (MICE) to impute cognitive performance scores of participants who did not attend the 2011-2013 exam of the Atherosclerosis Risk in Communities Study. We examined the validity of imputed scores using observed and simulated data under varying assumptions. We examined differences in the estimated association between diabetes at baseline and 20-year cognitive decline with and without imputed values. Lastly, we discuss how different analytic methods (mixed models and models fit using generalized estimating equations) and the choice of whom to impute for result in different estimands. Validation using observed data showed that MICE produced unbiased imputations. Simulations showed a substantial reduction in the bias of the 20-year association between diabetes and cognitive decline when comparing MICE (3-4% bias) with analyses of available data only (16-23% bias) in a construct where missingness was strongly informative but realistic. Associations between diabetes and 20-year cognitive decline were substantially stronger with MICE than in available-case analyses. Our study suggests that when informative data are available for non-examined participants, MICE can be an effective tool for imputing cognitive performance and improving assessment of cognitive decline, though careful thought should be given to the target imputation population and the analytic model chosen, as they may yield different estimands.

  16. A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis.

    Science.gov (United States)

    Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati

    2010-12-20

    Missing covariate data present a challenge to tree-structured methodology because a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.

  17. Restrictive Imputation of Incomplete Survey Data

    NARCIS (Netherlands)

    Vink, G.

    2015-01-01

    This dissertation focuses on finding plausible imputations when there is some restriction posed on the imputation model. In these restrictive situations, current imputation methodology does not lead to satisfactory imputations. The restrictions, and the resulting missing data problems are real-life

  18. Recovery of information from multiple imputation: a simulation study.

    Science.gov (United States)

    Lee, Katherine J; Carlin, John B

    2012-06-13

    complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.

  19. Clustering with Missing Values: No Imputation Required

    Science.gov (United States)

    Wagstaff, Kiri

    2004-01-01

    Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

  20. Bootstrap inference when using multiple imputation.

    Science.gov (United States)

    Schomaker, Michael; Heumann, Christian

    2018-04-16

    Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is nonsymmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present 4 methods that are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the 4 methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the g-formula for inference, a method for which no standard errors are available. Copyright © 2018 John Wiley & Sons, Ltd.
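
    As a concrete illustration of combining the two resampling layers, the sketch below implements one intuitively appealing scheme (bootstrap the incomplete data first, impute each replicate, then take percentile intervals). Note that the abstract reports that only some such combinations yield valid inference, and this sketch uses scikit-learn's IterativeImputer rather than any particular imputation engine from the paper.

        # Bootstrap-then-impute: mechanics only, not a validated recipe.
        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        def bootstrap_then_impute(X, stat, B=200, m=2, seed=1):
            rng = np.random.default_rng(seed)
            n = X.shape[0]
            estimates = []
            for b in range(B):
                Xb = X[rng.integers(0, n, n)]        # bootstrap resample rows
                # Average the statistic over m stochastic imputations of Xb.
                vals = [stat(IterativeImputer(sample_posterior=True,
                                              random_state=b * m + j)
                             .fit_transform(Xb)) for j in range(m)]
                estimates.append(np.mean(vals))
            return np.percentile(estimates, [2.5, 97.5])   # percentile CI

        # Example: CI for the mean of column 0 of an incomplete array X.
        # lo, hi = bootstrap_then_impute(X, stat=lambda Z: Z[:, 0].mean())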

  1. A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

    Science.gov (United States)

    De Silva, Anurika Priyanjali; Moreno-Betancur, Margarita; De Livera, Alysha Madhu; Lee, Katherine Jane; Simpson, Julie Anne

    2017-07-25

    Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat repeated measurements of the same time-dependent variable as just another 'distinct' variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks where the imputation model includes measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time - a commonly encountered scenario in epidemiological studies. We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms - missing completely at random (MCAR), and weak and strong missing at random (MAR) scenarios - were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% of missing data when assessing the association between childhood obesity and sleep problems. The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one. We recommend the use of FCS or MVNI in a similar
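
    The time-block restriction at the heart of two-fold FCS can be sketched as follows. This Python illustration uses scikit-learn's IterativeImputer as a stand-in imputation engine and is a schematic of the windowing idea only, not the published algorithm or its Stata/R implementations.

        # Two-fold FCS sketch: impute wave t using only waves within +/- w.
        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        def twofold_fcs(waves, w=1, n_cycles=5, seed=0):
            """waves: (n, T) array, one column per wave, NaN where missing."""
            X = np.asarray(waves, dtype=float).copy()
            miss = np.isnan(X)
            col_means = np.nanmean(X, axis=0)
            X[miss] = np.take(col_means, np.where(miss)[1])  # crude start
            T = X.shape[1]
            for cycle in range(n_cycles):
                for t in range(T):
                    block = list(range(max(0, t - w), min(T, t + w + 1)))
                    sub = X[:, block].copy()
                    # Re-blank wave t's originally missing entries, then
                    # re-impute them from the adjacent waves in the block.
                    sub[miss[:, t], block.index(t)] = np.nan
                    imp = IterativeImputer(max_iter=5,
                                           random_state=seed + cycle)
                    X[:, block] = imp.fit_transform(sub)
            return X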

  2. A SPATIOTEMPORAL APPROACH FOR HIGH RESOLUTION TRAFFIC FLOW IMPUTATION

    Energy Technology Data Exchange (ETDEWEB)

    Han, Lee [University of Tennessee, Knoxville (UTK); Chin, Shih-Miao [ORNL; Hwang, Ho-Ling [ORNL

    2016-01-01

    Along with the rapid development of Intelligent Transportation Systems (ITS), traffic data collection technologies have been evolving dramatically. The emergence of innovative data collection technologies such as Remote Traffic Microwave Sensors (RTMS), Bluetooth sensors, GPS-based floating car methods, and automated license plate recognition (ALPR) has created an explosion of traffic data, bringing transportation engineering into the era of Big Data. Despite these technological advances, however, missing data remain inevitable and pose great challenges for applications such as traffic forecasting, real-time incident detection and management, dynamic route guidance, and massive evacuation optimization, because the success of these endeavors depends on the timely availability of relatively complete and reasonably accurate traffic data. A thorough literature review suggests that most, if not all, current imputation models focus largely on the temporal nature of traffic data and fail to exploit the fact that traffic stream characteristics at a given location are closely related to those at neighboring locations - correlations that can be used for data imputation. To this end, this paper presents a Kriging-based spatiotemporal data imputation approach that fully utilizes the spatiotemporal information underlying traffic data. The imputation performance of the proposed approach was tested using simulated scenarios and achieved stable imputation accuracy. Moreover, the proposed Kriging imputation model is more flexible than current models.
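
    The spatiotemporal idea can be illustrated with Gaussian-process regression, a close relative of kriging: treat flow as a function of sensor location and time, so predictions at a gap borrow strength from neighbouring sensors as well as adjacent time points. The feature construction and kernel below are illustrative assumptions, not the paper's model.

        # GP/kriging-style spatiotemporal imputation of traffic flow.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        def impute_flow(coords, times, flow):
            """coords: (n, 2) sensor locations; times, flow: (n,) arrays,
            with NaN in flow marking the missing observations."""
            X = np.column_stack([coords, times])
            obs = ~np.isnan(flow)
            gp = GaussianProcessRegressor(
                kernel=RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(),
                normalize_y=True).fit(X[obs], flow[obs])
            filled = flow.copy()
            filled[~obs] = gp.predict(X[~obs])   # kriging-style prediction
            return filled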

  3. A web-based approach to data imputation

    KAUST Repository

    Li, Zhixu

    2013-10-24

    In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques. © 2013 Springer Science+Business Media New York.

  4. Fitting additive hazards models for case-cohort studies: a multiple imputation approach.

    Science.gov (United States)

    Jung, Jinhyouk; Harel, Ofer; Kang, Sangwook

    2016-07-30

    In this paper, we consider fitting semiparametric additive hazards models for case-cohort studies using a multiple imputation approach. In a case-cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this as a special case of a missing covariate by design. We propose to employ a popular incomplete data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on a rejection sampling is developed. A simple imputation modeling that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, a misspecification aspect in imputation modeling is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.

  5. Implications of Missing Data Imputation for Agricultural Household Surveys: An Application to Technology Adoption

    OpenAIRE

    Gedikoglu, Haluk; Parcell, Joseph L.

    2012-01-01

    Missing data occur frequently in survey data and result in biased estimates and reduced efficiency for regression estimates. The objective of the current study is to analyze the impact of missing-data imputation, using multiple-imputation methods, on regression estimates for agricultural household surveys. The current study also analyzes the impact of multiple imputation on regression results when all the variables in the regression have missing observations. Fi...

  6. Comparison of Imputation Methods for Handling Missing Categorical Data with Univariate Pattern

    Directory of Open Access Journals (Sweden)

    Torres Munguía, Juan Armando

    2014-06-01

    This paper examines sample proportion estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with 5% and 15% rates of missingness, each under the MCAR, MAR and MNAR mechanisms. The performance of six methods for addressing missingness was then evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression, and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were the hot-deck and polytomous regression approaches.
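
    Of the six methods compared, hot-deck is the simplest to sketch: each nonrespondent receives the value of a randomly chosen respondent ("donor") from the same adjustment class. The snippet below is a minimal Python illustration; the column names and class variables are hypothetical.

        # Random hot-deck imputation of a categorical variable within classes.
        import numpy as np
        import pandas as pd

        def hot_deck(df, target, class_vars, seed=0):
            rng = np.random.default_rng(seed)
            out = df.copy()
            for _, idx in out.groupby(class_vars).groups.items():
                grp = out.loc[idx, target]
                donors = grp.dropna()
                if donors.empty:
                    continue   # no donor; a real system would collapse classes
                fill = rng.choice(donors.to_numpy(), size=grp.isna().sum())
                out.loc[grp.index[grp.isna()], target] = fill
            return out

        # e.g. impute smoking status within sex-by-age-group classes:
        # completed = hot_deck(survey, "smokes", ["sex", "age_group"])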

  7. Flexible Imputation of Missing Data

    CERN Document Server

    van Buuren, Stef

    2012-01-01

    Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science--multiple imputation--fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data unde

  8. Latent class regression: inference and estimation with two-stage multiple imputation.

    Science.gov (United States)

    Harel, Ofer; Chung, Hwan; Miglioretti, Diana

    2013-07-01

    Latent class regression (LCR) is a popular method for analyzing multiple categorical outcomes. While nonresponse to the manifest items is a common complication, inferences of LCR can be evaluated using maximum likelihood, multiple imputation, and two-stage multiple imputation. Under similar missing data assumptions, the estimates and variances from all three procedures are quite close. However, multiple imputation and two-stage multiple imputation can provide additional information: estimates for the rates of missing information. The methodology is illustrated using an example from a study on racial and ethnic disparities in breast cancer severity. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Mapping gradients of community composition with nearest-neighbour imputation: extending plot data for landscape analysis

    Science.gov (United States)

    Janet L. Ohmann; Matthew J. Gregory; Emilie B. Henderson; Heather M. Roberts

    2011-01-01

    Question: How can nearest-neighbour (NN) imputation be used to develop maps of multiple species and plant communities? Location: Western and central Oregon, USA, but methods are applicable anywhere. Methods: We demonstrate NN imputation by mapping woody plant communities for >100 000 km2 of diverse forests and woodlands. Species abundances on...

  10. Performance of selected imputation techniques for missing variances in meta-analysis

    Science.gov (United States)

    Idris, N. R. N.; Abdullah, M. H.; Tolos, S. M.

    2013-04-01

    A common method of handling the problem of missing variances in meta-analysis of continuous responses is imputation. However, the performance of imputation techniques may be influenced by the type of model utilised. In this article, we examine through a simulation study the effects of the technique used to impute missing SDs and of the type of model used on the overall meta-analysis estimates. The results suggest that imputation should be adopted to estimate the overall effect size, irrespective of the model used. However, the accuracy of the estimates of the corresponding standard error (SE) is influenced by the imputation technique. For estimates based on the fixed effects model, mean imputation provides better estimates than multiple imputation, while estimates based on the random effects model respond more robustly to the type of imputation technique. The results showed that although imputation is good at reducing bias in point estimates, it is more likely to produce coverage probabilities that are higher than the nominal value.
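
    Mean imputation of missing SDs, the technique favoured here under the fixed effects model, amounts to a one-line fill before the usual inverse-variance pooling. The following sketch uses made-up study data for illustration.

        # Mean-imputing missing SDs, then fixed-effects pooling.
        import numpy as np

        means = np.array([1.2, 0.8, 1.5, 1.1, 0.9])        # study effects
        sds   = np.array([0.5, np.nan, 0.7, np.nan, 0.6])  # some SDs missing
        n     = np.array([40, 55, 32, 60, 45])             # sample sizes

        sds_imp = np.where(np.isnan(sds), np.nanmean(sds), sds)
        w = n / sds_imp**2                  # inverse-variance weights
        pooled = np.sum(w * means) / np.sum(w)
        pooled_se = np.sqrt(1.0 / np.sum(w))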

  11. Multiple imputation of completely missing repeated measures data within person from a complex sample: application to accelerometer data in the National Health and Nutrition Examination Survey.

    Science.gov (United States)

    Liu, Benmei; Yu, Mandi; Graubard, Barry I; Troiano, Richard P; Schenker, Nathaniel

    2016-12-10

    The Physical Activity Monitor component was introduced into the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to collect objective information on physical activity including both movement intensity counts and ambulatory steps. Because of an error in the accelerometer device initialization process, the steps data were missing for all participants in several primary sampling units, typically a single county or group of contiguous counties, who had intensity count data from their accelerometers. To avoid potential bias and loss in efficiency in estimation and inference involving the steps data, we considered methods to accurately impute the missing values for steps collected in the 2003-2004 NHANES. The objective was to come up with an efficient imputation method that minimized model-based assumptions. We adopted a multiple imputation approach based on additive regression, bootstrapping and predictive mean matching methods. This method fits alternative conditional expectation (ace) models, which use an automated procedure to estimate optimal transformations for both the predictor and response variables. This paper describes the approaches used in this imputation and evaluates the methods by comparing the distributions of the original and the imputed data. A simulation study using the observed data is also conducted as part of the model diagnostics. Finally, some real data analyses are performed to compare the before and after imputation results. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
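
    One building block named above, predictive mean matching, is easy to sketch: regress the target on predictors, then donate to each missing case the observed value of a respondent whose predicted mean is among the closest. This Python sketch is a generic single-variable PMM, not the NHANES production code.

        # Predictive mean matching (PMM) for one incomplete variable.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        def pmm(y, X, k=5, seed=0):
            rng = np.random.default_rng(seed)
            y = np.asarray(y, dtype=float)
            obs = ~np.isnan(y)
            model = LinearRegression().fit(X[obs], y[obs])
            pred_obs = model.predict(X[obs])     # respondents' predictions
            pred_mis = model.predict(X[~obs])    # nonrespondents' predictions
            y_imp, donors = y.copy(), y[obs]
            for i, p in zip(np.where(~obs)[0], pred_mis):
                nearest = np.argsort(np.abs(pred_obs - p))[:k]
                y_imp[i] = donors[rng.choice(nearest)]   # draw a donor value
            return y_imp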

  12. A holistic comparative analysis of diagnostic tests for urothelial carcinoma: a study of Cxbladder Detect, UroVysion® FISH, NMP22® and cytology based on imputation of multiple datasets.

    Science.gov (United States)

    Breen, Vivienne; Kasabov, Nikola; Kamat, Ashish M; Jacobson, Elsie; Suttie, James M; O'Sullivan, Paul J; Kavalieris, Laimonis; Darling, David G

    2015-05-12

    Comparing the relative utility of diagnostic tests is challenging when available datasets are small, partial or incomplete. The analytical leverage associated with a large sample size can be gained by integrating several small datasets to enable effective and accurate across-dataset comparisons. Accordingly, we propose a methodology for a holistic comparative analysis and ranking of cancer diagnostic tests through dataset integration and imputation of missing values, using urothelial carcinoma (UC) as a case study. Five datasets comprising samples from 939 subjects, including 89 with UC, in which up to four diagnostic tests (cytology, NMP22®, UroVysion® Fluorescence In-Situ Hybridization (FISH) and Cxbladder Detect) had been measured, were integrated into a single dataset containing all measured records and missing values. The tests were first ranked using three criteria: sensitivity, specificity and a standard variable (feature) ranking method popularly known as the signal-to-noise ratio (SNR) index, derived from the mean values for all subjects clinically known to have UC versus healthy subjects. Second, step-wise unsupervised and supervised imputation (the latter accounting for the 'clinical truth' as determined by cystoscopy) was performed using personalized modelling, k-nearest-neighbour methods, multiple logistic regression and multilayer perceptron neural networks. All imputation models were cross-validated by comparing their post-imputation predictive accuracy for UC with their pre-imputation accuracy. Finally, the post-imputation tests were re-ranked using the same three criteria. In both the measured and imputed datasets, Cxbladder Detect ranked higher for sensitivity, and urine cytology ranked higher for specificity, when compared with other UC tests. Cxbladder Detect consistently ranked higher than FISH and all other tests when SNR analyses were performed on measured, unsupervised and supervised imputed datasets. Supervised imputation resulted in a smaller cross-validation error

  13. Passive imputation and parcel summaries are both valid to handle missing items in studies with many multi-item scales.

    Science.gov (United States)

    Eekhout, Iris; de Vet, Henrica Cw; de Boer, Michiel R; Twisk, Jos Wr; Heymans, Martijn W

    2018-04-01

    Previous studies showed that missing data in multi-item scales can best be handled by multiple imputation of item scores. However, when many scales are used, the number of items will become too large for the imputation model to reliably estimate imputations. A solution is to use passive imputation or a parcel summary score that combine and consequently reduce the number of variables in the imputation model. The performance of these methods was evaluated in a simulation study and illustrated in an example. Passive imputation, which updated scale scores from imputed items, and parcel summary scores that use the average over available item scores were compared to using all items simultaneously, imputing total scores of scales and complete-case analysis. Scale scores and coefficient estimates from linear regression were compared to "true" parameters on bias and precision. Passive imputation and using parcel summaries showed smaller bias and more precision than imputing total scores and complete-case analyses. Passive imputation or using parcel summary scores are valid missing data solutions in studies that include many multi-item scales.
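
    The parcel-summary device is simple: rather than entering every item of every other scale into the imputation model, enter one score per scale, averaged over each respondent's available items. A minimal pandas sketch, with hypothetical scale and item names:

        # Parcel summary scores: one column per auxiliary scale.
        import pandas as pd

        def parcel_scores(df, scales):
            """scales: dict mapping scale name -> list of item columns."""
            return pd.DataFrame({name: df[items].mean(axis=1)
                                 for name, items in scales.items()})

        # The items of the target scale plus one parcel per auxiliary scale
        # then form the (much smaller) predictor set of the imputation model:
        # predictors = pd.concat([df[target_items],
        #                         parcel_scores(df, auxiliary_scales)], axis=1)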

  14. Assessment of genotype imputation performance using 1000 Genomes in African American studies.

    Directory of Open Access Journals (Sweden)

    Dana B Hancock

    Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes) for all imputed SNPs. Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL
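
    The IQS metric described above (concordance adjusted for chance agreement) can be computed like a Cohen's kappa on the table of true versus best-guess imputed genotypes; the sketch below follows that reading and assumes 0/1/2 minor-allele-count coding.

        # Imputation quality score: concordance corrected for chance.
        import numpy as np

        def iqs(true_geno, imputed_geno):
            true_geno = np.asarray(true_geno)
            imputed_geno = np.asarray(imputed_geno)
            p_o = np.mean(true_geno == imputed_geno)   # raw concordance
            # Chance agreement from the marginal genotype distributions.
            p_e = sum(np.mean(true_geno == g) * np.mean(imputed_geno == g)
                      for g in (0, 1, 2))
            return (p_o - p_e) / (1 - p_e)

        # For a rare variant, raw concordance is high by chance alone, but
        # IQS stays near 0 unless minor-allele carriers are recovered.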

  15. The Utility of Nonparametric Transformations for Imputation of Survey Data

    Directory of Open Access Journals (Sweden)

    Robbins Michael W.

    2014-12-01

    Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations) tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques), the proposed methodology is applied here with an algorithm for imputation in the USDA’s Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient) is recommended over the other transformation procedures in practice.
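
    An empirical-distribution transformation of the kind discussed can be sketched in a few lines: map each observed margin to normality through its empirical CDF, impute on the transformed scale, and back-transform with empirical quantiles. This is a generic normal-scores sketch, not the ARMS algorithm itself.

        # Empirical-distribution (normal scores) marginal transformation.
        import numpy as np
        from scipy.stats import norm, rankdata

        def to_normal_scores(x):
            """Rank-transform observed values to standard normal scores."""
            obs = ~np.isnan(x)
            z = np.full_like(x, np.nan, dtype=float)
            ranks = rankdata(x[obs]) / (obs.sum() + 1)    # in (0, 1)
            z[obs] = norm.ppf(ranks)
            return z

        def from_normal_scores(z, x_obs):
            """Back-transform via empirical quantiles of the observed data."""
            return np.quantile(x_obs, norm.cdf(z))

        # Impute on to_normal_scores(x) with any Gaussian-based imputer,
        # then back-map the imputed z-scores with from_normal_scores.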

  16. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    Science.gov (United States)

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  17. Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

    Science.gov (United States)

    Deng, Yi; Chang, Changgee; Ido, Moges Seyoum; Long, Qi

    2016-02-01

    Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation by chained equations (MICE), we investigate two approaches of using regularized regression to impute missing values of high-dimensional data that can handle general missing data patterns. We compare our MICE methods with several existing imputation methods in simulation studies. Our simulation results demonstrate the superiority of the proposed MICE approach based on an indirect use of regularized regression in terms of bias. We further illustrate the proposed methods using two data examples.

  18. An imputation-based solution to using mismeasured covariates in propensity score analysis.

    Science.gov (United States)

    Webb-Vargas, Yenny; Rudolph, Kara E; Lenis, David; Murakami, Peter; Stuart, Elizabeth A

    2017-08-01

    Although covariate measurement error is likely the norm rather than the exception, methods for handling covariate measurement error in propensity score methods have not been widely investigated. We consider a multiple imputation-based approach that uses an external calibration sample with information on the true and mismeasured covariates, multiple imputation for external calibration, to correct for the measurement error, and investigate its performance using simulation studies. As expected, using the covariate measured with error leads to bias in the treatment effect estimate. In contrast, the multiple imputation for external calibration method can eliminate almost all the bias. We confirm that the outcome must be used in the imputation process to obtain good results, a finding related to the idea of congenial imputation and analysis in the broader multiple imputation literature. We illustrate the multiple imputation for external calibration approach using a motivating example estimating the effects of living in a disadvantaged neighborhood on mental health and substance use outcomes among adolescents. These results show that estimating the propensity score using covariates measured with error leads to biased estimates of treatment effects, but when a calibration data set is available, multiple imputation for external calibration can be used to help correct for such bias.

  19. Data imputation analysis for Cosmic Rays time series

    Science.gov (United States)

    Fernandes, R. C.; Lucio, P. S.; Fernandez, J. H.

    2017-05-01

    The occurrence of missing data in Galactic Cosmic Ray (GCR) time series is inevitable, since losses arise from mechanical and human failure, technical problems, and the different periods of operation of GCR stations. The aim of this study was to perform multiple dataset imputation in order to reconstruct the observational dataset. The study used the monthly time series of GCR Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of missing data relative to the observed ROME series, with 50 replicates; the CLMX station was used as a proxy for allocating these scenarios. Three different methods for monthly dataset imputation were selected: Amelia II, which runs a bootstrap Expectation Maximization algorithm; MICE, which runs an algorithm for Multivariate Imputation by Chained Equations; and mtsdi, an Expectation Maximization based method for imputation of missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series using several skill measures, such as RMSE, NRMSE, the Agreement Index, R, R2, the F-test and the t-test. The results showed that for CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. Increases in the number of gaps led to loss of quality in the reconstructed time series. Data imputation was most efficient with the mtsdi method, with negligible errors and the best skill coefficients. The results suggest a limit of about 60% missing data for imputation of monthly averages, but no more than this. It is noteworthy that the CLMX, ROME and KIEL stations present no missing data in the target period. This methodology allowed the reconstruction of 43 time series.

  20. Comparing methodologies for imputing ethnicity in an urban ophthalmology clinic.

    Science.gov (United States)

    Storey, Philip; Murchison, Ann P; Dai, Yang; Hark, Lisa; Pizzi, Laura T; Leiby, Benjamin E; Haller, Julia A

    2014-04-01

    To compare methodologies for imputing ethnicity in an urban ophthalmology clinic. Using data from 19,165 patients with self-reported ethnicity, surname, and home address, we compared the accuracy of three methodologies for imputing ethnicity: (1) a surname method based on tabulation from the 2000 US Census; (2) a geocoding method based on tract data from the 2010 US Census; and (3) a combined surname geocoding method using Bayes' theorem. The combined surname geocoding model had the highest accuracy of the three methodologies, imputing black ethnicity with a sensitivity of 84% and positive predictive value (PPV) of 94%, white ethnicity with a sensitivity of 92% and PPV of 82%, Hispanic ethnicity with a sensitivity of 77% and PPV of 71%, and Asian ethnicity with a sensitivity of 83% and PPV of 79%. Overall agreement of imputed and self-reported ethnicity was fair for the surname method (κ 0.23), moderate for the geocoding method (κ 0.58), and strong for the combined method (κ 0.76). A methodology combining surname analysis and Census tract data using Bayes' theorem to determine ethnicity is superior to other methods tested and is ideally suited for research purposes of clinical and administrative data.
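
    The combined method rests on one application of Bayes' theorem: assuming surname and Census tract are independent given ethnicity, the posterior is proportional to the surname-based probability times the tract-based likelihood. The probability tables in this sketch are illustrative placeholders, not Census figures.

        # Bayesian combination of surname and Census-tract information:
        #   P(eth | surname, tract) ~ P(eth | surname) * P(tract | eth)
        import numpy as np

        ETHNICITIES = ["black", "white", "hispanic", "asian"]

        def impute_ethnicity(p_eth_given_surname, p_tract_given_eth):
            """Both arguments are vectors aligned with ETHNICITIES."""
            post = (np.asarray(p_eth_given_surname)
                    * np.asarray(p_tract_given_eth))
            return dict(zip(ETHNICITIES, post / post.sum()))

        # A weakly informative surname in a highly segregated tract:
        print(impute_ethnicity([0.30, 0.40, 0.20, 0.10],   # surname tables
                               [0.70, 0.15, 0.10, 0.05]))  # tract composition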

  1. Time relative single-photon (photoelectron) method

    International Nuclear Information System (INIS)

    Luo Binqiao

    1988-01-01

    A single-photon (photoelectron) measuring system is designed, and various problems in the single-photon (photoelectron) method are investigated. The electronic resolving time is less than 25 ps, and the resolving time of the single-photon (photoelectron) measuring system is 25 to 65 ps.

  2. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data.

    Science.gov (United States)

    Luo, Yuan; Szolovits, Peter; Dighe, Anand S; Baron, Jason M

    2017-11-30

    A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not adequately accommodate longitudinal time-based data. We sought to develop a new imputation algorithm, 3-dimensional multiple imputation with chained equations (3D-MICE), that can perform accurate imputation of missing clinical time series data. We extracted clinical laboratory test results for 13 commonly measured analytes (clinical laboratory tests). We imputed missing test results for the 13 analytes using 3 imputation methods: multiple imputation with chained equations (MICE), Gaussian process (GP), and 3D-MICE. 3D-MICE utilizes both MICE and GP imputation to integrate cross-sectional and longitudinal information. To evaluate imputation method performance, we randomly masked selected test results and imputed these masked results alongside results missing from our original data. We compared predicted results to measured results for masked data points. 3D-MICE performed significantly better than MICE and GP-based imputation in a composite of all 13 analytes, predicting missing results with a normalized root-mean-square error of 0.342, compared to 0.373 for MICE alone and 0.358 for GP alone. 3D-MICE offers a novel and practical approach to imputing clinical laboratory time series data. 3D-MICE may provide an additional tool for use as a foundation in clinical predictive analytics and intelligent clinical decision support. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, by providing estimations for the missing values through a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimations of the missing values may not be reliable and may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a definitely positive impact on the imputation results for categorical medical datasets.

  4. A nonparametric multiple imputation approach for missing categorical data.

    Science.gov (United States)

    Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh

    2017-06-06

    Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome
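
    The two-working-model construction can be sketched as follows; for brevity the outcome is binary here (the paper handles more than two categories), the equal weighting of the two scores is an illustrative choice, and one call produces a single imputation, to be repeated for multiple imputation.

        # Nearest-neighbour imputation driven by a predictive score that
        # blends an outcome model and a missingness model.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def nn_mi_draw(y, X, k=5, w=0.5, seed=0):
            rng = np.random.default_rng(seed)
            y = np.asarray(y, dtype=float)       # NaN marks missing outcome
            miss = np.isnan(y)
            out_model = LogisticRegression(max_iter=1000).fit(X[~miss], y[~miss])
            mis_model = LogisticRegression(max_iter=1000).fit(X, miss.astype(int))
            score = (w * out_model.predict_proba(X)[:, 1]          # outcome
                     + (1 - w) * mis_model.predict_proba(X)[:, 1]) # missingness
            y_imp, donors, d_scores = y.copy(), y[~miss], score[~miss]
            for i in np.where(miss)[0]:
                nearest = np.argsort(np.abs(d_scores - score[i]))[:k]
                y_imp[i] = donors[rng.choice(nearest)]   # random donor draw
            return y_imp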

  5. Multiple imputation with sequential penalized regression.

    Science.gov (United States)

    Zahid, Faisal M; Heumann, Christian

    2018-01-01

    Missing data is a common issue that can cause problems in estimation and inference in biomedical, epidemiological and social research. Multiple imputation is an increasingly popular approach for handling missing data. In the case of a large number of covariates with missing data, existing multiple imputation software packages may not work properly and often produce errors. We propose a multiple imputation algorithm called mispr based on sequential penalized regression models. Each variable with missing values is assumed to have a different distributional form and is imputed with its own imputation model using the ridge penalty. When the number of predictors is large with respect to the sample size, the use of a quadratic penalty guarantees unique estimates for the parameters and leads to better predictions than the usual maximum likelihood estimation (MLE), with a good compromise between bias and variance. As a result, the proposed algorithm performs well and provides imputed values that are better even for a large number of covariates with small samples. The results are compared with the existing software packages mice, VIM and Amelia in simulation studies. The missing at random mechanism was the main assumption in the simulation study. The imputation performance of the proposed algorithm is evaluated with mean squared imputation error and mean absolute imputation error. The mean squared error ([Formula: see text]), parameter estimates with their standard errors and confidence intervals are also computed to compare performance in the regression context. The proposed algorithm is observed to be a good competitor to the existing algorithms, with smaller mean squared imputation error, mean absolute imputation error and mean squared error. The algorithm's performance becomes considerably better than that of the existing algorithms with an increasing number of covariates, especially when the number of predictors is close to or even greater than the sample size. Two

  6. Multiple imputation for cure rate quantile regression with censored data.

    Science.gov (United States)

    Wu, Yuanshan; Yin, Guosheng

    2017-03-01

    The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.

  7. Explicating the Conditions Under Which Multilevel Multiple Imputation Mitigates Bias Resulting from Random Coefficient-Dependent Missing Longitudinal Data.

    Science.gov (United States)

    Gottfredson, Nisha C; Sterba, Sonya K; Jackson, Kristina M

    2017-01-01

    Random coefficient-dependent (RCD) missingness is a non-ignorable mechanism through which missing data can arise in longitudinal designs. RCD, which we cannot test for, is a problematic form of missingness that occurs if subject-specific random effects correlate with propensity for missingness or dropout. Particularly when covariate missingness is a problem, investigators typically handle missing longitudinal data by using single-level multiple imputation procedures implemented with long-format data, which ignores within-person dependency entirely, or implemented with wide-format (i.e., multivariate) data, which ignores some aspects of within-person dependency. When either of these standard approaches to handling missing longitudinal data is used, RCD missingness leads to parameter bias and incorrect inference. We explain why multilevel multiple imputation (MMI) should alleviate bias induced by a RCD missing data mechanism under conditions that contribute to stronger determinacy of random coefficients. We evaluate our hypothesis with a simulation study. Three design factors are considered: intraclass correlation (ICC; ranging from .25 to .75), number of waves (ranging from 4 to 8), and percent of missing data (ranging from 20 to 50%). We find that MMI greatly outperforms the single-level wide-format (multivariate) method for imputation under a RCD mechanism. For the MMI analyses, bias was most alleviated when the ICC was high, there were more waves of data, and there was less missing data. Practical recommendations for handling longitudinal missing data are suggested.

  8. 16 CFR 1115.11 - Imputed knowledge.

    Science.gov (United States)

    2010-01-01

    ... 16 Commercial Practices 2 2010-01-01 2010-01-01 false Imputed knowledge. 1115.11 Section 1115.11... PRODUCT HAZARD REPORTS General Interpretation § 1115.11 Imputed knowledge. (a) In evaluating whether or... care to ascertain the truth of complaints or other representations. This includes the knowledge a firm...

  9. Propensity score analysis with partially observed covariates: How should multiple imputation be used?

    Science.gov (United States)

    Leyrat, Clémence; Seaman, Shaun R; White, Ian R; Douglas, Ian; Smeeth, Liam; Kim, Joseph; Resche-Rigon, Matthieu; Carpenter, James R; Williamson, Elizabeth J

    2017-01-01

    Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin's rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the
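
    The contrast between the two pooling strategies is easiest to see in code. The sketch below, with a simple unstabilized IPTW estimator and a logistic propensity model, shows MIte (pool the effect estimates) against MIps (pool the propensity scores, then run one weighted analysis); it omits the variance estimation questions the paper focuses on.

        # MIte vs. MIps pooling after multiple imputation of covariates.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def iptw_effect(y, treat, ps):
            w = treat / ps + (1 - treat) / (1 - ps)    # IPT weights
            return (np.average(y[treat == 1], weights=w[treat == 1])
                    - np.average(y[treat == 0], weights=w[treat == 0]))

        def mite_mips(imputed_datasets):
            """imputed_datasets: list of (y, treat, X) completed datasets;
            y and treat are fully observed, so identical across datasets."""
            effects, ps_list = [], []
            for y, treat, X in imputed_datasets:
                ps = (LogisticRegression(max_iter=1000)
                      .fit(X, treat).predict_proba(X)[:, 1])
                effects.append(iptw_effect(y, treat, ps))
                ps_list.append(ps)
            mite = np.mean(effects)                    # pool the estimates
            y, treat, _ = imputed_datasets[0]
            mips = iptw_effect(y, treat, np.mean(ps_list, axis=0))
            return mite, mips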

  10. Multiple imputation as a means to assess mammographic vs. ultrasound technology in determining breast cancer recurrence

    Science.gov (United States)

    Helenowski, Irene B.; Demirtas, Hakan; Khan, Seema; Eladoumikdachi, Firas; Shidfar, Ali

    2014-03-01

    Tumor size based on mammographic data and tumor size based on ultrasound data are two measures used in predicting recurrence in breast cancer patients. Which technology offers the better determination, however, is an ongoing debate among radiologists, biophysicists, and other clinicians. Further complications in assessing the performance of each technology arise from missing data. One approach to remedy this problem may involve multiple imputation. Here, we therefore examine how imputation affects our assessment of the relationship between recurrence and tumor size determined either by mammography or by ultrasound technology. We specifically employ the semi-parametric approach for imputing mixed continuous and binary data presented in Helenowski and Demirtas (2013).

  11. Comparison of results from different imputation techniques for missing data from an anti-obesity drug trial

    DEFF Research Database (Denmark)

    Jørgensen, Anders W.; Lundstrøm, Lars H; Wetterslev, Jørn

    2014-01-01

    BACKGROUND: In randomised trials of medical interventions, the most reliable analysis follows the intention-to-treat (ITT) principle. However, the ITT analysis requires that missing outcome data have to be imputed. Different imputation techniques may give different results and some may lead to bias. In anti-obesity drug trials, many data are usually missing, and the most used imputation method is last observation carried forward (LOCF). LOCF is generally considered conservative, but there are more reliable methods such as multiple imputation (MI). OBJECTIVES: To compare four different methods of handling missing data in a 60-week placebo controlled anti-obesity drug trial on topiramate. METHODS: We compared an analysis of complete cases with datasets where missing body weight measurements had been replaced using three different imputation methods: LOCF, baseline carried forward (BOCF) and MI...
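
    The two single-imputation comparators are mechanical enough to state exactly; the sketch below applies them to a hypothetical wide table of body-weight measurements with one column per visit (visit names are illustrative).

        # LOCF and BOCF on a wide DataFrame of body-weight measurements.
        import pandas as pd

        visits = ["week0", "week12", "week24", "week60"]

        def locf(df):
            # Last observation carried forward along each row.
            return df[visits].ffill(axis=1)

        def bocf(df):
            # Baseline observation carried forward into later missing visits.
            out = df[visits].copy()
            for col in visits[1:]:
                out[col] = out[col].fillna(out["week0"])
            return out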

  12. Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation.

    Science.gov (United States)

    Bernhardt, Paul W; Wang, Huixia Judy; Zhang, Daowen

    2014-01-01

    Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally-efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. The consistency and asymptotic normality of the multiple imputation estimator are established and a consistent variance estimator is provided. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is also suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from a recently-conducted GenIMS study.

  13. Imputation of TPMT defective alleles for the identification of patients with high-risk phenotypes

    Directory of Open Access Journals (Sweden)

    Berta Almoguera

    2014-05-01

    Background: The activity of thiopurine methyltransferase (TPMT) is subject to genetic variation. Loss-of-function alleles are associated with various degrees of myelosuppression after treatment with thiopurine drugs, thus genotype-based dosing recommendations currently exist. The aim of this study was to evaluate the potential utility of leveraging genomic data from large biorepositories in the identification of individuals with TPMT defective alleles. Material and methods: TPMT variants were imputed using the 1,000 Genomes Project reference panel in 87,979 samples from the biobank at The Children’s Hospital of Philadelphia. Population ancestry was determined by principal component analysis using HapMap3 samples as reference. Frequencies of the imputed TPMT alleles, genotypes and the associated phenotype were determined across the different populations. A sample of 630 subjects with genotype data from Sanger sequencing (N=59) and direct genotyping (N=583), with 12 samples overlapping in the two groups, was used to check the concordance between the imputed and observed genotypes, as well as the sensitivity, specificity and positive and negative predictive values of the imputation. Results: Two SNPs (rs1800460 and rs1142345) that represent three TPMT alleles (*3A, *3B, and *3C) were imputed with adequate quality. The frequency of the associated enzyme activity varied across populations: 89.36-94.58% of individuals were predicted to have normal TPMT activity, 5.3-10.31% intermediate and 0.12-0.34% poor activity. Overall, 98.88% of individuals (623/630) were correctly imputed as carrying no risk alleles (553/553), heterozygous (45/46) or homozygous (25/31). Sensitivity, specificity and predictive values of imputation were over 90% in all cases, except for the sensitivity of imputing homozygous subjects, which was 80.64%. Conclusion: Imputation of TPMT alleles from existing genomic data can be used as a first step in the screening of individuals at risk of developing serious

  14. First Use of Multiple Imputation with the National Tuberculosis Surveillance System

    Directory of Open Access Journals (Sweden)

    Christopher Vinnard

    2013-01-01

    Aims. The purpose of this study was to compare methods for handling missing data in analysis of the National Tuberculosis Surveillance System of the Centers for Disease Control and Prevention. Because of the high rate of missing human immunodeficiency virus (HIV) infection status in this dataset, we used multiple imputation methods to minimize the bias that may result from less sophisticated methods. Methods. We compared analysis based on multiple imputation methods with analysis based on deleting subjects with missing covariate data from regression analysis (case exclusion), and determined whether the use of increasing numbers of imputed datasets would lead to changes in the estimated association between isoniazid resistance and death. Results. Following multiple imputation, the odds ratio for initial isoniazid resistance and death was 2.07 (95% CI 1.30, 3.29); with case exclusion, this odds ratio decreased to 1.53 (95% CI 0.83, 2.83). The use of more than 5 imputed datasets did not substantively change the results. Conclusions. Our experience with the National Tuberculosis Surveillance System dataset supports the use of multiple imputation methods in epidemiologic analysis, but also demonstrates that close attention should be paid to the potential impact of missing covariates at each step of the analysis.
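
    The pooling step behind the reported odds ratio follows Rubin's rules, which combine the per-imputation estimates and their variances into one estimate whose variance has within- and between-imputation components. A minimal sketch (applied, say, to log odds ratios):

        # Rubin's rules for pooling estimates across m imputed datasets.
        import numpy as np

        def rubin_pool(estimates, variances):
            q = np.asarray(estimates)   # per-imputation point estimates
            u = np.asarray(variances)   # their squared standard errors
            m = len(q)
            qbar = q.mean()                         # pooled estimate
            w_var = u.mean()                        # within-imputation variance
            b_var = q.var(ddof=1)                   # between-imputation variance
            total = w_var + (1 + 1 / m) * b_var     # total variance
            return qbar, np.sqrt(total)

        # est, se = rubin_pool(log_ors, ses**2); OR = exp(est), etc.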

  15. Imputed forest structure uncertainty varies across elevational and longitudinal gradients in the western Cascade mountains, Oregon, USA

    Science.gov (United States)

    David M. Bell; Matthew J. Gregory; Janet L. Ohmann

    2015-01-01

    Imputation provides a useful method for mapping forest attributes across broad geographic areas based on field plot measurements and Landsat multi-spectral data, but the resulting map products may be of limited use without corresponding analyses of uncertainties in predictions. In the case of k-nearest neighbor (kNN) imputation with k = 1, such as the Gradient Nearest...

  16. Method for manufacturing a single crystal nanowire

    NARCIS (Netherlands)

    van den Berg, Albert; Bomer, Johan G.; Carlen, Edwin; Chen, S.; Kraaijenhagen, Roderik Adriaan; Pinedo, Herbert Michael

    2013-01-01

    A method for manufacturing a single crystal nano-structure is provided comprising the steps of providing a device layer with a <100> structure on a substrate; providing a stress layer onto the device layer; patterning the stress layer along the <110> direction of the device layer; selectively removing

  17. Method for manufacturing a single crystal nanowire

    NARCIS (Netherlands)

    van den Berg, Albert; Bomer, Johan G.; Carlen, Edwin; Chen, S.; Kraaijenhagen, R.A.; Pinedo, Herbert Michael

    2010-01-01

    A method for manufacturing a single crystal nano-structure is provided comprising the steps of providing a device layer with a {100} structure on a substrate; providing a stress layer onto the device layer; patterning the stress layer along the <110> direction of the device layer; selectively removing

  18. REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types

    OpenAIRE

    James R. Carpenter; Harvey Goldstein; Michael G. Kenward

    2011-01-01

    Multiple imputation is becoming increasingly established as the leading practical approach to modelling partially observed data, under the assumption that the data are missing at random. However, many medical and social datasets are multilevel, and this structure should be reflected not only in the model of interest, but also in the imputation model. In particular, the imputation model should reflect the differences between level 1 variables and level 2 variables (which are constant across level 1 units)...

  19. Imputation of missing data in time series for air pollutants

    Science.gov (United States)

    Junger, W. L.; Ponce de Leon, A.

    2015-02-01

    Missing data are a major concern in epidemiological studies of the health effects of environmental air pollutants. This article presents an imputation-based method that is suitable for multivariate time series data, which uses the EM algorithm under the assumption of a normal distribution. Different approaches are considered for filtering the temporal component. A simulation study was performed to assess the validity and performance of the proposed method in comparison with some frequently used methods. Simulations showed that when the amount of missing data was as low as 5%, the complete data analysis yielded satisfactory results regardless of the generating mechanism of the missing data, whereas the validity began to degenerate when the proportion of missing values exceeded 10%. The proposed imputation method exhibited good accuracy and precision in different settings with respect to the patterns of missing observations. Most of the imputations yielded valid results, even when data were missing not at random. The methods proposed in this study are implemented as a package called mtsdi for the statistical software system R.
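
    mtsdi itself is an R package; the sketch below is a rough Python analogue of the core idea, an EM-style fill-in under a multivariate normal model. It omits the temporal filtering the article layers on top.

        import numpy as np

        def em_impute(X, n_iter=50):
            """Iteratively fill missing entries with their conditional means
            under a multivariate normal model (simplified EM: the M-step
            ignores the conditional covariance correction)."""
            X = np.array(X, dtype=float)
            miss = np.isnan(X)
            mu = np.nanmean(X, axis=0)
            X[miss] = np.take(mu, np.where(miss)[1])      # start from column means
            for _ in range(n_iter):
                mu = X.mean(axis=0)
                sigma = np.cov(X, rowvar=False)
                for i in np.where(miss.any(axis=1))[0]:
                    m, o = miss[i], ~miss[i]
                    beta = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
                    X[i, m] = mu[m] + beta @ (X[i, o] - mu[o])  # E-step fill
            return X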

  20. DISSCO: direct imputation of summary statistics allowing covariates.

    Science.gov (United States)

    Xu, Zheng; Duan, Qing; Yan, Song; Chen, Wei; Li, Mingyao; Lange, Ethan; Li, Yun

    2015-08-01

    Imputation of individual-level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual-level genotype data are not available. Two methods (DIST and ImpG-Summary/LD) that assume a multivariate Gaussian distribution for the association summary statistics have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. We analytically show that in the absence of covariates, the correlation among association summary statistics is indeed the same as that among the corresponding genotypes, thus serving as a theoretical justification for the recently proposed methods. We further prove that in the presence of covariates, the correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). We consider two real-life scenarios where the correlation and partial correlation likely make a practical difference: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows at least comparable, if not better, performance compared with existing correlation-based methods, particularly for lower-frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
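
    In the no-covariate case, the imputation described here reduces to the conditional mean of a multivariate normal: imputed z-scores are a linear combination of typed z-scores weighted by LD. A numpy sketch; the ridge constant and the per-SNP information measure below are common conventions, not necessarily DISSCO's exact choices.

        import numpy as np

        def impute_zscores(z_typed, R_tt, R_ut, ridge=0.1):
            """z ~ MVN(0, R) under the null, with R the SNP correlation (LD)
            matrix: R_tt among typed SNPs, R_ut between untyped and typed.
            With covariates, DISSCO replaces correlations by partial
            correlations; this sketch covers the no-covariate case."""
            R_reg = R_tt + ridge * np.eye(len(z_typed))       # regularize
            z_imp = R_ut @ np.linalg.solve(R_reg, z_typed)    # conditional mean
            inv = np.linalg.inv(R_reg)
            info = np.einsum('ij,jk,ik->i', R_ut, inv, R_ut)  # imputation quality
            return z_imp, info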

  1. Multi-population classical HLA type imputation.

    Directory of Open Access Journals (Sweden)

    Alexander Dilthey

    Full Text Available Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy of 97% on diverse European panels (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.

  2. Prostate cancer: net survival and cause-specific survival rates after multiple imputation.

    Science.gov (United States)

    Morisot, Adeline; Bessaoud, Faïza; Landais, Paul; Rébillard, Xavier; Trétarre, Brigitte; Daurès, Jean-Pierre

    2015-07-28

    Estimations of survival rates are diverse and the choice of the appropriate method depends on the context. Given the increasing interest in multiple imputation methods, we explored the value of a multiple imputation approach in the estimation of cause-specific survival when a subset of causes of death was observed. Using data from the European Randomized Study of Screening for Prostate Cancer (ERSPC), 20 multiply imputed datasets were created and analyzed with a Multivariate Imputation by Chained Equations (MICE) algorithm. Then, cause-specific survival was estimated on each dataset with two methods: Kaplan-Meier and competing risks. The two pooled cause-specific survival estimates and their confidence intervals were obtained using Rubin's rules after complementary log-log transformation. Net survival was estimated using Pohar-Perme's estimator and was compared to pooled cause-specific survival. Finally, a sensitivity analysis was performed to test the robustness of our multiple imputation model. Cause-specific survival performed better than net survival, since the latter exceeded 100% for almost the first 2 years of follow-up and after 9 years, whereas cause-specific survival decreased slowly and then stabilized at around 94% at 9 years. The sensitivity analysis results were satisfactory. On the basis of our prostate cancer data, the results obtained by cause-specific survival after multiple imputation appeared to be better and more realistic than those obtained using net survival.
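
    Pooling survival estimates on the complementary log-log scale, as described here, can be sketched as follows; inputs are illustrative, and standard errors are carried to the transformed scale by the delta method.

        import numpy as np

        def pool_survival(surv, se):
            """Rubin's rules applied to cloglog-transformed survival estimates
            from m imputed datasets, then back-transformed."""
            surv, se = np.asarray(surv), np.asarray(se)
            theta = np.log(-np.log(surv))                   # cloglog transform
            se_theta = se / (surv * np.abs(np.log(surv)))   # delta method
            m = len(theta)
            total = (se_theta ** 2).mean() + (1 + 1 / m) * theta.var(ddof=1)
            lo = theta.mean() - 1.96 * np.sqrt(total)
            hi = theta.mean() + 1.96 * np.sqrt(total)
            # cloglog is decreasing in S(t), so the bounds swap on back-transform
            return np.exp(-np.exp(theta.mean())), np.exp(-np.exp(hi)), np.exp(-np.exp(lo))

        print(pool_survival([0.95, 0.94, 0.945], [0.010, 0.012, 0.011]))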

  3. Model for Multiple Imputation to Estimate Daily Rainfall Data and Filling of Faults

    Directory of Open Access Journals (Sweden)

    José Ruy Porto de Carvalho

    Full Text Available Abstract Modeling by multiple chained imputation is an area of growing importance. However, its models and methods are frequently developed for specific applications. In this study, a multiple imputation model was used to estimate daily rainfall data. Daily precipitation records from several meteorological stations, obtained from the AGRITEMPO system for two homogeneous climatic zones, were used. The precipitation values estimated for two dates (January 20th, 2005 and May 2nd, 2005) using the multiple imputation model were compared with the geostatistical techniques of ordinary kriging and co-kriging, with altitude as an auxiliary variable. The multiple imputation model was 16% better for the first zone and over 23% better for the second one, compared to the rainfall estimation obtained by the geostatistical techniques. The model proved to be a versatile technique, presenting results consistent with the conditions of different zones and times.

  4. A nonparametric multiple imputation approach for missing categorical data

    Directory of Open Access Journals (Sweden)

    Muhan Zhou

    2017-06-01

    Full Text Available Abstract Background Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. Methods We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value and the non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate the contributions of the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme, even under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. Conclusions We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with
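
    A bare-bones, single-pass version of the nearest-neighbour scheme described above might look like the sketch below. The equal weighting of the two working-model scores and the use of a single reference-category probability as the outcome score are simplifications of the paper's proposal.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def nn_impute_categorical(X, y, n_donors=5, w=0.5, seed=0):
            rng = np.random.default_rng(seed)
            y = np.asarray(y, dtype=float)
            obs = ~np.isnan(y)
            # Working model 1: outcome model on complete cases (scalar score
            # taken from one class probability, a simplification).
            out = LogisticRegression(max_iter=1000).fit(X[obs], y[obs].astype(int))
            p_out = out.predict_proba(X)[:, 1]
            # Working model 2: model for the probability of being missing.
            mis = LogisticRegression(max_iter=1000).fit(X, (~obs).astype(int))
            p_mis = mis.predict_proba(X)[:, 1]
            score = w * p_out + (1 - w) * p_mis          # combined predictive score
            y_imp = y.copy()
            for i in np.where(~obs)[0]:
                d = np.abs(score[obs] - score[i])        # distance to potential donors
                donors = y[obs][np.argsort(d)[:n_donors]]
                y_imp[i] = rng.choice(donors)            # draw one nearby donor
            return y_imp

        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 3))
        y = (X[:, 0] > 0).astype(float) + (X[:, 1] > 1)  # three categories 0/1/2
        y[rng.random(500) < 0.2] = np.nan
        print(np.unique(nn_impute_categorical(X, y), return_counts=True))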

  5. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa.

    Directory of Open Access Journals (Sweden)

    Katya L Masconi

    Full Text Available Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models' discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which was matched by simpler imputation methods. Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.

  6. Missing Data and Multiple Imputation: An Unbiased Approach

    Science.gov (United States)

    Foy, M.; VanBaalen, M.; Wear, M.; Mendez, C.; Mason, S.; Meyers, V.; Alexander, D.; Law, J.

    2014-01-01

    The default method of dealing with missing data in statistical analyses is to only use the complete observations (complete case analysis), which can lead to unexpected bias when data do not meet the assumption of missing completely at random (MCAR). For the assumption of MCAR to be met, missingness cannot be related to either the observed or unobserved variables. A less stringent assumption, missing at random (MAR), requires that missingness not be associated with the value of the missing variable itself, but can be associated with the other observed variables. When data are truly MAR as opposed to MCAR, the default complete case analysis method can lead to biased results. There are statistical options available to adjust for data that are MAR, including multiple imputation (MI), which is consistent and efficient at estimating effects. Multiple imputation uses informing variables to determine statistical distributions for each piece of missing data. Then multiple datasets are created by randomly drawing on the distributions for each piece of missing data. Since MI is efficient, only a limited number of imputed datasets, usually fewer than 20, are required to get stable estimates. Each imputed dataset is analyzed using standard statistical techniques, and then results are combined to get overall estimates of effect. A simulation study is presented to show the results of using the default complete case analysis and MI in a linear regression of MCAR and MAR simulated data. Further, MI was successfully applied to the association study of CO2 levels and headaches when initial analysis showed there may be an underlying association between missing CO2 levels and reported headaches. Through MI, we were able to show that there is a strong association between average CO2 levels and the risk of headaches. Each unit increase in CO2 (mmHg) resulted in a doubling in the odds of reported headaches.
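
    The MCAR/MAR distinction is easy to demonstrate by simulation. In the sketch below, missingness in y depends only on an observed variable x (MAR), so the complete-case mean of y is biased while a regression-based imputation that uses x recovers it. Drawing residual noise, rather than filling in bare predictions, keeps the imputed values proper; for brevity the sketch does not redraw the regression parameters themselves, as full MI would.

        import numpy as np

        rng = np.random.default_rng(1)
        n, m = 100_000, 20
        x = rng.normal(size=n)                    # fully observed informing variable
        y = 2.0 * x + rng.normal(size=n)          # true mean of y is 0

        miss = rng.random(n) < 1 / (1 + np.exp(-2 * x))   # MAR: missing more when x is large
        obs = ~miss
        print(f"complete-case mean of y: {y[obs].mean():+.3f} (truth +0.000)")

        b, a = np.polyfit(x[obs], y[obs], 1)      # regression of y on x, complete cases
        sd = np.std(y[obs] - (a + b * x[obs]))
        means = [np.concatenate([y[obs], a + b * x[miss]
                                 + rng.normal(0, sd, miss.sum())]).mean()
                 for _ in range(m)]
        print(f"imputation-based mean:   {np.mean(means):+.3f}")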

  7. Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach.

    Science.gov (United States)

    Erler, Nicole S; Rizopoulos, Dimitris; Rosmalen, Joost van; Jaddoe, Vincent W V; Franco, Oscar H; Lesaffre, Emmanuel M E H

    2016-07-30

    Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

  8. REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types

    Directory of Open Access Journals (Sweden)

    James R. Carpenter

    2011-12-01

    Full Text Available Multiple imputation is becoming increasingly established as the leading practical approach to modelling partially observed data, under the assumption that the data are missing at random. However, many medical and social datasets are multilevel, and this structure should be reflected not only in the model of interest, but also in the imputation model. In particular, the imputation model should reflect the differences between level 1 variables and level 2 variables (which are constant across level 1 units). This led us to develop the REALCOM-IMPUTE software, which we describe in this article. This software performs multilevel multiple imputation, and handles ordinal and unordered categorical data appropriately. It is freely available on-line, and may be used either as a standalone package, or in conjunction with the multilevel software MLwiN or Stata.

  9. Comparing methods for single paragraph similarity analysis.

    Science.gov (United States)

    Stone, Benjamin; Dennis, Simon; Kwantes, Peter J

    2011-01-01

    The focus of this paper is two-fold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models' similarity estimates. Removing numeric and single characters, and also truncating document length, improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs. Copyright © 2010 Cognitive Science Society, Inc.
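
    The two nonreductive models named above are simple to state. A sketch of both, using Jaccard overlap of word types and a term-frequency cosine; the paper's exact formulations may differ.

        import math
        from collections import Counter

        def word_overlap(a, b):
            """Proportion of shared word types (Jaccard similarity)."""
            wa, wb = set(a.lower().split()), set(b.lower().split())
            return len(wa & wb) / len(wa | wb)

        def cosine(a, b):
            """Vector-space cosine over raw term frequencies."""
            ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
            dot = sum(ca[w] * cb[w] for w in ca)
            na = math.sqrt(sum(v * v for v in ca.values()))
            nb = math.sqrt(sum(v * v for v in cb.values()))
            return dot / (na * nb)

        p1 = "the actor signed a three picture deal with the studio"
        p2 = "the studio announced a new deal with the actor"
        print(word_overlap(p1, p2), cosine(p1, p2))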

  10. The utility of imputed matched sets. Analyzing probabilistically linked databases in a low information setting.

    Science.gov (United States)

    Thomas, A M; Cook, L J; Dean, J M; Olson, L M

    2014-01-01

    To compare results from high probability matched sets versus imputed matched sets across differing levels of linkage information. A series of linkages with varying amounts of available information were performed on two simulated datasets derived from multiyear motor vehicle crash (MVC) and hospital databases, where true matches were known. Distributions of high probability and imputed matched sets were compared against the true match population for occupant age, MVC county, and MVC hour. Regression models were fit to simulated log hospital charges and hospitalization status. High probability and imputed matched sets did not differ significantly from the true match population with respect to occupant age, MVC county, and MVC hour in high information settings (p > 0.999). In low information settings, high probability matched sets differed significantly with respect to occupant age and MVC county, whereas imputed matched sets did not (p > 0.493). High information settings saw no significant differences in inference for simulated log hospital charges and hospitalization status between the two methods. High probability and imputed matched sets differed significantly from the outcomes in low information settings; however, imputed matched sets were more robust. The level of information available to a linkage is an important consideration. High probability matched sets are suitable for high to moderate information settings and for situations involving case-specific analysis. Conversely, imputed matched sets are preferable for low information settings when conducting population-based analyses.

  11. Quantitative trait Loci association mapping by imputation of strain origins in multifounder crosses.

    Science.gov (United States)

    Zhou, Jin J; Ghazalpour, Anatole; Sobel, Eric M; Sinsheimer, Janet S; Lange, Kenneth

    2012-02-01

    Although mapping quantitative traits in inbred strains is simpler than mapping the analogous traits in humans, classical inbred crosses suffer from reduced genetic diversity compared to experimental designs involving outbred animal populations. Multiple crosses, for example the Complex Trait Consortium's eight-way cross, circumvent these difficulties. However, complex mating schemes and systematic inbreeding raise substantial computational difficulties. Here we present a method for locally imputing the strain origins of each genotyped animal along its genome. Imputed origins then serve as mean effects in a multivariate Gaussian model for testing association between trait levels and local genomic variation. Imputation is a combinatorial process that assigns the maternal and paternal strain origin of each animal on the basis of observed genotypes and prior pedigree information. Without smoothing, imputation is likely to be ill-defined or jump erratically from one strain to another as an animal's genome is traversed. In practice, one expects to see long stretches where strain origins are invariant. Smoothing can be achieved by penalizing strain changes from one marker to the next. A dynamic programming algorithm then solves the strain imputation process in one quick pass through the genome of an animal. Imputation accuracy exceeds 99% in practical examples and leads to high-resolution mapping in simulated and real data. The previous fastest quantitative trait loci (QTL) mapping software for dense genome scans reduced compute times to hours. Our implementation further reduces compute times from hours to minutes with no loss in statistical power. Indeed, power is enhanced for full pedigree data.
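
    The smoothing step described above, a per-marker mismatch cost plus a penalty for each change of strain, is a classic dynamic program. A sketch under the simplifying assumption of a single uniform switch penalty:

        import numpy as np

        def impute_strains(emission_cost, switch_penalty=1.0):
            """Viterbi-style DP: emission_cost[i, s] is the mismatch cost of
            assigning strain s at marker i; each strain change between adjacent
            markers costs switch_penalty, yielding long invariant stretches."""
            n, k = emission_cost.shape
            cost = emission_cost[0].copy()
            back = np.zeros((n, k), dtype=int)
            for i in range(1, n):
                best = cost.argmin()                 # cheapest strain to switch from
                thresh = cost[best] + switch_penalty
                back[i] = np.where(cost <= thresh, np.arange(k), best)
                cost = np.minimum(cost, thresh) + emission_cost[i]
            path = np.empty(n, dtype=int)            # trace back the optimal path
            path[-1] = cost.argmin()
            for i in range(n - 1, 0, -1):
                path[i - 1] = back[i, path[i]]
            return path

        rng = np.random.default_rng(2)
        ec = rng.random((500, 8))
        ec[:250, 3] = 0.0                            # truth: strain 3, then strain 5
        ec[250:, 5] = 0.0
        print(np.unique(impute_strains(ec), return_counts=True))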

  12. Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices.

    Science.gov (United States)

    Taylor, Sandra L; Ruhaak, L Renee; Kelly, Karen; Weiss, Robert H; Kim, Kyoungmi

    2017-03-01

    With expanded access to, and decreased costs of, mass spectrometry, investigators are collecting and analyzing multiple biological matrices from the same subject, such as serum, plasma, tissue and urine, to enhance biomarker discoveries, understanding of disease processes and identification of therapeutic targets. Commonly, each biological matrix is analyzed separately, but multivariate methods such as MANOVAs that combine information from multiple biological matrices are potentially more powerful. However, mass spectrometric data typically contain large amounts of missing values, and imputation is often used to create complete data sets for analysis. The effects of imputation on multiple biological matrix analyses have not been studied. We investigated the effects of seven imputation methods (half-minimum substitution, mean substitution, k-nearest neighbors, local least squares regression, Bayesian principal components analysis, singular value decomposition and random forest) on the within-subject correlation of compounds between biological matrices and its consequences for MANOVA results. Through analysis of three real omics data sets and simulation studies, we found the amount of missing data and the imputation method to substantially change the between-matrix correlation structure. The magnitude of the correlations was generally reduced in imputed data sets, and this effect increased with the amount of missing data. Significant results from MANOVA testing also were substantially affected. In particular, the number of false positives increased with the level of missing data for all imputation methods. No one imputation method was universally the best, but the simple substitution methods (Half Minimum and Mean) consistently performed poorly. © The Author 2016. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
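
    The attenuation reported above can be reproduced with a toy two-matrix example. For simplicity the sketch below deletes values completely at random before substituting; in real mass-spectrometry data, missingness is often abundance-dependent, which can make the distortion worse.

        import numpy as np

        rng = np.random.default_rng(3)
        n = 2_000
        serum = rng.lognormal(size=n)
        urine = serum * rng.lognormal(sigma=0.3, size=n)   # correlated across matrices
        r_true = np.corrcoef(serum, urine)[0, 1]

        x = urine.copy()
        x[rng.random(n) < 0.3] = np.nan                    # 30% missing at random

        half_min = np.where(np.isnan(x), np.nanmin(x) / 2, x)   # half-minimum substitution
        mean_sub = np.where(np.isnan(x), np.nanmean(x), x)      # mean substitution

        print(f"true r      {r_true:.3f}")
        print(f"half-min r  {np.corrcoef(serum, half_min)[0, 1]:.3f}")
        print(f"mean-sub r  {np.corrcoef(serum, mean_sub)[0, 1]:.3f}")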

  13. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation

    Science.gov (United States)

    Adikaram, K. K. L. B.; Becker, T.

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratio R_max of (a_max − a_min) to (S_n − a_min·n), and the ratio R_min of (a_max − a_min) to (a_max·n − S_n), are always equal to 2/n, where a_max is the maximum element, a_min is the minimum element and S_n is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with the y = c form, R_max > 2/n and R_min > 2/n imply that the maximum and minimum elements, respectively, do not agree with the linear fit. We define threshold values for outlier and noise detection as 2/n·(1 + k_1) and 2/n·(1 + k_2), respectively, where k_1 > k_2 and 0 ≤ k_1 ≤ n/2 − 1. Given this relation and a transformation technique that transforms data into the form y = c, we show that removing all data that do not agree with the linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of the outliers, noise and clean data. These are major advantages over existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with the linear fit is less than 50%, and the deviation of the data that do not agree with the linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process. PMID:26571035
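
    The indicator arithmetic above can be checked in a few lines: for a perfect arithmetic progression both ratios equal 2/n exactly, and an outlier pushes the corresponding ratio past the threshold.

        import numpy as np

        def linear_fit_indicators(a):
            """R_max, R_min and the reference value 2/n for a series a."""
            a = np.asarray(a, dtype=float)
            n, s = len(a), a.sum()
            r_max = (a.max() - a.min()) / (s - a.min() * n)
            r_min = (a.max() - a.min()) / (a.max() * n - s)
            return r_max, r_min, 2 / n

        print(linear_fit_indicators(np.arange(1, 11)))   # linear: both equal 2/n = 0.2
        a = np.arange(1.0, 11.0)
        a[4] = 40.0                                      # inject one outlier
        print(linear_fit_indicators(a))                  # R_max > 2/n flags the maximum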

  14. The utility of low-density genotyping for imputation in the Thoroughbred horse.

    Science.gov (United States)

    Corbin, Laura J; Kranis, Andreas; Blott, Sarah C; Swinburne, June E; Vaudin, Mark; Bishop, Stephen C; Woolliams, John A

    2014-02-04

    Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem. Using the haplotype phasing and imputation program BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money. Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are

  15. Accuracy of estimation of genomic breeding values in pigs using low-density genotypes and imputation.

    Science.gov (United States)

    Badke, Yvonne M; Bates, Ronald O; Ernst, Catherine W; Fix, Justin; Steibel, Juan P

    2014-04-16

    Genomic selection has the potential to increase genetic progress. Genotype imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate the accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using tagSNPs in the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross-validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65-0.68). Using genotypes imputed from a large reference panel (accuracy: R^2 = 0.95) for genomic evaluation did not significantly decrease accuracy, whereas a scenario with genotypes imputed from a small reference panel (R^2 = 0.88) did show a significant decrease in accuracy. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy. On the other hand, using a very small reference panel of haplotypes to impute training animals and candidates for
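
    The ridge-regression step of genomic evaluation can be sketched in its marker-effects (SNP-BLUP) form, which is algebraically equivalent to an animal-centric model; the penalty value and simulated genotypes below are purely illustrative.

        import numpy as np

        def gebv_ridge(Z_train, y_train, Z_cand, lam=100.0):
            """Shrunken marker effects: beta = (Z'Z + lam*I)^-1 Z'y,
            then GEBV of candidates as Z_cand @ beta."""
            p = Z_train.shape[1]
            beta = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(p),
                                   Z_train.T @ y_train)
            return Z_cand @ beta

        rng = np.random.default_rng(4)
        Z = rng.integers(0, 3, size=(983, 1_000)).astype(float)  # 0/1/2 genotype codes
        y = Z @ rng.normal(scale=0.05, size=1_000) + rng.normal(size=983)
        gebv = gebv_ridge(Z[:800], y[:800], Z[800:])
        print(np.corrcoef(gebv, y[800:])[0, 1])   # hold-out accuracy, cross-validation style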

  16. Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies

    Directory of Open Access Journals (Sweden)

    McElwee Joshua

    2009-06-01

    Full Text Available Abstract Background Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion. Methods 384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes from the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release 22 SNPs, and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500K array plus a custom 164K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs. Results MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y results in excellent imputation performance, and it outperforms the Affx500K or Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was not as successful. It was more challenging to impute genotypes in the African American population, given (1) shorter LD blocks and (2) admixture with Caucasian populations in this population. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a path to determine empirically how the power to detect associations is affected by the imputation procedures. That is, at a fixed false discovery rate, the number of cis

  17. Genotype Imputation To Improve the Cost-Efficiency of Genomic Selection in Farmed Atlantic Salmon

    Directory of Open Access Journals (Sweden)

    Hsin-Yuan Tsai

    2017-04-01

    Full Text Available Genomic selection uses genome-wide marker information to predict breeding values for traits of economic interest, and is more accurate than pedigree-based methods. The development of high-density SNP arrays for Atlantic salmon has enabled genomic selection in selective breeding programs, alongside high-resolution association mapping of the genetic basis of complex traits. However, in sibling testing schemes typical of salmon breeding programs, trait records are available on many thousands of fish with close relationships to the selection candidates. Therefore, routine high-density SNP genotyping may be prohibitively expensive. One means of reducing genotyping cost is the use of genotype imputation, where selected key animals (e.g., breeding program parents) are genotyped at high density, and the majority of individuals (e.g., performance tested fish and selection candidates) are genotyped at much lower density, followed by imputation to high density. The main objectives of the current study were to assess the feasibility and accuracy of genotype imputation in the context of a salmon breeding program. The specific aims were: (i) to measure the accuracy of genotype imputation using medium (25K) and high (78K) density mapped SNP panels, by masking varying proportions of the genotypes and assessing the correlation between the imputed genotypes and the true genotypes; and (ii) to assess the efficacy of imputed genotype data in genomic prediction of key performance traits (sea lice resistance and body weight). Imputation accuracies of up to 0.90 were observed using the simple two-generation pedigree dataset, and moderately high accuracy (0.83) was possible even with very low density SNP data (~250 SNPs). The performance of genomic prediction using imputed genotype data was comparable to using true genotype data, and both were superior to pedigree-based prediction. These results demonstrate that the genotype imputation approach used in this study can

  18. Multiple imputation of missing genotype data for unrelated individuals

    NARCIS (Netherlands)

    Souverein, O. W.; Zwinderman, A. H.; Tanck, M. W. T.

    2006-01-01

    The objective of this study was to investigate the performance of multiple imputation of missing genotype data for unrelated individuals using the polytomous logistic regression model, focusing on different missingness mechanisms, percentages of missing data, and imputation models. A complete

  19. Multiple imputation with non-additively related variables: Joint-modeling and approximations.

    Science.gov (United States)

    Kim, Soeun; Belin, Thomas R; Sugar, Catherine A

    2016-09-19

    This paper investigates multiple imputation methods for regression models with interacting continuous and binary predictors when the continuous variable may be missing. Usual implementations of parametric multiple imputation assume a multivariate normal structure for the variables, which is satisfied neither by a binary variable nor by its interaction with a continuous variable. To accommodate interactions, missing covariates are multiply imputed from a conditional distribution in a manner consistent with the joint model. Alternative imputation methods under multivariate normal assumptions are also considered as candidate approximations and evaluated in a simulation study. The results suggest that the joint modeling procedure performs generally well across a wide range of scenarios, as do the approximation methods that incorporate interactions in the model appropriately by stratification. It is critical to include interactions in the imputation model, as failure to do so may result in low coverage and bias. We apply the joint modeling approach and approximation methods in a study of childhood trauma with a gender × trauma interaction. © The Author(s) 2016.

  20. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Full Text Available Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy for Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we were also able to infer the parental origin of 83% of alleles, and the genotypes of deceased recent ancestors for whom no genotype information was available. This imputed dataset will enable us to better study the relative contribution of rare and common variants to human phenotypes, as well as parental-origin effects of disease risk alleles, in >1,000 individuals at minimal cost.

  1. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data.

    Science.gov (United States)

    Edwards, Jessie K; Cole, Stephen R; Troester, Melissa A; Richardson, David B

    2013-05-01

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods.

  2. PREDICTD PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition.

    Science.gov (United States)

    Durham, Timothy J; Libbrecht, Maxwell W; Howbert, J Jeffry; Bilmes, Jeff; Noble, William Stafford

    2018-04-11

    The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing experiments. PREDICTD leverages an elegant model called "tensor decomposition" to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining the two methods yields further improvement. We show that PREDICTD data captures enhancer activity at noncoding human accelerated regions. PREDICTD provides reference imputed data and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, both promising technologies for bioinformatics.

  3. Equipercentile Equating via Data-Imputation Techniques.

    Science.gov (United States)

    Liou, Michelle; Cheng, Philip E.

    1995-01-01

    Different data imputation techniques that are useful for equipercentile equating are discussed, and empirical data are used to evaluate the accuracy of these techniques as compared with chained equipercentile equating. The kernel estimator, the EM algorithm, the EB model, and the iterative moment estimator are considered. (SLD)

  4. Effect of reference population size and available ancestor genotypes on imputation of Mexican Holstein genotypes

    Science.gov (United States)

    The effects of reference population size and the availability of information from genotyped ancestors on the accuracy of imputation of single nucleotide polymorphisms (SNPs) were investigated for Mexican Holstein cattle. Three scenarios for reference population size were examined: (1) a local popula...

  5. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    Science.gov (United States)

    Meseck, Kristin; Jankowska, Marta M.; Schipperijn, Jasper; Natarajan, Loki; Godbole, Suneeta; Carlson, Jordan; Takemoto, Michelle; Crist, Katie; Kerr, Jacqueline

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographics attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset was comprised of GPS data lapses. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset. PMID:27245796

  6. Imputation strategies for missing binary outcomes in cluster randomized trials

    Directory of Open Access Journals (Sweden)

    Akhtar-Danesh Noori

    2011-02-01

    Full Text Available Abstract Background Attrition, which leads to missing data, is a common problem in cluster randomized trials (CRTs), where groups of patients rather than individuals are randomized. Standard multiple imputation (MI) strategies may not be appropriate to impute missing data from CRTs since they assume independent data. In this paper, under the assumptions of missing completely at random and covariate-dependent missingness, we compared six MI strategies which account for the intra-cluster correlation for missing binary outcomes in CRTs with the standard imputation strategies and the complete case analysis approach using a simulation study. Method We considered three within-cluster and three across-cluster MI strategies for missing binary outcomes in CRTs. The three within-cluster MI strategies are the logistic regression method, the propensity score method, and the Markov chain Monte Carlo (MCMC) method, which apply standard MI strategies within each cluster. The three across-cluster MI strategies are the propensity score method, the random-effects (RE) logistic regression approach, and logistic regression with cluster as a fixed effect. Based on the community hypertension assessment trial (CHAT), which has complete data, we designed a simulation study to investigate the performance of the above MI strategies. Results The estimated treatment effect and its 95% confidence interval (CI) from a generalized estimating equations (GEE) model based on the CHAT complete dataset are 1.14 (0.76, 1.70). When 30% of the binary outcomes are missing completely at random, the simulation study shows that the estimated treatment effects and the corresponding 95% CIs from the GEE model are 1.15 (0.76, 1.75) if complete case analysis is used, 1.12 (0.72, 1.73) if the within-cluster MCMC method is used, 1.21 (0.80, 1.81) if across-cluster RE logistic regression is used, and 1.16 (0.82, 1.64) if standard logistic regression which does not account for clustering is used. Conclusion When the percentage of missing data is low or intra

  7. Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

    Science.gov (United States)

    Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

    2017-07-27

    Imputation is a computational method based on the principle of haplotype sharing that allows enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and the density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel, making it a computationally less intensive job. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.

  8. Multiple imputation and analysis for high-dimensional incomplete proteomics data.

    Science.gov (United States)

    Yin, Xiaoyan; Levy, Daniel; Willinger, Christine; Adourian, Aram; Larson, Martin G

    2016-04-15

    Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤ 40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

  9. Saturated linkage map construction in Rubus idaeus using genotyping by sequencing and genome-independent imputation

    Directory of Open Access Journals (Sweden)

    Ward Judson A

    2013-01-01

    Full Text Available Abstract Background Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density, but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. Results Genotyping by Sequencing (GBS) was used to produce highly saturated maps for a R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values, and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. Conclusions GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation

  10. A New Missing Data Imputation Algorithm Applied to Electrical Data Loggers

    Directory of Open Access Journals (Sweden)

    Concepción Crespo Turrado

    2015-12-01

    Full Text Available Nowadays, data collection is a key process in the study of electrical power networks when searching for harmonics and a lack of balance among phases. In this context, the lack of data on any of the main electrical variables (phase-to-neutral voltage, phase-to-phase voltage, current in each phase and power factor) adversely affects any time series study performed. When this occurs, a data imputation process must be carried out in order to substitute estimated values for the missing data. This paper presents a novel missing data imputation method based on multivariate adaptive regression splines (MARS) and compares it with the well-known technique called multivariate imputation by chained equations (MICE). The results obtained demonstrate how the proposed method outperforms the MICE algorithm.

  11. An Improved Generalized-Trend-Diffusion-Based Data Imputation for Steel Industry

    Directory of Open Access Journals (Sweden)

    Ying Liu

    2013-01-01

    Full Text Available Integrality and validity of industrial data are fundamental factors in the domain of data-driven modeling. Aiming at the problem of missing gas-flow data in the steel industry, an improved Generalized-Trend-Diffusion (iGTD) algorithm is proposed in this study, which specifically addresses problems with consecutively missing data and small samples. The imputation accuracy can be greatly increased by the proposed Gaussian-membership-based GTD, which expands the useful knowledge of the data samples. In addition, the imputation order is further discussed to enhance the sequential forecasting accuracy of gas flow. To verify the effectiveness of the proposed method, a series of experiments covering three categories of data features in the gas system is presented, and the results indicate that this method is comprehensively better for the imputation of periodical-like data and time-series-like data.

  12. Multiple imputation for an incomplete covariate that is a ratio.

    Science.gov (United States)

    Morris, Tim P; White, Ian R; Royston, Patrick; Seaman, Shaun R; Wood, Angela M

    2014-01-15

    We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
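
    The bookkeeping of passive versus active imputation is easy to illustrate. The sketch below uses scikit-learn's IterativeImputer as a stand-in for the MICE procedure on a synthetic BMI-like example; it demonstrates the two strategies' mechanics, not the paper's bias analysis.

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(5)
        n = 5_000
        height = rng.normal(1.7, 0.1, n)
        weight = 25 * height ** 2 + rng.normal(0, 8, n)
        bmi = weight / height ** 2
        weight[rng.random(n) < 0.3] = np.nan          # numerator partly missing

        # Passive: impute the numerator, then recompute the ratio afterwards.
        w_imp = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(
            np.column_stack([height, weight]))[:, 1]
        bmi_passive = w_imp / height ** 2

        # Active: the ratio itself is a variable in the imputation model.
        bmi_obs = np.where(np.isnan(weight), np.nan, bmi)
        bmi_active = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(
            np.column_stack([height, weight, bmi_obs]))[:, 2]

        print(bmi.mean(), bmi_passive.mean(), bmi_active.mean())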

  13. Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis.

    Science.gov (United States)

    Siddique, Juned; Reiter, Jerome P; Brincks, Ahnalee; Gibbons, Robert D; Crespi, Catherine M; Brown, C Hendricks

    2015-11-20

    There are many advantages to individual participant data meta-analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta-analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values. Copyright © 2015 John Wiley & Sons, Ltd.

  14. Multiple imputation as a flexible tool for missing data handling in clinical research.

    Science.gov (United States)

    Enders, Craig K

    2017-11-01

    The last 20 years have seen an uptick in research on missing data problems, and most software applications now implement one or more sophisticated missing data handling routines (e.g., multiple imputation or maximum likelihood estimation). Despite their superior statistical properties (e.g., less stringent assumptions, greater accuracy and power), the adoption of these modern analytic approaches is not uniform in psychology and related disciplines. Thus, the primary goal of this manuscript is to describe and illustrate the application of multiple imputation. Although maximum likelihood estimation is perhaps the easiest method to use in practice, psychological data sets often feature complexities that are currently difficult to handle appropriately in the likelihood framework (e.g., mixtures of categorical and continuous variables) but relatively simple to treat with imputation. The paper describes a number of practical issues that clinical researchers are likely to encounter when applying multiple imputation, including mixtures of categorical and continuous variables, item-level missing data in questionnaires, significance testing, interaction effects, and multilevel missing data. Analysis examples illustrate imputation with software packages that are freely available on the internet. Copyright © 2016 Elsevier Ltd. All rights reserved.
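
    For reference, the impute-analyze-pool cycle described here takes only a few lines with the freely available R package mice, shown below on its bundled 'nhanes' example data (the choices m = 20 and predictive mean matching are illustrative):

        library(mice)

        imp  <- mice(nhanes, m = 20, method = "pmm", seed = 2024)  # 20 imputed data sets
        fits <- with(imp, lm(chl ~ age + bmi))                     # analyze each data set
        summary(pool(fits))                                        # pool by Rubin's rules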

  15. Imputation by the mean score should be avoided when validating a Patient Reported Outcomes questionnaire by a Rasch model in presence of informative missing data

    LENUS (Irish Health Repository)

    Hardouin, Jean-Benoit

    2011-07-14

    Background: Clinical scales consisting of patient responses to a set of items (Patient Reported Outcomes, PRO) are increasingly validated with models based on Item Response Theory, and more specifically with the Rasch model. Missing data are frequent in validation samples. The aim of this paper is to compare sixteen methods for handling missing data (mainly based on simple imputation) in the context of psychometric validation of PRO by a Rasch model, comparing the main indices used in such validation. Methods: A simulation study was performed covering several cases, notably whether or not the missing values are informative and the rate of missing data. Results: Several imputation methods bias the psychometric indices (in general, imputation artificially improves the apparent psychometric quality of the scale). In particular, this is the case for the method based on the Personal Mean Score (PMS), which is the imputation method most commonly used in practice. Conclusions: Several imputation methods should be avoided, in particular PMS imputation. From a general point of view, it is important to use an imputation method that considers both the ability of the patient (measured, for example, by his/her score) and the difficulty of the item (measured, for example, by its rate of favourable responses). Another recommendation is to always consider adding a random process to the imputation method, because such a process reduces bias. Finally, analysis without imputation of the missing data (available-case analysis) is an interesting alternative to simple imputation in this context.
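
    A minimal base-R sketch of the kind of imputation recommended here, combining person ability, item difficulty, and a random draw, might look as follows ('resp' is a hypothetical 0/1 person-by-item matrix with NAs; the logit scoring is our simplification of a full Rasch fit):

        impute_items <- function(resp) {
          # crude ability and difficulty estimates on the logit scale
          ability    <- qlogis(pmin(pmax(rowMeans(resp, na.rm = TRUE), 0.05), 0.95))
          difficulty <- qlogis(pmin(pmax(1 - colMeans(resp, na.rm = TRUE), 0.05), 0.95))
          for (i in seq_len(nrow(resp))) {
            for (j in seq_len(ncol(resp))) {
              if (is.na(resp[i, j])) {
                p <- plogis(ability[i] - difficulty[j])  # Rasch response probability
                resp[i, j] <- rbinom(1, 1, p)            # the random draw reduces bias
              }
            }
          }
          resp
        }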

  16. A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

    Science.gov (United States)

    Välikangas, Tommi; Suomi, Tomi; Elo, Laura L

    2017-05-31

    Label-free mass spectrometry (MS) has developed into an important tool applied in various fields of the biological and life sciences. Several software packages exist to process raw MS data into quantified protein abundances, including open-source and commercial solutions. Each package includes a set of unique algorithms for the different tasks of the MS data-processing workflow. While many of these algorithms have been compared separately, a thorough and systematic evaluation of their overall performance has been missing. Moreover, systematic information is lacking about the number of missing values produced by the different proteomics packages and about the capability of different imputation methods to account for them. In this study, we evaluated the performance of five popular quantitative label-free proteomics software workflows using four different spike-in data sets. Our extensive testing covered the number of proteins quantified and the number of missing values produced by each workflow, the accuracy of detecting differential expression and logarithmic fold change, and the effect of different imputation and filtering methods on the differential expression results. We found that the Progenesis software performed consistently well in the differential expression analysis and produced few missing values. The missing values produced by the other packages decreased their performance, but this difference could be mitigated using proper data filtering or imputation methods. Among the imputation methods, local least squares (lls) regression imputation consistently increased the performance of the software in the differential expression analysis, and a combination of data filtering and local least squares imputation increased performance the most in the tested data sets. © The Author 2017. Published by Oxford University Press.
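
    The local least squares idea can be sketched in a few lines of base R (our simplified illustration, not one of the evaluated implementations): each protein with missing values is regressed on its k most correlated, fully observed proteins.

        # X: proteins-by-samples log-intensity matrix with NAs marking missing values
        lls_impute <- function(X, k = 10) {
          complete <- which(rowSums(is.na(X)) == 0)
          for (i in which(rowSums(is.na(X)) > 0)) {
            obs <- !is.na(X[i, ])
            r   <- apply(X[complete, obs, drop = FALSE], 1, cor, y = X[i, obs])
            nb  <- complete[order(-abs(r))][seq_len(k)]      # k nearest proteins
            fit <- lm.fit(cbind(1, t(X[nb, obs, drop = FALSE])), X[i, obs])
            X[i, !obs] <- cbind(1, t(X[nb, !obs, drop = FALSE])) %*% fit$coefficients
          }
          X
        }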

  17. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Science.gov (United States)

    Kim, Kwangwoo; Bang, So-Young; Lee, Hye-Soon; Bae, Sang-Cheol

    2014-01-01

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by SNP density of a test dataset for imputation. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  18. Construction and application of a Korean reference panel for imputing classical alleles and amino acids of human leukocyte antigen genes.

    Directory of Open Access Journals (Sweden)

    Kwangwoo Kim

    Genetic variations of human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus are strongly associated with disease susceptibility and prognosis for many diseases, including many autoimmune diseases. In this study, we developed a Korean HLA reference panel for imputing classical alleles and amino acid residues of several HLA genes. An HLA reference panel has potential for use in identifying and fine-mapping disease associations with the MHC locus in East Asian populations, including Koreans. A total of 413 unrelated Korean subjects were analyzed for single nucleotide polymorphisms (SNPs) at the MHC locus and six HLA genes, including HLA-A, -B, -C, -DRB1, -DPB1, and -DQB1. The HLA reference panel was constructed by phasing the 5,858 MHC SNPs, 233 classical HLA alleles, and 1,387 amino acid residue markers from 1,025 amino acid positions as binary variables. The imputation accuracy of the HLA reference panel was assessed by measuring concordance rates between imputed and genotyped alleles of the HLA genes from a subset of the study subjects and East Asian HapMap individuals. Average concordance rates were 95.6% and 91.1% at 2-digit and 4-digit allele resolutions, respectively. The imputation accuracy was minimally affected by the SNP density of the test dataset. In conclusion, the Korean HLA reference panel we developed was highly suitable for imputing HLA alleles and amino acids from MHC SNPs in East Asians, including Koreans.

  19. High-accuracy haplotype imputation using unphased genotype data as the references.

    Science.gov (United States)

    Li, Wenzhi; Xu, Wei; Fu, Guoxing; Ma, Li; Richards, Jendai; Rao, Weinian; Bythwood, Tameka; Guo, Shiwen; Song, Qing

    2015-11-10

    Enormously growing genomic datasets present a new challenge for missing-data imputation, a notoriously resource-demanding task. Haplotype imputation requires ethnicity-matched references; however, to date, haplotype references are not available for the majority of populations in the world. We explored the use of existing unphased genotype datasets as references; if successful, this approach would cover almost all populations in the world. The results showed that our HiFi software successfully yields 99.43% accuracy with unphased genotype references. Our method provides a cost-effective solution for breaking through the bottleneck of limited reference availability for haplotype imputation in the big-data era. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    DEFF Research Database (Denmark)

    Meseck, Kristin; Jankowska, Marta M; Schipperijn, Jasper

    2016-01-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapses on physical activity analyses, to discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity.

  1. Development of novel growth methods for halide single crystals

    Science.gov (United States)

    Yokota, Yuui; Kurosawa, Shunsuke; Shoji, Yasuhiro; Ohashi, Yuji; Kamada, Kei; Yoshikawa, Akira

    2017-03-01

    We developed novel growth methods for hygroscopic halide scintillator single crystals: the halide micro-pulling-down (H-μ-PD) method and the halide vertical Bridgman (H-VB) method. The H-μ-PD method, with a removable chamber system, can grow a single crystal of a hygroscopic halide scintillator material at a faster growth rate than conventional methods, while the H-VB method can grow a large bulk single crystal of a halide scintillator without a quartz ampule. CeCl3, LaBr3, Ce:LaBr3, and Eu:SrI2 fiber single crystals were grown by the H-μ-PD method, and Eu:SrI2 bulk single crystals of 1 and 1.5 inches in diameter were grown by the H-VB method. The grown fiber and bulk single crystals showed scintillation properties comparable to those previously reported for crystals grown by conventional methods.

  2. Imputation of KIR Types from SNP Variation Data.

    Science.gov (United States)

    Vukcevic, Damjan; Traherne, James A; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-10-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Imputation of KIR Types from SNP Variation Data

    Science.gov (United States)

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  4. Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

    Directory of Open Access Journals (Sweden)

    Minkyung Kim

    2017-10-01

    This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. The method estimates missing power data by using patterns that appear in the collected data. To capture patterns from past power data, we model a feature vector from past data and its variations. The proposed LAI then learns the optimal length of the feature vector and the optimal historical length, two significant hyperparameters of the method, by utilizing intentionally created missing data. Based on a weighted distance between feature vectors representing a missing situation and past situations, missing power data are estimated by referring to the k most similar past situations within the optimal historical length. We further extend the proposed LAI to alleviate the effect of unexpected variation in power data and refer to this new approach as the extended LAI method (eLAI). The eLAI selects between linear interpolation (LI) and the proposed LAI to improve accuracy under unexpected variations. Finally, in simulations over various energy-consumption profiles, we verify that the proposed eLAI achieves about a 74% reduction of the average imputation error in an energy system compared to existing imputation methods.
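
    The core estimation step can be sketched in base R as follows (function and parameter names are ours, not the authors'; the sketch assumes t > f and that the f readings before the gap are observed):

        # Estimate missing reading x[t] from the k most similar past situations,
        # a situation being the f readings that precede a time point.
        lai_impute_point <- function(x, t, f = 4, k = 3) {
          query <- x[(t - f):(t - 1)]
          cand  <- setdiff((f + 1):length(x), t)
          cand  <- cand[!is.na(x[cand])]
          dist  <- sapply(cand, function(s) {
            past <- x[(s - f):(s - 1)]
            if (anyNA(past)) Inf else sqrt(sum((past - query)^2))
          })
          nn <- order(dist)[seq_len(k)]          # k most similar past situations
          weighted.mean(x[cand[nn]], w = 1 / (dist[nn] + 1e-9))
        }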

  5. Multiple imputation of missing fMRI data in whole brain analysis.

    Science.gov (United States)

    Vaden, Kenneth I; Gebregziabher, Mulugeta; Kuchinsky, Stefanie E; Eckert, Mark A

    2012-04-15

    Whole brain fMRI analyses rarely include the entire brain because of missing data that result from data acquisition limits and susceptibility artifact, in particular. This missing data problem is typically addressed by omitting voxels from analysis, which may exclude brain regions that are of theoretical interest and increase the potential for Type II error at cortical boundaries or Type I error when spatial thresholds are used to establish significance. Imputation could significantly expand statistical map coverage, increase power, and enhance interpretations of fMRI results. We examined multiple imputation for group-level analyses of missing fMRI data using methods that leverage the spatial information in fMRI datasets, for both real and simulated data. Available case analysis, neighbor replacement, and regression-based imputation approaches were compared in a general linear model framework to determine the extent to which these methods quantitatively (effect size) and qualitatively (spatial coverage) increased the sensitivity of group analyses. In both real and simulated data analyses, multiple imputation provided (1) variance that was most similar to estimates for voxels with no missing data, (2) fewer false positive errors in comparison to mean replacement, and (3) fewer false negative errors in comparison to available case analysis. Compared to the standard analysis approach of omitting voxels with missing data, imputation methods increased brain coverage in this study by 35% (from 33,323 to 45,071 voxels). In addition, multiple imputation increased the size of significant clusters by 58% and the number of significant clusters across statistical thresholds, compared to the standard voxel omission approach. While neighbor replacement produced similar results, we recommend multiple imputation because it uses an informed sampling distribution to deal with missing data across subjects that can include neighbor values and other predictors. Multiple imputation thus offers a principled way to restore whole-brain coverage and power in group-level fMRI analyses.

  6. Analysis of accelerated failure time data with dependent censoring using auxiliary variables via nonparametric multiple imputation.

    Science.gov (United States)

    Hsu, Chiu-Hsieh; Taylor, Jeremy M G; Hu, Chengcheng

    2015-08-30

    We consider the situation of estimating the marginal survival distribution from censored data subject to dependent censoring using auxiliary variables. We had previously developed a nonparametric multiple imputation approach. The method used two working proportional hazards (PH) models, one for the event times and the other for the censoring times, to define a nearest-neighbor imputing risk set. This risk set was then used to impute failure times for censored observations. Here, we adapt the method to the situation where the event and censoring times follow accelerated failure time models and propose to use the Buckley-James estimator for the two working models. Besides studying the performance of the proposed method, we also compare it with two popular methods for handling dependent censoring through the use of auxiliary variables, inverse probability of censoring weighting and parametric multiple imputation, to shed light on their use. In a simulation study with time-independent auxiliary variables, we show that all approaches can reduce bias due to dependent censoring. The proposed method is robust to misspecification of either of the two working models and their link function; this indicates that a working proportional hazards model is preferred, because an accelerated failure time model is more cumbersome to fit. In contrast, the inverse probability of censoring weighted method is not robust to misspecification of the link function of the censoring time model, and the parametric imputation methods rely on the specification of the event time model. The approaches are applied to a prostate cancer dataset. Copyright © 2015 John Wiley & Sons, Ltd.

  7. Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

    Directory of Open Access Journals (Sweden)

    Kristin Meseck

    2016-05-01

    The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapses on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographic attributes, and determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the amount of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. Over 17% of the dataset was comprised of GPS data lapses. No strong associations were found between increasing lapse length or number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for the length or number of lapses, but imputation of GPS data may make a significant difference for the inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset.

  8. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure.

    Science.gov (United States)

    Kabisch, Maria; Hamann, Ute; Lorenzo Bermejo, Justo

    2017-10-17

    Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and permits explicit accounting for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with an analysis of real data. Genotype concordance rates were used to compare the accuracy of coalescent-based and standard (IMPUTE2) imputation. Simulations revealed that, in LD blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not practically improve accuracy. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination into coalescence inference, in particular for studies with genetically diverse and admixed individuals. To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computing time, to take recombination into account, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate such coalescent-based imputation.

  9. Missing Data and Multiple Imputation in the Context of Multivariate Analysis of Variance

    Science.gov (United States)

    Finch, W. Holmes

    2016-01-01

    Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of 2 methods for combining p values in this context.

  10. Handbook of statistical methods single subject design

    CERN Document Server

    Satake, Eiki; Maxwell, David L

    2008-01-01

    This book is a practical guide to the most commonly used approaches for analyzing and interpreting single-subject data. It arranges the methodologies in a logical sequence, using an array of research studies from the published literature to illustrate specific applications. The book provides a brief discussion of each approach (e.g., visual, inferential, and probabilistic models), the applications for which it is intended, and a step-by-step illustration of the test as used in an actual research study.

  11. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Science.gov (United States)

    Lazar, Cosmin; Gatto, Laurent; Ferro, Myriam; Bruley, Christophe; Burger, Thomas

    2016-04-01

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation, compared them on real or simulated data sets, and recommended a list of missing value imputation methods for proteomics applications. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: for instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state of the art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context.
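
    The point that each imputation method targets a specific missingness mechanism can be made concrete with two deliberately simple base-R imputers (names and the quantile threshold are illustrative): the first borrows information across the data, as is sensible for randomly missing values, while the second imputes near the detection limit, as is sensible for left-censored values.

        # X: features-by-samples log-intensity matrix with NAs
        impute_condmean <- function(X) {          # MCAR/MAR-type missingness
          idx <- which(is.na(X), arr.ind = TRUE)
          X[idx] <- rowMeans(X, na.rm = TRUE)[idx[, "row"]]
          X
        }
        impute_mindet <- function(X, q = 0.01) {  # MNAR, left-censored missingness
          lod <- apply(X, 2, quantile, probs = q, na.rm = TRUE)
          for (j in seq_len(ncol(X))) X[is.na(X[, j]), j] <- lod[j]
          X
        }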

  12. A brief introduction to single-molecule fluorescence methods

    NARCIS (Netherlands)

    Wildenberg, S.M.J.L.; Prevo, B.; Peterman, E.J.G.; Peterman, EJG; Wuite, GJL

    2011-01-01

    One of the more popular single-molecule approaches in biological science is single-molecule fluorescence microscopy, which is the subject of the following section of this volume. Fluorescence methods provide the sensitivity required to study biology on the single-molecule level.

  13. A brief introduction to single-molecule fluorescence methods

    NARCIS (Netherlands)

    van den Wildenberg, Siet M.J.L.; Prevo, Bram; Peterman, Erwin J.G.

    2018-01-01

    One of the more popular single-molecule approaches in biological science is single-molecule fluorescence microscopy, which will be the subject of the following section of this volume. Fluorescence methods provide the sensitivity required to study biology on the single-molecule level.

  14. Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    KAUST Repository

    Chatterjee, Nilanjan

    2009-11-01

    Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the "retrospective" likelihood to gain major power by incorporating various population-genetics model assumptions, such as Hardy-Weinberg equilibrium (HWE) and gene-gene and gene-environment independence. In this article we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods via the construction of various score tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for the analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation, followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data. © Institute of Mathematical Statistics, 2009.

  15. Genomic predictions for economically important traits in Brazilian Braford and Hereford beef cattle using true and imputed genotypes.

    Science.gov (United States)

    Piccoli, Mario L; Brito, Luiz F; Braccini, José; Cardoso, Fernando F; Sargolzaei, Mehdi; Schenkel, Flávio S

    2017-01-18

    Genomic selection (GS) has played an important role in cattle breeding programs. However, genotyping prices are still a challenge for the implementation of GS in beef cattle, and there is still a lack of information about the use of low-density Single Nucleotide Polymorphism (SNP) chip panels for genomic predictions in breeds such as Brazilian Braford and Hereford. Therefore, this study investigated the effect of using imputed genotypes on the accuracy of genomic predictions for twenty economically important traits in Brazilian Braford and Hereford beef cattle. Various scenarios composed of different percentages of animals with imputed genotypes and different sizes of the training population were compared. De-regressed estimated breeding values (EBVs) were used as pseudo-phenotypes in a Genomic Best Linear Unbiased Prediction (GBLUP) model using two different mimicked panels derived from the 50K panel (8K and 15K SNP panels), which were subsequently imputed back to the 50K density. In addition, genomic prediction accuracies generated from a 777K SNP panel (imputed from the 50K panel) were presented as another scenario. The accuracy of genomic breeding values averaged over the twenty traits ranged from 0.38 to 0.40 across the different scenarios. The average losses in expected genomic estimated breeding value (GEBV) accuracy (accuracy obtained from the inverse of the mixed model equations) relative to the true 50K genotypes ranged from -0.0007 to -0.0012 and from -0.0002 to -0.0005 when using the 50K imputed from the 8K or 15K panels, respectively. When using the imputed 777K panel, the average loss in expected GEBV accuracy was -0.0021. The average gain in expected EBV accuracy from including genomic information, compared to simple BLUP, was between 0.02 and 0.03 across scenarios and traits. The percentage of animals with imputed genotypes in the training population did not significantly influence the validation accuracy; however, the size of the training population did.

  16. The use of multiple imputation for the accurate measurements of individual feed intake by electronic feeders.

    Science.gov (United States)

    Jiao, S; Tiezzi, F; Huang, Y; Gray, K A; Maltecca, C

    2016-02-01

    Obtaining accurate individual feed intake records is the key first step in achieving genetic progress toward more efficient nutrient utilization in pigs. Feed intake records collected by electronic feeding systems contain errors (erroneous and abnormal values exceeding certain cutoff criteria), which are due to feeder malfunction or animal-feeder interaction. In this study, we examined the use of a novel data-editing strategy involving multiple imputation to minimize the impact of errors and missing values on the quality of feed intake data collected by an electronic feeding system. The accuracy of feed intake data adjustment obtained with the conventional linear mixed model (LMM) approach was compared with two alternative implementations of multiple imputation by chained equations, denoted as MI (multiple imputation) and MICE (multiple imputation by chained equation). The three methods were compared under three scenarios, in which 5, 10, and 20% feed intake error rates were simulated, with each scenario replicated five times. Accuracy of the alternative error adjustments was measured as the correlation between the true daily feed intake (DFI; daily feed intake in the testing period) or true ADFI (the mean DFI across the testing period) and the adjusted DFI or adjusted ADFI. In the editing process, error cutoff criteria are used to define whether a feed intake visit contains errors. To investigate the possibility that the error cutoff criteria may affect any of the three methods, the simulation was repeated with two alternative error cutoff values. Multiple imputation methods outperformed the LMM approach in all scenarios, with mean accuracies of 96.7, 93.5, and 90.2% obtained with MI and 96.8, 94.4, and 90.1% obtained with MICE, compared with 91.0, 82.6, and 68.7% using LMM for DFI. Similar results were obtained for ADFI. Furthermore, the multiple imputation methods consistently performed better than LMM regardless of the cutoff criteria applied to define errors. In conclusion, multiple imputation provided more accurate adjustment of erroneous feed intake records than the conventional LMM approach.

  17. Genotype imputation in a tropical crossbred dairy cattle population.

    Science.gov (United States)

    Oliveira Júnior, Gerson A; Chud, Tatiane C S; Ventura, Ricardo V; Garrick, Dorian J; Cole, John B; Munari, Danísio P; Ferraz, José B S; Mullart, Erik; DeNise, Sue; Smith, Shannon; da Silva, Marcos Vinícius G B

    2017-12-01

    The objective of this study was to investigate different strategies for genotype imputation in a population of crossbred Girolando (Gyr × Holstein) dairy cattle. The data set consisted of 478 Girolando, 583 Gyr, and 1,198 Holstein sires genotyped at high density with the Illumina BovineHD panel (Illumina, San Diego, CA), which includes ~777K markers. The accuracy of imputation from low (20K) and medium densities (50K and 70K) to the HD panel density, and from low to 50K density, was investigated. Seven scenarios using different reference populations (RPop), considering the Girolando, Gyr, and Holstein breeds separately or in combination, were tested for imputing the genotypes of 166 randomly chosen Girolando animals. Population genotype imputation was performed using FImpute. Imputation accuracy was measured as the correlation between observed and imputed genotypes (CORR) and as the proportion of genotypes imputed correctly (CR). This is the first paper on imputation accuracy in a Girolando population. The sample-specific imputation accuracies ranged from 0.38 to 0.97 (CORR) and from 0.49 to 0.96 (CR) when imputing from low and medium densities to HD, and from 0.41 to 0.95 (CORR) and from 0.50 to 0.94 (CR) for imputation from 20K to 50K. The animal-wise correlation (CORR_anim) exceeded 0.96 (for the 50K and 70K panels) when only Girolando animals were included in RPop (S1). We found smaller CORR_anim when Gyr (S2) was used instead of Holstein (S3) as RPop, and the same behavior was observed between S4 (Gyr + Girolando) and S5 (Holstein + Girolando), because the target animals were more related to the Holstein population than to the Gyr population. The highest imputation accuracies were observed for scenarios including Girolando animals in the reference population, whereas using only Gyr animals resulted in low imputation accuracies, suggesting that the haplotypes segregating in the Girolando population had a greater effect on accuracy than the purebred haplotypes.
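
    In base R, the two accuracy measures reduce to a correlation and a matching rate ('obs' and 'imp' being vectors of true and imputed genotypes coded 0/1/2 at the same markers):

        genotype_accuracy <- function(obs, imp) {
          c(CORR = cor(obs, imp),      # correlation between observed and imputed
            CR   = mean(obs == imp))   # proportion of genotypes imputed correctly
        }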

  18. Methods for Gas Sensing with Single-Walled Carbon Nanotubes

    Science.gov (United States)

    Kaul, Anupama B. (Inventor)

    2013-01-01

    Methods for gas sensing with single-walled carbon nanotubes are described. The methods comprise biasing at least one carbon nanotube and exposing it to a gas environment to detect variation in temperature as an electrical response.

  19. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    Science.gov (United States)

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly spaced SNPs, increased minor allele frequencies (MAF), and SNP-trait associations either for single traits independently or for all three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on all three major factors had the highest genomic prediction accuracies (0.821-0.863), very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies but were significantly affected by genotype (imputation) errors of SNPs associated with the traits to be predicted. SNPs optimal for map coverage and MAF favored accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies; thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.

  20. Should "Multiple Imputations" Be Treated as "Multiple Indicators"?

    Science.gov (United States)

    Mislevy, Robert J.

    1993-01-01

    Multiple imputations for latent variables are constructed so that analyses treating them as true variables have the correct expectations for population characteristics. Analyzing multiple imputations in accordance with their construction yields correct estimates of population characteristics, whereas analyzing them as multiple indicators generally does not.

  1. mice: Multivariate Imputation by Chained Equations in R

    NARCIS (Netherlands)

    van Buuren, Stef; Groothuis-Oudshoorn, Catharina Gerarda Maria

    2011-01-01

    The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways.
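
    A brief illustration of the chained-equations interface the article documents, using the 'nhanes2' data bundled with mice (the particular method choices and the mincor threshold are ours):

        library(mice)

        meth <- c(age = "", bmi = "pmm", hyp = "logreg", chl = "norm")  # one method per column
        pred <- quickpred(nhanes2, mincor = 0.1)                        # predictor selection
        imp  <- mice(nhanes2, method = meth, predictorMatrix = pred, m = 5, seed = 1)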

  2. MICE: Multivariate Imputation by Chained Equations in R

    NARCIS (Netherlands)

    Buuren, S. van; Groothuis-Oudshoorn, K.

    2010-01-01

    Multivariate Imputation by Chained Equations (MICE) is the name of software for imputing incomplete multivariate data by Fully Conditional Specification (FCS). MICE V1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. MICE V1.0 introduced predictor selection, passive imputation and automatic pooling.

  3. 12 CFR 367.9 - Imputation of causes.

    Science.gov (United States)

    2010-01-01

    12 Banks and Banking (2010), Suspension and Exclusion of Contractor and Termination of Contracts, § 367.9 Imputation of causes: (a) Where there is cause to suspend and/or exclude any affiliated business entity of the contractor, that conduct...

  4. Factors associated with low birth weight in Nepal using multiple imputation

    Directory of Open Access Journals (Sweden)

    Usha Singh

    2017-02-01

    Background: Survey data on birth weight from low-income countries usually pose a persistent problem. Studies conducted on birth weight have acknowledged missing birth weight data, but the missing records are not included in the analysis, and missing data on the determinants of birth weight are likewise not addressed. This study therefore tries to identify determinants associated with low birth weight (LBW) using multiple imputation to handle missing data on birth weight and its determinants. Methods: The child dataset from the Nepal Demographic and Health Survey (NDHS, 2011) was utilized. A total of 5,240 children were born between 2006 and 2011, of whom 87% had at least one measured variable missing and 21% had no recorded birth weight. All analyses were carried out in R version 3.1.3. The transform-then-impute method was applied to check for interactions between explanatory variables and imputed missing data. The survey package was applied to each imputed dataset to account for the survey design and sampling method, and survey logistic regression was used to identify the determinants associated with LBW. Results: The prevalence of LBW was 15.4% after imputation. Women with the highest autonomy over their own health were less likely to give birth to LBW infants than women whose health decisions involved their husband or others (adjusted odds ratio (OR) 1.87, 95% confidence interval (CI) 1.31-2.67) or the husband and woman together (adjusted OR 1.57, 95% CI 1.05-2.35). Mothers using highly polluting cooking fuels (adjusted OR 1.49, 95% CI 1.03-2.22) were more likely to give birth to LBW infants than mothers using non-polluting cooking fuels. Conclusion: The findings of this study suggest that obtaining the prevalence of LBW from only the sample with measured birth weight, ignoring missing data, results in underestimation.
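
    The analysis pattern described, multiple imputation followed by design-based logistic regression, can be sketched with the R packages mice, survey, and mitools; this is a hedged sketch, and 'ndhs' and the variable names lbw, autonomy, fuel, psu, and wt are placeholders rather than NDHS column names.

        library(mice); library(survey); library(mitools)

        imp  <- mice(ndhs, m = 10, seed = 7)                 # 10 imputed data sets
        comp <- lapply(seq_len(10), function(i) complete(imp, i))
        des  <- svydesign(ids = ~psu, weights = ~wt, data = imputationList(comp))
        fits <- with(des, svyglm(lbw ~ autonomy + fuel, family = quasibinomial()))
        summary(MIcombine(fits))                             # pool across imputations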

  5. Nonlinear Imputation of PaO2/FIO2 From SpO2/FIO2 Among Mechanically Ventilated Patients in the ICU: A Prospective, Observational Study.

    Science.gov (United States)

    Brown, Samuel M; Duggal, Abhijit; Hou, Peter C; Tidswell, Mark; Khan, Akram; Exline, Matthew; Park, Pauline K; Schoenfeld, David A; Liu, Ming; Grissom, Colin K; Moss, Marc; Rice, Todd W; Hough, Catherine L; Rivers, Emanuel; Thompson, B Taylor; Brower, Roy G

    2017-08-01

    In the contemporary ICU, mechanically ventilated patients may not have arterial blood gas measurements available at relevant timepoints, yet severity criteria often depend on arterial blood gas results. Retrospective studies suggest that nonlinear imputation of PaO2/FIO2 from SpO2/FIO2 is accurate, but this has not been established prospectively among mechanically ventilated ICU patients. The objective was to validate the superiority of nonlinear imputation of PaO2/FIO2 among mechanically ventilated patients and to understand what factors influence the accuracy of imputation. Simultaneous SpO2, oximeter characteristics, receipt of vasopressors, and skin pigmentation were recorded at the time of a clinical arterial blood gas, and acute respiratory distress syndrome criteria were recorded. For each imputation method, we calculated both the imputation error and the area under the curve for patients meeting criteria for acute respiratory distress syndrome (PaO2/FIO2 ≤ 300) and moderate-severe acute respiratory distress syndrome (PaO2/FIO2 ≤ 150). The setting was nine hospitals within the Prevention and Early Treatment of Acute Lung Injury network, where we prospectively enrolled 703 mechanically ventilated patients admitted to the emergency departments or ICUs of participating study hospitals; there were no interventions. We studied 1,034 arterial blood gases from 703 patients; 650 arterial blood gases were associated with SpO2 less than or equal to 96%. Nonlinear imputation had consistently lower error than the other techniques. Among all patients, nonlinear imputation had a lower error (p < 0.001) and a higher (p < 0.001) area under the curve (0.87; 95% CI, 0.85-0.90) for PaO2/FIO2 less than or equal to 300 than linear/log-linear (0.80; 95% CI, 0.76-0.83) imputation. All imputation methods better identified moderate-severe acute respiratory distress syndrome (PaO2/FIO2 ≤ 150), and nonlinear imputation remained superior (p < 0.001).
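
    Nonlinear imputation of this kind is typically built on the Severinghaus blood-gas relationship; the R sketch below shows the general idea (the exact equation, coefficients, and edge-case handling in the study may differ):

        severinghaus_spo2 <- function(pao2) {           # SpO2 as a function of PaO2
          1 / (23400 / (pao2^3 + 150 * pao2) + 1)
        }
        impute_pf <- function(spo2, fio2) {
          s <- min(spo2, 0.96)                          # imputation targets SpO2 <= 96%
          pao2 <- uniroot(function(p) severinghaus_spo2(p) - s, c(1, 300))$root
          pao2 / fio2
        }
        impute_pf(spo2 = 0.92, fio2 = 0.5)              # approx. 128 in this toy call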

  6. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis.

    Science.gov (United States)

    Kmetic, Andrew; Joseph, Lawrence; Berger, Claudie; Tenenhouse, Alan

    2002-07-01

    Nonresponse bias is a concern in any epidemiologic survey in which a subset of selected individuals declines to participate. We reviewed multiple imputation, a widely applicable and easy-to-implement Bayesian methodology, to adjust for nonresponse bias. To illustrate the method, we used data from the Canadian Multicentre Osteoporosis Study, a large cohort study of 9423 randomly selected Canadians designed in part to estimate the prevalence of osteoporosis. Although subjects were randomly selected, only 42% of the individuals who were contacted agreed to participate fully in the study. The study design included a brief questionnaire for those invitees who declined further participation, in order to collect information on the major risk factors for osteoporosis. These risk factors (which included age, sex, previous fractures, family history of osteoporosis, and current smoking status) were then used to estimate the missing osteoporosis status of nonparticipants using multiple imputation. Both ignorable and nonignorable imputation models were considered. Our results suggest that selection bias in the study is of concern, but only slightly, in the very elderly (age 80+ years), among both women and men. Epidemiologists should consider using multiple imputation more often than is current practice.
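
    For a prevalence estimate, combining results across the m imputed data sets follows Rubin's rules, which fit in a few lines of R (illustrative; 'est' and 'var_w' hold the per-imputation estimates and their within-imputation variances):

        pool_rubin <- function(est, var_w) {
          m    <- length(est)
          qbar <- mean(est)               # pooled estimate
          W    <- mean(var_w)             # average within-imputation variance
          B    <- var(est)                # between-imputation variance
          Tvar <- W + (1 + 1/m) * B       # total variance
          c(estimate = qbar, se = sqrt(Tvar))
        }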

  7. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

    Science.gov (United States)

    Voillet, Valentin; Besse, Philippe; Liaubet, Laurence; San Cristobal, Magali; González, Ignacio

    2016-10-03

    In omics data integration studies, it is common, for a variety of reasons, for some individuals to be absent from some data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches, regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data, and a comprehensive graphical comparison showing how the MI-, RI-, or MVI-MFA configurations diverge from the true configuration was produced. Two approaches, confidence ellipses and convex hulls, were also described to visualize and assess the uncertainty due to missing values, and we showed how the areas of the ellipses and convex hulls increase with the number of missing individuals. A free and easy-to-use implementation of the MI-MFA method in the R statistical environment is provided. We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows: in our assessments, the MI-MFA configurations were close to the true configuration.

  8. Czochralski method of growing single crystals. State-of-art

    International Nuclear Information System (INIS)

    Bukowski, A.; Zabierowski, P.

    1999-01-01

    The modern Czochralski method of single-crystal growth is described, and an example of a Czochralski process is given. The advantages that caused the rapid progress of the method are presented, as are the limitations that motivated further research and new solutions. As examples, two different directions of the technique's development are described: silicon single-crystal growth in a magnetic field, and continuous liquid feed growth of silicon crystals. (author)

  9. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    DEFF Research Database (Denmark)

    Huang, Jie; Howie, Bryan; Mccarthy, Shane

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low read depth.

  10. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    NARCIS (Netherlands)

    J. Huang (Jie); B. Howie (Bryan); S. McCarthy (Shane); Y. Memari (Yasin); K. Walter (Klaudia); J.L. Min (Josine L.); P. Danecek (Petr); G. Malerba (Giovanni); E. Trabetti (Elisabetta); H.-F. Zheng (Hou-Feng); G. Gambaro (Giovanni); J.B. Richards (Brent); R. Durbin (Richard); N.J. Timpson (Nicholas); J. Marchini (Jonathan); N. Soranzo (Nicole); S.H. Al Turki (Saeed); A. Amuzu (Antoinette); C. Anderson (Carl); R. Anney (Richard); D. Antony (Dinu); M.S. Artigas; M. Ayub (Muhammad); S. Bala (Senduran); J.C. Barrett (Jeffrey); I. Barroso (Inês); P.L. Beales (Philip); M. Benn (Marianne); J. Bentham (Jamie); S. Bhattacharya (Shoumo); E. Birney (Ewan); D.H.R. Blackwood (Douglas); M. Bobrow (Martin); E. Bochukova (Elena); P.F. Bolton (Patrick F.); R. Bounds (Rebecca); C. Boustred (Chris); G. Breen (Gerome); M. Calissano (Mattia); K. Carss (Keren); J.P. Casas (Juan Pablo); J.C. Chambers (John C.); R. Charlton (Ruth); K. Chatterjee (Krishna); L. Chen (Lu); A. Ciampi (Antonio); S. Cirak (Sebahattin); P. Clapham (Peter); G. Clement (Gail); G. Coates (Guy); M. Cocca (Massimiliano); D.A. Collier (David); C. Cosgrove (Catherine); T. Cox (Tony); N.J. Craddock (Nick); L. Crooks (Lucy); S. Curran (Sarah); D. Curtis (David); A. Daly (Allan); I.N.M. Day (Ian N.M.); A.G. Day-Williams (Aaron); G.V. Dedoussis (George); T. Down (Thomas); Y. Du (Yuanping); C.M. van Duijn (Cornelia); I. Dunham (Ian); T. Edkins (Ted); R. Ekong (Rosemary); P. Ellis (Peter); D.M. Evans (David); I.S. Farooqi (I. Sadaf); D.R. Fitzpatrick (David R.); P. Flicek (Paul); J. Floyd (James); A.R. Foley (A. Reghan); C.S. Franklin (Christopher S.); M. Futema (Marta); L. Gallagher (Louise); P. Gasparini (Paolo); T.R. Gaunt (Tom); M. Geihs (Matthias); D. Geschwind (Daniel); C.M.T. Greenwood (Celia); H. Griffin (Heather); D. Grozeva (Detelina); X. Guo (Xiaosen); X. Guo (Xueqin); H. Gurling (Hugh); D. Hart (Deborah); A.E. Hendricks (Audrey E.); P.A. Holmans (Peter A.); L. Huang (Liren); T. Hubbard (Tim); S.E. Humphries (Steve E.); M.E. Hurles (Matthew); P.G. Hysi (Pirro); V. Iotchkova (Valentina); A. Isaacs (Aaron); D.K. Jackson (David K.); Y. Jamshidi (Yalda); J. Johnson (Jon); C. Joyce (Chris); K.J. Karczewski (Konrad); J. Kaye (Jane); T. Keane (Thomas); J.P. Kemp (John); K. Kennedy (Karen); A. Kent (Alastair); J. Keogh (Julia); F. Khawaja (Farrah); M.E. Kleber (Marcus); M. Van Kogelenberg (Margriet); A. Kolb-Kokocinski (Anja); J.S. Kooner (Jaspal S.); G. Lachance (Genevieve); C. Langenberg (Claudia); C. Langford (Cordelia); D. Lawson (Daniel); I. Lee (Irene); E.M. van Leeuwen (Elisa); M. Lek (Monkol); R. Li (Rui); Y. Li (Yingrui); J. Liang (Jieqin); H. Lin (Hong); R. Liu (Ryan); J. Lönnqvist (Jouko); L.R. Lopes (Luis R.); M.C. Lopes (Margarida); J. Luan; D.G. MacArthur (Daniel G.); M. Mangino (Massimo); G. Marenne (Gaëlle); W. März (Winfried); J. Maslen (John); A. Matchan (Angela); I. Mathieson (Iain); P. McGuffin (Peter); A.M. McIntosh (Andrew); A.G. McKechanie (Andrew G.); A. McQuillin (Andrew); S. Metrustry (Sarah); N. Migone (Nicola); H.M. Mitchison (Hannah M.); A. Moayyeri (Alireza); J. Morris (James); R. Morris (Richard); D. Muddyman (Dawn); F. Muntoni; B.G. Nordestgaard (Børge G.); K. Northstone (Kate); M.C. O'donovan (Michael); S. O'Rahilly (Stephen); A. Onoufriadis (Alexandros); K. Oualkacha (Karim); M.J. Owen (Michael J.); A. Palotie (Aarno); K. Panoutsopoulou (Kalliope); V. Parker (Victoria); J.R. Parr (Jeremy R.); L. Paternoster (Lavinia); T. Paunio (Tiina); F. Payne (Felicity); S.J. Payne (Stewart J.); J.R.B. Perry (John); O.P.H. 
Pietiläinen (Olli); V. Plagnol (Vincent); R.C. Pollitt (Rebecca C.); S. Povey (Sue); M.A. Quail (Michael A.); L. Quaye (Lydia); L. Raymond (Lucy); K. Rehnström (Karola); C.K. Ridout (Cheryl K.); S.M. Ring (Susan); G.R.S. Ritchie (Graham R.S.); N. Roberts (Nicola); R.L. Robinson (Rachel L.); D.B. Savage (David); P.J. Scambler (Peter); S. Schiffels (Stephan); M. Schmidts (Miriam); N. Schoenmakers (Nadia); R.H. Scott (Richard H.); R.A. Scott (Robert); R.K. Semple (Robert K.); E. Serra (Eva); S.I. Sharp (Sally I.); A.C. Shaw (Adam C.); H.A. Shihab (Hashem A.); S.-Y. Shin (So-Youn); D. Skuse (David); K.S. Small (Kerrin); C. Smee (Carol); G.D. Smith; L. Southam (Lorraine); O. Spasic-Boskovic (Olivera); T.D. Spector (Timothy); D. St. Clair (David); B. St Pourcain (Beate); J. Stalker (Jim); E. Stevens (Elizabeth); J. Sun (Jianping); G. Surdulescu (Gabriela); J. Suvisaari (Jaana); P. Syrris (Petros); I. Tachmazidou (Ioanna); R. Taylor (Rohan); J. Tian (Jing); M.D. Tobin (Martin); D. Toniolo (Daniela); M. Traglia (Michela); A. Tybjaerg-Hansen; A.M. Valdes; A.M. Vandersteen (Anthony M.); A. Varbo (Anette); P. Vijayarangakannan (Parthiban); P.M. Visscher (Peter); L.V. Wain (Louise); J.T. Walters (James); G. Wang (Guangbiao); J. Wang (Jun); Y. Wang (Yu); K. Ward (Kirsten); E. Wheeler (Eleanor); P.H. Whincup (Peter); T. Whyte (Tamieka); H.J. Williams (Hywel J.); K.A. Williamson (Kathleen); C. Wilson (Crispian); S.G. Wilson (Scott); K. Wong (Kim); C. Xu (Changjiang); J. Yang (Jian); G. Zaza (Gianluigi); E. Zeggini (Eleftheria); F. Zhang (Feng); P. Zhang (Pingbo); W. Zhang (Weihua)

    2015-01-01

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced

  11. Differential network analysis with multiply imputed lipidomic data.

    Directory of Open Access Journals (Sweden)

    Maiju Kujala

    Full Text Available The importance of lipids for cell function and health has been widely recognized; for example, disorders in the lipid composition of cells have been related to atherosclerosis-caused cardiovascular disease (CVD). Lipidomics analyses are characterized by a large, though not huge, number of mutually correlated measured variables, and their associations with outcomes are potentially of a complex nature. Differential network analysis provides a formal statistical method, capable of inferential analysis, for examining differences in the network structures of the lipids under two biological conditions. It also guides us in identifying potential relationships that require further biological investigation. We provide a recipe for conducting a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values typical of a wide range of data sets in the life sciences. Left-censored missing values are low-level concentrations known to lie somewhere between zero and a lower limit of quantification. To make full use of the LURIC data despite the missing values, we utilize state-of-the-art multiple-imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during a two-year follow-up.
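
    The permutation test in this recipe can be sketched generically in R (a minimal illustration with hypothetical names, not the authors' code; score_fn stands for the PLS-based association score):

        perm_test <- function(score_fn, data, labels, B = 1000) {
          obs  <- score_fn(data, labels)                        # observed association score
          perm <- replicate(B, score_fn(data, sample(labels)))  # scores under permuted labels
          mean(abs(perm) >= abs(obs))                           # two-sided permutation p-value
        }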

  12. Missing data in longitudinal studies: cross-sectional multiple imputation provides similar estimates to full-information maximum likelihood.

    Science.gov (United States)

    Ferro, Mark A

    2014-01-01

    The aim of this research was to examine, in an exploratory manner, whether cross-sectional multiple imputation generates valid parameter estimates for a latent growth curve model in a longitudinal data set with nonmonotone missingness. A simulated longitudinal data set of N = 5000 was generated, consisting of a continuous dependent variable assessed at three measurement occasions, and a categorical time-invariant independent variable. Missing data had a nonmonotone pattern, and the proportion of missingness increased from the initial to the final measurement occasion (5%-20%). Three methods were considered for dealing with missing data: listwise deletion, full-information maximum likelihood, and multiple imputation. A latent growth curve model was specified, and analysis of variance was used to compare parameter estimates between the full data set and the missing data approaches. Multiple imputation resulted in significantly lower slope variance compared with the full data set. There were no differences in any parameter estimates between the multiple imputation and full-information maximum likelihood approaches. This study suggested that in longitudinal studies with nonmonotone missingness, cross-sectional imputation at each time point may be viable and produces estimates comparable with those obtained with full-information maximum likelihood. Future research pursuing the validity of this method is warranted. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. A new method of preparing single-walled carbon nanotubes

    Indian Academy of Sciences (India)

    A novel method of purification for single-walled carbon nanotubes, prepared by an arc-discharge method, is described. The method involves a combination of acid washing followed by high temperature hydrogen treatment to remove the metal nanoparticles and amorphous carbon present in the as-synthesized single-walled ...

  14. mice : Multivariate Imputation by Chained Equations in R

    Directory of Open Access Journals (Sweden)

    Stef van Buuren

    2011-12-01

    Full Text Available The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice 2.9, which extends the functionality of mice 1.0 in several ways. In mice 2.9, the analysis of imputed data is made completely general, and the range of models under which pooling works is substantially extended. mice 2.9 adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice 2.9 can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solving applied incomplete-data problems.
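
    A minimal R session illustrating the impute-analyse-pool workflow the article documents, using the nhanes example data shipped with the package (recent mice versions):

        library(mice)
        imp <- mice(nhanes, m = 5, method = "pmm", seed = 123)  # 5 imputations by chained equations
        fit <- with(imp, lm(chl ~ bmi + age))  # fit the analysis model in each completed data set
        summary(pool(fit))                     # pool estimates across imputations (Rubin's rules)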

  15. Imputing Missing Race/Ethnicity in Pediatric Electronic Health Records: Reducing Bias with Use of U.S. Census Location and Surname Data.

    Science.gov (United States)

    Grundmeier, Robert W; Song, Lihai; Ramos, Mark J; Fiks, Alexander G; Elliott, Marc N; Fremont, Allen; Pace, Wilson; Wasserman, Richard C; Localio, Russell

    2015-08-01

    To assess the utility of imputing race/ethnicity using U.S. Census race/ethnicity, residential address, and surname information, compared with standard missing-data methods, in a pediatric cohort. Electronic health record data were drawn from 30 pediatric practices with known race/ethnicity. In a simulation experiment, we constructed dichotomous and continuous outcomes with pre-specified associations with known race/ethnicity. Bias was introduced by nonrandomly setting race/ethnicity to missing. We compared typical methods for handling missing race/ethnicity (multiple imputation with clinical factors alone, complete case analysis, indicator variables) to multiple imputation incorporating surname and address information. Imputation using U.S. Census information reduced bias for both continuous and dichotomous outcomes. The new method reduces bias when race/ethnicity is partially, nonrandomly missing. © Health Research and Educational Trust.
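
    The surname-and-geography updating that underlies such imputation is, in essence, a Bayes rule; a schematic form (assuming, as such methods typically do, that surname and residential location are conditionally independent given race/ethnicity) is

        $$\Pr(r \mid s, g) \;=\; \frac{\Pr(r \mid s)\,\Pr(g \mid r)}{\sum_{r'} \Pr(r' \mid s)\,\Pr(g \mid r')},$$

    where $r$ is race/ethnicity, $s$ the surname, and $g$ the Census geography; the resulting probabilities can then feed the multiple-imputation model.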

  16. Multiple imputation in the presence of non-normal data.

    Science.gov (United States)

    Lee, Katherine J; Carlin, John B

    2017-02-20

    Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box-Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
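
    The two imputation choices contrasted above can be sketched in R with mice (hypothetical data frame dat with a skewed, partially missing biomarker and a fully observed outcome y; make.method is available in recent mice versions):

        library(mice)
        meth <- make.method(dat)
        meth["biomarker"] <- "norm"   # imputation on the raw scale, assuming normality
        imp_raw <- mice(dat, method = meth, m = 10, seed = 1)
        meth["biomarker"] <- "pmm"    # predictive mean matching: draws from observed donors
        imp_pmm <- mice(dat, method = meth, m = 10, seed = 1)
        summary(pool(with(imp_pmm, lm(y ~ biomarker))))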

  17. Uncovering nativity disparities in cancer patterns: Multiple imputation strategy to handle missing nativity data in the Surveillance, Epidemiology, and End Results data file.

    Science.gov (United States)

    Montealegre, Jane R; Zhou, Renke; Amirian, E Susan; Scheurer, Michael E

    2014-04-15

    Although birthplace data are routinely collected in the participating Surveillance, Epidemiology, and End Results (SEER) registries, such data are missing in a nonrandom manner for a large percentage of cases. This hinders analysis of nativity-related cancer disparities. In the current study, the authors evaluated multiple imputation of nativity status among Hispanic patients diagnosed with cervical, prostate, and colorectal cancer and demonstrated the effect of multiple imputation on apparent nativity disparities in survival. Multiple imputation by logistic regression was used to generate nativity values (US-born vs foreign-born) using a priori-defined variables. The accuracy of the method was evaluated among a subset of cases. Kaplan-Meier curves were used to illustrate the effect of imputation by comparing survival among US-born and foreign-born Hispanics, with and without imputation of nativity. Birthplace was missing for 31%, 49%, and 39%, respectively, of cases of cervical, prostate, and colorectal cancer. The sensitivity of the imputation strategy for detecting foreign-born status was ≥90% and the specificity was ≥86%. The agreement between the true and imputed values was ≥0.80 and the misclassification error was ≤10%. Kaplan-Meier survival curves indicated different associations between nativity and survival when nativity was imputed versus when cases with missing birthplace were omitted from the analysis. Multiple imputation using variables available in the SEER data file can be used to accurately detect foreign-born status. This simple strategy may help researchers to disaggregate analyses by nativity and uncover important nativity disparities in regard to cancer diagnosis, treatment, and survival. © 2013 American Cancer Society.
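
    In mice-style notation, imputing a binary nativity indicator by logistic regression from a priori-defined predictors might look like the following sketch (illustrative data frame and variable names, not the authors' code; nativity is assumed to be a two-level factor):

        library(mice)
        library(survival)
        meth <- make.method(seer)       # hypothetical SEER-like extract
        meth["nativity"] <- "logreg"    # binary variable imputed by logistic regression
        imp <- mice(seer, method = meth, m = 20, seed = 42)
        fits <- with(imp, coxph(Surv(time, status) ~ nativity))  # nativity-survival association
        summary(pool(fits))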

  18. Freedom of the Will and Legal Imputability in Schopenhauer

    Directory of Open Access Journals (Sweden)

    Renato César Cardoso

    2015-12-01

    Full Text Available The present article aims to analyze Arthur Schopenhauer's criticism of the postulation that freedom of the will is the condition of possibility of legal imputability. According to the philosopher, an intellectually determinable will, not an unconditioned will, is what truly enables state imputability. In conclusion, we argue that it is with the agent's potential for change, and not with culpability, that society and the state should be concerned. This means that, according to Schopenhauer, an alternative, deterministic conception like his, contrary to what is often said, does not compromise but rather enhances imputability, which is why there is nothing to fear.

  19. Comparison of multiple imputation and complete-case in a simulated longitudinal data with missing covariate

    Science.gov (United States)

    Yoke, Chin Wan; Khalid, Zarina Mohd

    2014-07-01

    In the continual process of collecting data, missing records are a persistent problem in real applications. They often arise from carelessness or from a recorder's unawareness of the importance of data documentation. In this study, a random-effects analysis of data simulated from a proposed algorithm is presented, with a missing covariate. The simulation method is an improved one that uses a first-order autoregressive (AR(1)) process to model the correlation between measurements of a subject across the time sequence. Complete-case analysis and the multiple imputation method are implemented and compared in the estimation procedure. This study shows that multiple imputation yields estimates that fit the data well not only when data are missing completely at random (MCAR) but also when they are missing at random (MAR), whereas complete-case analysis yields estimators that fit well only under MCAR.
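
    The flavour of such a comparison can be conveyed with a short R sketch (illustrative parameters of our own choosing: AR(1) within-subject errors, MAR missingness in the covariate):

        set.seed(1)
        n <- 200; occ <- 5; rho <- 0.6
        e <- t(replicate(n, as.numeric(arima.sim(list(ar = rho), n = occ))))  # AR(1) errors
        x <- rbinom(n, 1, 0.5)
        y <- 1 + 0.5 * x + e[, occ]
        x[rbinom(n, 1, plogis(-1 + y)) == 1] <- NA  # missingness depends on observed y: MAR
        dat <- data.frame(y, x)
        coef(lm(y ~ x, data = dat))                 # complete-case estimate
        library(mice)
        imp <- mice(dat, m = 20, printFlag = FALSE)
        summary(pool(with(imp, lm(y ~ x))))         # multiple-imputation estimate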

  20. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    Science.gov (United States)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITDauto), we tested a combination of ITDauto and visual interpretation (ITDvisual). ITDvisual had two stages: in the first, ITDauto was carried out and in the second, the results of ITDauto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots (r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor (k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITDauto, and ITDvisual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.
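
    One openly available implementation of k-MSN imputation is the yaImpute package in R; a hedged sketch of how ITD-derived training data could drive such an imputation (X holding ALS metrics and Y the forest variables on training plots are illustrative names, not the authors' code):

        library(yaImpute)
        msn <- yai(x = X, y = Y, method = "msn", k = 5)  # most-similar-neighbour model
        pred <- impute(msn)                              # impute Y for the target observations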

  1. Bootstrap imputation with a disease probability model minimized bias from misclassification due to administrative database codes.

    Science.gov (United States)

    van Walraven, Carl

    2017-04-01

    Diagnostic codes used in administrative databases cause bias due to misclassification of patient disease status. It is unclear which methods minimize this bias. Serum creatinine measures were used to determine severe renal failure status in 50,074 hospitalized patients. The true prevalence of severe renal failure and its association with covariates were measured. These were compared with results in which renal failure status was determined using surrogate measures, including: (1) diagnostic codes; (2) categorization of probability estimates of renal failure determined from a previously validated model; or (3) bootstrap imputation of disease status using model-derived probability estimates. Biases in estimates of severe renal failure prevalence and of its association with covariates were minimal when bootstrap methods were used to impute renal failure status from model-based probability estimates. In contrast, biases were extensive when renal failure status was determined using codes or methods in which the model-based condition probability was categorized. Bias due to misclassification from inaccurate diagnostic codes can be minimized by using bootstrap methods to impute condition status from multivariable model-derived probability estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
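
    A base-R sketch of the bootstrap-imputation idea (illustrative names: p_hat holds a validated model's probability of severe renal failure per patient, covar a covariate of interest; this is our schematic rendering, not the author's code):

        boot_impute <- function(p_hat, covar, B = 200) {
          est <- replicate(B, {
            idx <- sample(length(p_hat), replace = TRUE)        # resample patients
            status <- rbinom(length(idx), 1, p_hat[idx])        # impute disease status from p_hat
            fit <- glm(status ~ covar[idx], family = binomial)  # association with the covariate
            c(prev = mean(status), logOR = coef(fit)[2])
          })
          rowMeans(est)                                         # point estimates over replicates
        }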

  2. Imputation-based analysis of association studies: candidate regions and quantitative traits.

    Directory of Open Access Journals (Sweden)

    Bertrand Servin

    2007-07-01

    Full Text Available We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge of patterns of correlation among SNPs (e.g., from the International HapMap project, or from resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole-genome association studies. The methods described here are implemented in a software package, BIMBAM, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

  3. An Asymmetrical Space Vector Method for Single Phase Induction Motor

    DEFF Research Database (Denmark)

    Cui, Yuanhai; Blaabjerg, Frede; Andersen, Gert Karmisholt

    2002-01-01

    Single phase induction motors are the workhorses of low-power applications worldwide, and variable-speed operation is often necessary. Normally this is achieved either by a mechanical method or by controlling the capacitor connected to the auxiliary winding. Each of these methods has drawbacks, which

  4. Cheap arbitrary high order methods for single integrand SDEs

    DEFF Research Database (Denmark)

    Debrabant, Kristian; Kværnø, Anne

    2017-01-01

    For a particular class of Stratonovich SDE problems, here denoted as single integrand SDEs, we prove that by applying a deterministic Runge-Kutta method of order $p_d$ we obtain methods converging in the mean-square and weak sense with order $\lfloor p_d/2 \rfloor$. The reason is that the B-series...

  5. Multiple imputation for time series data with Amelia package.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-02-01

    Time series data are common in medical research. Many laboratory variables or study endpoints can be measured repeatedly over time. Multiple imputation (MI) that ignores the time trend of a variable may produce unreliable results. This article illustrates how to perform MI using the Amelia package in a clinical scenario. The Amelia package is powerful in that it allows MI for time series data. External information on the variable of interest can also be incorporated by using the priors or bounds arguments. Such information may be based on previously published observations, academic consensus, and personal experience. Diagnostics of the imputation model can be performed by examining the distributions of imputed and observed values, or by using the over-imputation technique.
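
    The canonical usage, following the africa example data shipped with Amelia (the ts and cs arguments declare the time and unit indices; bounds injects external knowledge as described above):

        library(Amelia)
        data(africa)
        a.out <- amelia(africa, m = 5, ts = "year", cs = "country",
                        logs = "gdp_pc")                 # log-transform a skewed series
        a.bnd <- amelia(africa, m = 5, ts = "year", cs = "country",
                        bounds = rbind(c(3, 0, Inf)))    # column 3 (gdp_pc) forced non-negative
        plot(a.out)                                      # compare observed vs imputed densities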

  6. Is there a role for expectation maximization imputation in addressing missing data in research using WOMAC questionnaire? Comparison to the standard mean approach and a tutorial

    Directory of Open Access Journals (Sweden)

    Rutledge John

    2011-05-01

    Full Text Available Abstract Background Standard mean imputation for missing values in the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index limits the use of collected data and may lead to bias. Probability model-based imputation methods overcome such limitations but have never before been applied to the WOMAC. In this study, we compare imputation results for the Expectation Maximization (EM) method and the mean imputation method for the WOMAC in a cohort of total hip replacement patients. Methods WOMAC data on a consecutive cohort of 2062 patients scheduled for surgery were analyzed. Rates of missing values in each of the WOMAC items from this large cohort were used to create missing patterns in the subset of patients with complete data. EM and the WOMAC's method of imputation were then applied to fill the missing values. Summary score statistics for both methods were then described through box plots and contrasted with the complete case (CC) analysis and the true score (TS). This process was repeated using a smaller sample of 200 randomly drawn patients with a higher missing rate (5 times the rates of missing values observed in the 2062 patients, capped at 45%). Results The rate of missing values per item ranged from 2.9% to 14.5%, and 1339 patients had complete data. Probability model-based EM imputed a score for all subjects, while the WOMAC's imputation method did not. Mean subscale scores were very similar for both imputation methods and were similar to the true score; however, the EM method results were more consistent with the TS after simulation. This difference became more pronounced as the number of items in a subscale increased and the sample size decreased. Conclusions The EM method provides a better alternative to the WOMAC imputation method. The EM method is more accurate and imputes data to create a complete data set. These features are very valuable for patient-reported outcomes research in which resources are limited and the WOMAC score is used in a multivariate
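
    One widely used EM implementation in R is the norm package (a sketch under that assumption, not the authors' code; womac stands for a hypothetical numeric matrix of item scores with NAs):

        library(norm)
        s <- prelim.norm(as.matrix(womac))  # missingness patterns and sufficient statistics
        theta <- em.norm(s)                 # EM estimates of the mean vector and covariance
        rngseed(1234)                       # must be set before stochastic imputation
        womac_imp <- imp.norm(s, theta, as.matrix(womac))  # one draw from the predictive dist.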

  7. Single-Frame Attitude Determination Methods for Nanosatellites

    Directory of Open Access Journals (Sweden)

    Guler Demet Cilden

    2017-06-01

    Full Text Available Single-frame methods for determining the attitude of a nanosatellite are compared in this study. The methods selected for comparison are: Singular Value Decomposition (SVD), the q method, the QUaternion ESTimator (QUEST), and the Fast Optimal Attitude Matrix (FOAM) − all solving Wahba's problem optimally − together with the algebraic method using only two vector measurements. For a proper comparison, two sensors are chosen for the vector observations on-board: a magnetometer and Sun sensors. Covariance results obtained from those methods have critical importance for a non-traditional attitude estimation approach; therefore, the variance calculations are also presented. The examined methods are compared with respect to their root mean square (RMS) error and variance results. Some recommendations are also given.
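
    For reference, the optimal estimators above all minimize Wahba's loss; with $\mathbf{b}_i$ the body-frame observations, $\mathbf{r}_i$ the reference vectors and $w_i$ the weights, the SVD solution reads

        $$J(A) = \tfrac{1}{2}\sum_i w_i\,\lVert \mathbf{b}_i - A\,\mathbf{r}_i \rVert^2,\qquad B = \sum_i w_i\,\mathbf{b}_i\mathbf{r}_i^{\mathsf T} = U S V^{\mathsf T},\qquad A^{*} = U\,\operatorname{diag}(1,\,1,\,\det U \det V)\,V^{\mathsf T}.$$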

  8. The UIC 406 capacity method used on single track sections

    DEFF Research Database (Denmark)

    Landex, Alex; Kaas, Anders H.; Jacobsen, Erik M.

    2007-01-01

    This paper describes the relatively new UIC 406 capacity method, which is an easy and effective way of calculating capacity consumption on railway lines. However, it is possible to expound the method in different ways, which can lead to different capacity consumptions. This paper describes the UIC 406 method for single track lines and how it is expounded in Denmark. Many capacity analyses using the UIC 406 capacity method for double track lines have been carried out and presented internationally, but only few capacity analyses using the UIC 406 capacity method on single track lines have been... Special care has to be shown in how to expound the UIC 406 capacity method in specific cases, for example where trains follow each other in the same direction. Therefore, this paper discusses where to divide the railway lines into line sections, and how to handle crossing stations and junctions and conflicts when entering...

  9. Compositions and methods for detecting single nucleotide polymorphisms

    Energy Technology Data Exchange (ETDEWEB)

    Yeh, Hsin-Chih; Werner, James; Martinez, Jennifer S.

    2016-11-22

    Described herein are nucleic acid-based probes and methods for discriminating and detecting single nucleotide variants in nucleic acid molecules (e.g., DNA). The methods include the use of a pair of probes to detect and identify polymorphisms, for example single nucleotide polymorphisms in DNA. The pair of probes emits a different fluorescent wavelength of light depending on the association and alignment of the probes when hybridized to a target nucleic acid molecule. Each pair of probes is capable of discriminating at least two different nucleic acid molecules that differ by at least a single nucleotide. The probes can be used, for example, for detection of DNA polymorphisms that are indicative of a particular disease or condition.

  10. Development of two dimensional electrophoresis method using single chain DNA

    International Nuclear Information System (INIS)

    Ikeda, Junichi; Hidaka, So

    1998-01-01

    By combining a separation method based on molecular weight with a method to distinguish single-base differences, we aimed to develop a two-dimensional electrophoresis method for single chain DNA labeled with a radioisotope (RI). Using the differences in electrophoretic patterns between parent and variant strands, we attempted to isolate the root module implantation control gene. First, a Single Strand Conformation Polymorphism (SSCP) method using a concentration gradient gel was investigated. As a result, it was found that the intervals between double chain and single chain DNAs expanded, but the intervals between the single chain DNAs did not. Next, a combination of the non-modified acrylamide electrophoresis method and the Denaturing Gradient-Gel Electrophoresis (DGGE) method was examined. As a result, the hybrid DNAs developed by two-dimensional electrophoresis were arranged on two lines, but among them a band of DNA modified by a high concentration of urea could not be found. Therefore, in this fiscal year's experiments, no satisfactory result could be obtained. With the methods used, it was thought to be impossible to detect the differences. (G.K.)

  11. Missing Value Imputation Using Contemporary Computer Capabilities: An Application to Financial Statements Data in Large Panels

    Directory of Open Access Journals (Sweden)

    Ales Gorisek

    2017-03-01

    Full Text Available This paper addresses an evaluation of methods for automatic item imputation in large datasets with missing data, in the particular setting of financial data often used in economic and business research. The paper aims to bridge the gap between purely methodological papers concerned with individual imputation techniques and their implementation algorithms, and common practices of missing-value treatment in the social sciences and other research. Historical methods for handling missing values have been rendered obsolete by the rise of cheap computing power. Regardless of the condition of the input data, various computer programs and software packages almost always return some results. In spite of this fact, item imputation in scientific research should be executed only to reproduce reality, not to create a new one. In review papers comparing different methods, we usually find data on the performance of algorithms on artificial datasets. However, on a simulated dataset that replicates a real-life financial database, we show that algorithms other than those that perform best on purely artificial datasets may perform better.

  12. Data Editing and Imputation in Business Surveys Using “R”

    Directory of Open Access Journals (Sweden)

    Elena Romascanu

    2014-06-01

    Full Text Available Purpose – Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The objective of this paper is a direct comparison between two statistical software packages, R and SPSS, in order to take full advantage of existing automated methods for the data editing process and imputation in business surveys (with a proper design of consistency rules as a partial alternative to manual editing of data). Approach – We compare different methods for editing survey data in R, using the 'editrules' and 'survey' packages (which contain transformations commonly used in official statistics), visualization of missing-value patterns using the 'Amelia' and 'VIM' packages, and imputation approaches for longitudinal data using 'VIMGUI', against the performance of another statistical package, SPSS, on the same features. Findings – Data on business statistics received by National Institutes of Statistics (NIS) are not ready for direct analysis due to in-record inconsistencies, errors and missing values in the collected data sets. The appropriate automatic methods from R packages offer the ability to localize the erroneous fields in edit-violating records and to verify the results after the imputation of missing values, providing users with a flexible, less time-consuming approach whose automation is easier to perform in R than with SPSS macro syntax.
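
    A small self-contained example of the rule-based editing workflow with the editrules package (toy records and consistency rules of our own construction):

        library(editrules)
        dat <- data.frame(turnover = c(100, 80, NA),
                          cost     = c(60, 90, 50),
                          profit   = c(40, 20, 10))
        E <- editset(c("turnover >= 0",
                       "profit == turnover - cost"))  # consistency rules as linear edits
        violatedEdits(E, dat)    # which records violate which rules
        localizeErrors(E, dat)   # Fellegi-Holt-style localization of fields to change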

  13. Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation.

    Science.gov (United States)

    Wahl, Simone; Boulesteix, Anne-Laure; Zierer, Astrid; Thorand, Barbara; van de Wiel, Mark A

    2016-10-26

    Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation. In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately, MI-Val, MI on the full data set followed by internal validation, and MI(-y)-Val, MI on the full data set omitting the outcome followed by internal validation. Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data. Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias with increasing true effect size, number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by
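
    A schematic R loop for the Val-MI strategy (bootstrap first, then impute the training and test parts separately; one imputation per replicate for brevity, and the data frame dat, outcome y, and replicate count B are illustrative assumptions):

        auc <- numeric(B)
        for (b in 1:B) {
          idx   <- sample(nrow(dat), replace = TRUE)
          train <- dat[idx, ]
          test  <- dat[-unique(idx), ]                  # out-of-bag observations
          imp_tr <- mice::complete(mice::mice(train, m = 1, printFlag = FALSE))
          imp_te <- mice::complete(mice::mice(test,  m = 1, printFlag = FALSE))
          fit <- glm(y ~ ., data = imp_tr, family = binomial)
          auc[b] <- pROC::auc(imp_te$y, predict(fit, imp_te, type = "response"))
        }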

  14. Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data

    Science.gov (United States)

    2012-01-01

    Background Multiple Imputation as usually implemented assumes that data are Missing At Random (MAR), meaning that the underlying missing data mechanism, given the observed data, is independent of the unobserved data. To explore the sensitivity of the inferences to departures from the MAR assumption, we applied the method proposed by Carpenter et al. (2007). This approach aims to approximate inferences under a Missing Not At Random (MNAR) mechanism by reweighting estimates obtained after multiple imputation, where the weights depend on the assumed degree of departure from the MAR assumption. Methods The method is illustrated with epidemiological data from a surveillance system of hepatitis C virus (HCV) infection in France during the 2001–2007 period. The subpopulation studied included 4343 HCV infected patients who reported drug use. Risk factors for severe liver disease were assessed. After performing complete-case and multiple imputation analyses, we applied the sensitivity analysis to 3 risk factors of severe liver disease: past excessive alcohol consumption, HIV co-infection and infection with HCV genotype 3. Results In these data, the association between severe liver disease and HIV was underestimated if, given the observed data, the chance of observing HIV status is high when it is positive. Inferences for the two other risk factors were robust to plausible local departures from the MAR assumption. Conclusions We have demonstrated the practical utility of, and advocate, a pragmatic, widely applicable approach to exploring plausible departures from the MAR assumption after multiple imputation. We have developed guidelines for applying this approach to epidemiological studies. PMID:22681630
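
    Schematically, the reweighting approach can be written as follows (our paraphrase of the method; the sensitivity parameter $\delta$ encodes the assumed degree of departure from MAR, and the exact weight definition follows Carpenter et al. (2007)):

        $$\hat\theta_{\mathrm{MNAR}} \;\approx\; \frac{\sum_{m=1}^{M} w_m\,\hat\theta_m}{\sum_{m=1}^{M} w_m},\qquad w_m \;\propto\; \exp\!\Big(\delta \sum_{i:\,y_i\ \mathrm{missing}} y_i^{(m)}\Big),$$

    where $\hat\theta_m$ is the estimate from the $m$-th imputed data set and $y_i^{(m)}$ the corresponding imputed values; $\delta = 0$ recovers the usual MAR-based MI estimate.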

  15. Method for harvesting rare earth barium copper oxide single crystals

    Science.gov (United States)

    Todt, Volker R.; Sengupta, Suvankar; Shi, Donglu

    1996-01-01

    A method of preparing high temperature superconductor single crystals. The method of preparation involves preparing precursor materials of a particular composition, heating the precursor material to achieve a peritectic mixture of peritectic liquid and crystals of the high temperature superconductor, and cooling the peritectic mixture by quenching it directly on a porous, wettable, inert substrate to wick off the peritectic liquid, leaving single crystals of the high temperature superconductor on the porous substrate. Alternatively, the peritectic mixture can be cooled to a solid mass and reheated on a porous, inert substrate to melt the matrix of peritectic fluid while leaving the crystals unmelted, allowing the wicking away of the peritectic liquid.

  16. Neural Models for Imputation of Missing Ozone Data in Air-Quality Datasets

    Directory of Open Access Journals (Sweden)

    Ángel Arroyo

    2018-01-01

    Full Text Available Ozone is one of the pollutants with the most negative effects on human health and, in general, on the biosphere. Many data-acquisition networks collect data on ozone values in both urban and background areas. These data are often incomplete or corrupt, and the imputation of the missing values is a priority in order to obtain complete datasets and resolve the uncertainty and vagueness in the data. In the present paper, multiple-regression techniques and Artificial Neural Network models are applied to approximate the absent ozone values from five explanatory variables containing air-quality information. To compare the different imputation methods, real-life data from six data-acquisition stations in the region of Castilla y León (Spain) are gathered in different ways and then analyzed. The results obtained in the estimation of the missing values by applying these techniques and models are compared, and the possible causes of the given response are analyzed.
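
    A hedged sketch of the neural imputation step with the nnet package (the data frame aq and the five covariate names are illustrative stand-ins for the explanatory variables, not the authors' data):

        library(nnet)
        obs <- !is.na(aq$ozone)
        fit <- nnet(ozone ~ no2 + so2 + pm10 + co + temp, data = aq[obs, ],
                    size = 5, linout = TRUE, decay = 0.01, maxit = 500)  # small regression net
        aq$ozone[!obs] <- predict(fit, newdata = aq[!obs, ])             # fill in missing ozone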

  17. A Brief Introduction to Single-Molecule Fluorescence Methods.

    Science.gov (United States)

    van den Wildenberg, Siet M J L; Prevo, Bram; Peterman, Erwin J G

    2018-01-01

    One of the more popular single-molecule approaches in biological science is single-molecule fluorescence microscopy, which will be the subject of the following section of this volume. Fluorescence methods provide the sensitivity required to study biology on the single-molecule level, but they also allow access to useful measurable parameters on time and length scales relevant for the biomolecular world. Before several detailed experimental approaches will be addressed, we will first give a general overview of single-molecule fluorescence microscopy. We start with discussing the phenomenon of fluorescence in general and the history of single-molecule fluorescence microscopy. Next, we will review fluorescent probes in more detail and the equipment required to visualize them on the single-molecule level. We will end with a description of parameters measurable with such approaches, ranging from protein counting and tracking, single-molecule localization super-resolution microscopy, to distance measurements with Förster Resonance Energy Transfer and orientation measurements with fluorescence polarization.

  18. A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to the Lewis et al. article in JOS 2014

    Directory of Open Access Journals (Sweden)

    He Yulei

    2016-03-01

    Full Text Available Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior still remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase of the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
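
    For orientation, the within- and between-imputation quantities discussed above are the standard Rubin combining rules: with $\hat\theta_m$ and $U_m$ the point estimate and its variance from imputation $m$,

        $$\bar U_M = \frac{1}{M}\sum_{m=1}^{M} U_m,\qquad B_M = \frac{1}{M-1}\sum_{m=1}^{M}\big(\hat\theta_m - \bar\theta_M\big)^2,\qquad T_M = \bar U_M + \Big(1+\frac{1}{M}\Big)B_M,$$

    where $\bar\theta_M$ is the average point estimate across the $M$ imputations.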

  19. A multiple imputation approach for MNAR mechanisms compatible with Heckman's model.

    Science.gov (United States)

    Galimard, Jacques-Emmanuel; Chevret, Sylvie; Protopopescu, Camelia; Resche-Rigon, Matthieu

    2016-07-30

    Standard implementations of multiple imputation (MI) approaches provide unbiased inferences based on an assumption of underlying missing at random (MAR) mechanisms. However, in the presence of missing data generated by missing not at random (MNAR) mechanisms, MI is not satisfactory. Originating in an econometric statistical context, Heckman's model, also called the sample selection method, deals with selected samples using two joined linear equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, such a method only addresses missing outcomes, and this is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms through the use of the Heckman's model as imputation model and a two-step estimation process. This approach will provide a solution that can be used in an MI by chained equation framework to impute missing (either outcomes or covariates) data resulting either from a MAR or an MNAR mechanism when the MNAR mechanism is compatible with a Heckman's model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza. Copyright © 2016 John Wiley & Sons, Ltd.
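
    In standard notation, the two joined equations are the selection and outcome equations with jointly normal errors:

        $$s_i^{*} = z_i^{\mathsf T}\gamma + u_i,\qquad y_i = x_i^{\mathsf T}\beta + \varepsilon_i,\qquad (u_i,\varepsilon_i) \sim \mathcal N\!\left(0,\begin{pmatrix}1 & \rho\sigma_\varepsilon\\ \rho\sigma_\varepsilon & \sigma_\varepsilon^2\end{pmatrix}\right),$$

    with $y_i$ observed only when $s_i^{*}>0$, so that $E[y_i \mid \text{observed}] = x_i^{\mathsf T}\beta + \rho\sigma_\varepsilon\,\lambda(z_i^{\mathsf T}\gamma)$, where $\lambda(t)=\phi(t)/\Phi(t)$ is the inverse Mills ratio; the two-step estimate of this conditional model supplies the imputation distribution.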

  20. TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

    Science.gov (United States)

    Allen, Genevera I; Tibshirani, Robert

    2010-06-01

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

  1. Using full-cohort data in nested case-control and case-cohort studies by multiple imputation.

    Science.gov (United States)

    Keogh, Ruth H; White, Ian R

    2013-10-15

    In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies in which complete information is obtained only for sampled individuals. However, in the full cohort, there may be a large amount of information on cheaply available covariates and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well. Copyright © 2013 John Wiley & Sons, Ltd.

  2. Validity of using multiple imputation for "unknown" stage at diagnosis in population-based cancer registry data.

    Science.gov (United States)

    Luo, Qingwei; Egger, Sam; Yu, Xue Qin; Smith, David P; O'Connell, Dianne L

    2017-01-01

    The multiple imputation approach to missing data has been validated in a number of simulation studies by artificially inducing missingness on fully observed stage data under a pre-specified missing data mechanism. However, the validity of multiple imputation has not yet been assessed using real data. The objective of this study was to assess the validity of using multiple imputation for "unknown" prostate cancer stage recorded in the New South Wales Cancer Registry (NSWCR) under real-world conditions. Data from the population-based cohort study NSW Prostate Cancer Care and Outcomes Study (PCOS) were linked to 2000-2002 NSWCR data. For cases with "unknown" NSWCR stage, PCOS-stage was extracted from clinical notes. Logistic regression was used to evaluate the missing at random assumption, adjusted for variables from two imputation models: a basic model including NSWCR variables only and an enhanced model including the same NSWCR variables together with PCOS primary treatment. Cox regression was used to evaluate the performance of MI. Of the 1864 prostate cancer cases, 32.7% were recorded as having "unknown" NSWCR stage. The missing at random assumption was satisfied when the logistic regression included the variables in the enhanced model, but not when it included only those in the basic model. The Cox models using data with imputed stage from either imputation model provided generally similar estimated hazard ratios, but with wider confidence intervals, compared with those derived from analysis of the data with PCOS-stage. However, the complete-case analysis of the data provided a considerably higher estimated hazard ratio for the low socio-economic status group and rural areas in comparison with those obtained from all other datasets. Using MI to deal with "unknown" stage data recorded in a population-based cancer registry appears to provide valid estimates. We would recommend a cautious approach to the use of this method elsewhere.

  3. Review of methods to probe single cell metabolism and bioenergetics.

    Science.gov (United States)

    Vasdekis, Andreas E; Stephanopoulos, Gregory

    2015-01-01

    Single cell investigations have enabled unexpected discoveries, such as the existence of biological noise and phenotypic switching in infection, metabolism and treatment. Herein, we review methods that enable such single cell investigations specific to metabolism and bioenergetics. Firstly, we discuss how to isolate and immobilize individuals from a cell suspension, including both permanent and reversible approaches. We also highlight specific advances in microbiology for its implications in metabolic engineering. Methods for probing single cell physiology and metabolism are subsequently reviewed. The primary focus therein is on dynamic and high-content profiling strategies based on label-free and fluorescence microspectroscopy and microscopy. Non-dynamic approaches, such as mass spectrometry and nuclear magnetic resonance, are also briefly discussed. Published by Elsevier Inc.

  4. Single particle electrochemical sensors and methods of utilization

    Science.gov (United States)

    Schoeniger, Joseph [Oakland, CA; Flounders, Albert W [Berkeley, CA; Hughes, Robert C [Albuquerque, NM; Ricco, Antonio J [Los Gatos, CA; Wally, Karl [Lafayette, CA; Kravitz, Stanley H [Placitas, NM; Janek, Richard P [Oakland, CA

    2006-04-04

    The present invention discloses an electrochemical device for detecting single particles, and methods for using such a device to achieve high sensitivity for detecting particles such as bacteria, viruses, aggregates, immuno-complexes, molecules, or ionic species. The device provides for affinity-based electrochemical detection of particles with single-particle sensitivity. The disclosed device and methods are based on microelectrodes with surface-attached, affinity ligands (e.g., antibodies, combinatorial peptides, glycolipids) that bind selectively to some target particle species. The electrodes electrolyze chemical species present in the particle-containing solution, and particle interaction with a sensor element modulates its electrolytic activity. The devices may be used individually, employed as sensors, used in arrays for a single specific type of particle or for a range of particle types, or configured into arrays of sensors having both these attributes.

  5. A new method of preparing single-walled carbon nanotubes

    Indian Academy of Sciences (India)

    Journal of Chemical Sciences, Volume 115, Issue 5-6. A new method of preparing single-walled carbon nanotubes ... Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur PO, Bangalore 560 064, India; Solid State and Structural Chemistry Unit, Indian Institute of Science, Bangalore 560 012, ...

  6. A new method of preparing single-walled carbon nanotubes

    Indian Academy of Sciences (India)

    Unknown

    A new method of preparing single-walled carbon nanotubes. S R C Vivekchand and A Govindaraj. Chemistry and Physics of Materials Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur PO, Bangalore 560 064, India; Solid State and Structural Chemistry Unit, Indian Institute of Science ...

  7. METHOD FOR MANUFACTURING A SINGLE CRYSTAL NANO-WIRE.

    NARCIS (Netherlands)

    Van Den Berg, Albert; Bomer, Johan; Carlen Edwin, Thomas; Chen, Songyue; Kraaijenhagen Roderik, Adriaan; Pinedo Herbert, Michael

    2011-01-01

    A method for manufacturing a single crystal nano-structure is provided comprising the steps of providing a device layer with a ⟨100⟩ structure on a substrate; providing a stress layer onto the device layer; patterning the stress layer along the ⟨110⟩ direction of the device layer; selectively removing

  8. METHOD FOR MANUFACTURING A SINGLE CRYSTAL NANO-WIRE

    NARCIS (Netherlands)

    Van Den Berg, Albert; Bomer, Johan; Carlen Edwin, Thomas; Chen, Songyue; Kraaijenhagen Roderik, Adriaan; Pinedo Herbert, Michael

    2012-01-01

    A method for manufacturing a single crystal nano-structure includes providing a device layer with a ⟨100⟩ structure on a substrate; providing a stress layer onto the device layer; patterning the stress layer along the ⟨110⟩ direction of the device layer; selectively removing parts of the stress layer to

  9. Missing data in a multi-item instrument were best handled by multiple imputation at the item score level

    NARCIS (Netherlands)

    Eekhout, Iris; de Vet, Henrica C. W.; Twisk, Jos W. R.; Brand, Jaap P. L.; de Boer, Michiel R.; Heymans, Martijn W.

    Objectives: Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling

  10. Quick, “Imputation-free” meta-analysis with proxy-SNPs

    Directory of Open Access Journals (Sweden)

    Meesters Christian

    2012-09-01

    Full Text Available Abstract Background Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to (a) increase the power to detect strong or weak genotype effects or (b) verify results. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in an MA. YAMAS (Yet Another Meta-Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which can be time-consuming. Results Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from the 1,000 Genomes/HapMap projects to find proxy-SNPs, together with in-phase alleles, for SNPs missing in at least one study. MA is conducted by combining the association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWASes and found that the proxy algorithm clearly outperforms naïve MA on the p-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127. Conclusions YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy
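
    The combination step can be read as a standard fixed-effect, inverse-variance meta-analysis (one of the conventional MA schemes; our schematic rendering): with study-specific estimates $\hat\beta_k$ and standard errors $\mathrm{se}_k$, and a proxy-SNP's allele-aligned estimate substituting for a SNP missing in study $k$,

        $$\hat\beta_{\mathrm{MA}} = \frac{\sum_k w_k\,\hat\beta_k}{\sum_k w_k},\qquad w_k = \frac{1}{\mathrm{se}_k^{2}},\qquad \mathrm{se}\big(\hat\beta_{\mathrm{MA}}\big) = \Big(\sum_k w_k\Big)^{-1/2}.$$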

  11. On the performance of multiple imputation based on chained equations in tackling missing data of the African α3.7-globin deletion in a malaria association study.

    Science.gov (United States)

    Sepúlveda, Nuno; Manjurano, Alphaxard; Drakeley, Chris; Clark, Taane G

    2014-07-01

    Multiple imputation based on chained equations (MICE) is an alternative missing genotype method that can use genetic and nongenetic auxiliary data to inform the imputation process. Previously, MICE was successfully tested on strongly linked genetic data. We have now tested it on data of the HBA2 gene which, by the experimental design used in a malaria association study in Tanzania, shows a high missing data percentage and is weakly linked with the remaining genetic markers in the data set. We constructed different imputation models and studied their performance under different missing data conditions. Overall, MICE failed to accurately predict the true genotypes. However, using the best imputation model for the data, we obtained unbiased estimates for the genetic effects, and association signals of the HBA2 gene on malaria positivity. When the whole data set was analyzed with the same imputation model, the association signal increased from 0.80 to 2.70 before and after imputation, respectively. Conversely, postimputation estimates for the genetic effects remained the same in relation to the complete case analysis but showed increased precision. We argue that these postimputation estimates are reasonably unbiased, as a result of a good study design based on matching key socio-environmental factors. © 2014 The Authors. Annals of Human Genetics published by John Wiley & Sons Ltd and University College London (UCL).

  12. Single Image Super Resolution using a Joint GMM Method.

    Science.gov (United States)

    Sandeep, P; Jacob, Tony

    2016-07-07

    Single Image Super Resolution (SR) algorithms based on joint dictionaries and sparse representations of image patches have received significant attention in the literature and deliver state-of-the-art results. Recently, Gaussian Mixture Models (GMMs) have emerged as a favored prior for natural image patches in various image restoration problems. In this work, we approach the single image SR problem by using a joint GMM learnt from concatenated vectors of high- and low-resolution patches, sampled from a large database of pairs of high-resolution images and the corresponding low-resolution images. Covariance matrices of the learnt Gaussian models capture the inherent correlations between high- and low-resolution patches, which are utilized for inferring high-resolution patches from given low-resolution patches. The proposed joint GMM method can be interpreted as the GMM analogue of joint dictionary-based algorithms for single image SR. We study the performance of the proposed joint GMM method by comparing it with various competing algorithms for single image SR. Our experiments on various natural images demonstrate the competitive performance obtained by the proposed method at low computational cost.
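
    The inference step rests on the conditional of a joint Gaussian: partitioning each component's mean and covariance over the high-resolution (h) and low-resolution (l) coordinates,

        $$\mathbb E\big[x_h \,\big|\, x_l,\ k\big] = \mu_h^{(k)} + \Sigma_{hl}^{(k)}\big(\Sigma_{ll}^{(k)}\big)^{-1}\big(x_l - \mu_l^{(k)}\big),$$

    and the high-resolution patch estimate can weight these conditional means by the components' posterior responsibilities given $x_l$.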

  13. A method of object recognition for single pixel imaging

    Science.gov (United States)

    Li, Boxuan; Zhang, Wenwen

    2018-01-01

    Computational ghost imaging (CGI), utilizing a single-pixel detector, has been extensively used in many fields. However, in order to achieve a high-quality reconstructed image, a large number of iterations is needed, which limits the flexibility of using CGI in practical situations, especially in the field of object recognition. In this paper, we propose a method utilizing feature matching to identify number objects. In the given system, a recognition accuracy of approximately 90% can be achieved, which provides a new idea for the application of single pixel imaging in the field of object recognition.

  14. Appropriate inclusion of interactions was needed to avoid bias in multiple imputation.

    Science.gov (United States)

    Tilling, Kate; Williamson, Elizabeth J; Spratt, Michael; Sterne, Jonathan A C; Carpenter, James R

    2016-12-01

    Missing data are a pervasive problem, often leading to bias in complete records analysis (CRA). Multiple imputation (MI) via chained equations is one solution, but its use in the presence of interactions is not straightforward. We simulated data with outcome Y dependent on binary explanatory variables X and Z and their interaction XZ. Six scenarios were simulated (Y continuous and binary, each with no interaction, a weak interaction, and a strong interaction), under five missing data mechanisms. We used directed acyclic graphs to identify when CRA and MI would each be unbiased. We evaluated the performance of CRA, MI without interactions, MI including all interactions, and stratified imputation (sketched below). We also illustrated these methods using a simple example from the National Child Development Study (NCDS). MI excluding interactions was invalid and resulted in biased estimates and low coverage. When the XZ interaction was zero, MI excluding interactions gave unbiased estimates but overcoverage. MI including interactions and stratified MI gave equivalent, valid inference in all cases. In the NCDS example, MI excluding interactions incorrectly concluded there was no evidence for an important interaction. Epidemiologists carrying out MI should ensure that their imputation model(s) are compatible with their analysis model. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
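
    A hypothetical sketch of the stratified-imputation strategy (impute separately within each level of Z, so the imputation model is compatible with an analysis model containing XZ); the data are simulated and the variable names are illustrative:

        import numpy as np
        import pandas as pd
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(2)
        n = 500
        x = rng.integers(0, 2, n)
        z = rng.integers(0, 2, n)
        y = 1.0 * x + 1.0 * z + 1.5 * x * z + rng.normal(size=n)
        y[rng.random(n) < 0.3] = np.nan          # 30% of outcomes missing

        df = pd.DataFrame({"x": x, "z": z, "y": y})
        completed = []
        for _, stratum in df.groupby("z"):       # one imputation model per stratum
            imp = IterativeImputer(sample_posterior=True, random_state=0)
            filled = stratum.copy()
            filled[["x", "y"]] = imp.fit_transform(stratum[["x", "y"]])
            completed.append(filled)
        df_imp = pd.concat(completed).sort_index()

    Equivalently, the interaction term itself can be carried in the imputation model; either route keeps the imputation and analysis models compatible.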

  15. Real stabilization method for nuclear single-particle resonances

    International Nuclear Information System (INIS)

    Zhang Li; Zhou Shangui; Meng Jie; Zhao Enguang

    2008-01-01

    We develop the real stabilization method within the framework of the relativistic mean-field (RMF) model. With the self-consistent nuclear potentials from the RMF model, the real stabilization method is used to study single-particle resonant states in spherical nuclei. As examples, the energies, widths, and wave functions of low-lying neutron resonant states in ¹²⁰Sn are obtained. These results are compared with those from the scattering phase-shift method and the analytic continuation in the coupling constant approach, and satisfactory agreement is found.

  16. References for Haplotype Imputation in the Big Data Era.

    Science.gov (United States)

    Li, Wenzhi; Xu, Wei; Li, Qiling; Ma, Li; Song, Qing

    2015-11-01

    Imputation is a powerful in silico approach to fill in missing values in big datasets. This process requires a reference panel, which is a collection of big data from which the missing information can be extracted and imputed. Haplotype imputation requires ethnicity-matched references; a mismatched reference panel will significantly reduce the quality of imputation. However, currently existing big datasets cover only a small number of ethnicities, so there is a lack of ethnicity-matched references for many ethnic populations in the world, which has hampered the data imputation of haplotypes and its downstream applications. To solve this issue, several approaches have been proposed and explored, including the mixed reference panel, the internal reference panel, and the genotype-converted reference panel. This review article provides information on and a comparison of these approaches. Increasing evidence has shown that gene activity and function are dictated not by one or two genetic elements but by the cis-interactions of multiple elements. Cis-interactions require the interacting elements to be on the same chromosome molecule; therefore, haplotype analysis is essential for the investigation of cis-interactions among multiple genetic variants at different loci and appears to be especially important for studying common diseases. It will be valuable in a wide spectrum of applications, from academic research to clinical diagnosis, prevention, treatment, and the pharmaceutical industry.

  17. Improvement of Source Number Estimation Method for Single Channel Signal.

    Directory of Open Access Journals (Sweden)

    Zhi Dong

    Full Text Available Source number estimation methods for single channel signals are investigated and improvements for each method are suggested in this work. Firstly, the single channel data is converted to multi-channel form by a delay process. Then, algorithms used in array signal processing, such as Gerschgorin's disk estimation (GDE) and minimum description length (MDL), are introduced to estimate the source number of the received signal. Previous results have shown that MDL, based on information theoretic criteria (ITC), obtains superior performance to GDE at low SNR. However, it has no ability to handle signals containing colored noise. On the contrary, the GDE method can eliminate the influence of colored noise, but its performance at low SNR is not satisfactory. In order to resolve these problems and contradictions, this work makes notable improvements to both methods. A diagonal loading technique is employed to ameliorate the MDL method, and a jackknife technique is adopted to optimize the data covariance matrix in order to improve the performance of the GDE method. Simulation results illustrate that the performance of the original methods is improved considerably.
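
    For illustration, a sketch of the eigenvalue-based MDL step on delay-embedded single-channel data, assuming the standard Wax-Kailath form of the criterion (embedding depth, test signal, and function name are illustrative; the paper's diagonal-loading and jackknife refinements are indicated only in the comment):

        import numpy as np

        def mdl_source_number(x, p=8):
            """Pick k minimising the MDL criterion over eigenvalues of the
            delay-embedded covariance. Diagonal loading would add eps*I to R."""
            N = len(x) - p + 1
            X = np.stack([x[i:i + N] for i in range(p)])   # p x N delay matrix
            R = X @ X.T / N                                # sample covariance
            lam = np.sort(np.linalg.eigvalsh(R))[::-1]     # eigenvalues, descending
            mdl = []
            for k in range(p):
                tail = lam[k:]
                geo = np.exp(np.mean(np.log(tail)))        # geometric mean
                ari = np.mean(tail)                        # arithmetic mean
                mdl.append(-N * (p - k) * np.log(geo / ari)
                           + 0.5 * k * (2 * p - k) * np.log(N))
            return int(np.argmin(mdl))

        rng = np.random.default_rng(3)
        t = np.arange(2000)
        x = np.sin(0.1 * t) + 0.5 * np.sin(0.37 * t) + 0.1 * rng.normal(size=2000)
        print(mdl_source_number(x))   # each real sinusoid occupies two dimensions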

  18. Principles of crystallization, and methods of single crystal growth

    International Nuclear Information System (INIS)

    Chacra, T.

    2010-01-01

    Most single crystals (monocrystals) have distinguished optical, electrical, or magnetic properties, which make single crystals key elements in most modern technical devices: they may be used as lenses, prisms, or gratings in optical devices, as filters in X-ray and spectrographic devices, or as conductors and semiconductors in the electronics and computer industries. Furthermore, single crystals are used in transducer devices. Moreover, they are indispensable elements in laser and maser emission technology. Crystal growth technology (CGT) has started and developed in international universities and scientific institutions, aiming at single crystals which may have significant properties and industrial applications that can attract the attention of international crystal growth centers to adopt the industrial production and marketing of such crystals. Unfortunately, Arab universities generally, and Syrian universities specifically, do not give even minimal attention to this field of science. The purpose of this work is to attract the attention of crystallographers, physicists, and chemists in Arab universities and research centers to the importance of crystal growth and, as a first stage, to establish simple, uncomplicated laboratories for the growth of single crystals. Such laboratories can be supplied with equipment which is partly available or can be manufactured in the local market. Many references (articles, papers, diagrams, etc.) have been studied to extract the most important theoretical principles of phase transitions, especially crystallization. The conclusions of this study are summarized in three principles: thermodynamic, morphologic, and kinetic. The study is completed by a brief description of the main single crystal growth methods, with sketches of the equipment used in each method, which can be considered as preliminary designs for the equipment of a new crystal growth laboratory. (author)

  19. Oil Reservoir Production Optimization using Single Shooting and ESDIRK Methods

    DEFF Research Database (Denmark)

    Capolei, Andrea; Völcker, Carsten; Frydendall, Jan

    2012-01-01

    Conventional recovery techniques enable recovery of 10-50% of the oil in an oil field. Advances in smart well technology and enhanced oil recovery techniques enable significantly larger recovery. To realize this potential, feedback model-based optimal control technologies are needed to manipulate the injections and oil production such that flow is uniform in a given geological structure. Even in the case of conventional water flooding, feedback based optimal control technologies may enable higher oil recovery than with conventional operational strategies. The optimal control problems that must be solved are large-scale problems and require specialized numerical algorithms. In this paper, we combine a single shooting optimization algorithm based on sequential quadratic programming (SQP) with explicit singly diagonally implicit Runge-Kutta (ESDIRK) integration methods and a continuous adjoint method...

  20. Defects detecting method of lamp cap of single soldering lug

    Science.gov (United States)

    Cai, Jihe; Lv, Jidong

    2017-07-01

    In order to resolve the problems of low efficiency and large inconsistency in the fault detection of lamp holders with a single soldering lug, an image-based defect detection method is presented in this paper. The selected image is first preprocessed: because the smooth metal surface of the lamp holder and the black insulation glass may reflect light, the possible area of the soldering lug is cropped during preprocessing to narrow the scope for the subsequent partition. Then, the soldering lug is extracted by a series of processing steps including clustering partition. Based on this, the defects are detected by regional marking, area comparison, circularity, and coordinate deviation. The experimental results show that the designed method is simple and practical and detects the main quality defects of lamp holders with a single soldering lug correctly and efficiently.

  1. Methods of forming single source precursors, methods of forming polymeric single source precursors, and single source precursors formed by such methods

    Science.gov (United States)

    Fox, Robert V.; Rodriguez, Rene G.; Pak, Joshua J.; Sun, Chivin; Margulieux, Kelsey R.; Holland, Andrew W.

    2014-09-09

    Methods of forming single source precursors (SSPs) include forming intermediate products having the empirical formula 1/2{L₂N(μ-X)₂M'X₂}₂, and reacting MER with the intermediate products to form SSPs of the formula L₂N(μ-ER)₂M'(ER)₂, wherein L is a Lewis base, M is a Group IA atom, N is a Group IB atom, M' is a Group IIIB atom, each E is a Group VIB atom, each X is a Group VIIA atom or a nitrate group, and each R group is an alkyl, aryl, vinyl, (per)fluoro alkyl, (per)fluoro aryl, silane, or carbamato group. Methods of forming polymeric or copolymeric SSPs include reacting at least one of HE¹R¹E¹H and MER with one or more substances having the empirical formula L₂N(μ-ER)₂M'(ER)₂ or L₂N(μ-X)₂M'(X)₂ to form a polymeric or copolymeric SSP. New SSPs and intermediate products are formed by such methods.

  2. Methods of forming single source precursors, methods of forming polymeric single source precursors, and single source precursors and intermediate products formed by such methods

    Science.gov (United States)

    Fox, Robert V.; Rodriguez, Rene G.; Pak, Joshua J.; Sun, Chivin; Margulieux, Kelsey R.; Holland, Andrew W.

    2012-12-04

    Methods of forming single source precursors (SSPs) include forming intermediate products having the empirical formula 1/2{L₂N(μ-X)₂M'X₂}₂, and reacting MER with the intermediate products to form SSPs of the formula L₂N(μ-ER)₂M'(ER)₂, wherein L is a Lewis base, M is a Group IA atom, N is a Group IB atom, M' is a Group IIIB atom, each E is a Group VIB atom, each X is a Group VIIA atom or a nitrate group, and each R group is an alkyl, aryl, vinyl, (per)fluoro alkyl, (per)fluoro aryl, silane, or carbamato group. Methods of forming polymeric or copolymeric SSPs include reacting at least one of HE¹R¹E¹H and MER with one or more substances having the empirical formula L₂N(μ-ER)₂M'(ER)₂ or L₂N(μ-X)₂M'(X)₂ to form a polymeric or copolymeric SSP. New SSPs and intermediate products are formed by such methods.

  3. TRIP: An interactive retrieving-inferring data imputation approach

    KAUST Repository

    Li, Zhixu

    2016-06-25

    Data imputation aims at filling in missing attribute values in databases. Existing imputation approaches to nonquantitative string data can be roughly put into two categories: (1) inferring-based approaches [2], and (2) retrieving-based approaches [1]. Specifically, the inferring-based approaches find substitutes or estimations for the missing values from the complete part of the data set. However, they typically fall short in filling in unique missing attribute values which do not exist in the complete part of the data set [1]. The retrieving-based approaches resort to external resources for help, formulating proper web search queries to retrieve web pages containing the missing values from the Web and then extracting the missing values from the retrieved web pages [1]. This web-based retrieving approach reaches a high imputation precision and recall but, on the other hand, issues a large number of web search queries, which brings a large overhead [1]. © 2016 IEEE.

  4. Addressing missing data mechanism uncertainty using multiple-model multiple imputation: Application to a longitudinal clinical trial

    OpenAIRE

    Siddique, Juned; Harel, Ofer; Crespi, Catherine M.

    2013-01-01

    We present a framework for generating multiple imputations for continuous data when the missing data mechanism is unknown. Imputations are generated from more than one imputation model in order to incorporate uncertainty regarding the missing data mechanism. Parameter estimates based on the different imputation models are combined using rules for nested multiple imputation. Through the use of simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation infere...

  5. Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass.

    Science.gov (United States)

    Ramstein, Guillaume P; Lipka, Alexander E; Lu, Fei; Costich, Denise E; Cherney, Jerome H; Buckler, Edward S; Casler, Michael D

    2015-03-12

    Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in the presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study involved two linkage populations and an association panel of 590 reed canarygrass genotypes. Plants were assayed for up to 5228 single nucleotide polymorphism markers and 35 traits. The genotypic markers were derived from low-depth sequencing with 78% missing data on average. To soundly infer marker-trait associations, multiple imputation (MI) was used: several imputes of the marker data were generated to reflect imputation uncertainty, and association tests were performed on marker effects across imputes. A total of nine significant markers were identified, three of which showed significant homology with the Brachypodium distachyon genome. Because no physical map of the reed canarygrass genome was available, imputation was conducted using classification trees. In general, MI showed good consistency with the complete-case analysis and adequate control over imputation uncertainty. A gain in significance of marker effects was achieved through MI, but only in rare cases where missing data were <45%. In addition to providing insight into the genetic basis of important traits in reed canarygrass, this study presents one of the first applications of MI to genome-wide analyses and provides useful guidelines for conducting GWAS based on genotyping-by-sequencing data. Copyright © 2015 Ramstein et al.

  6. A suggested approach for imputation of missing dietary data for young children in daycare

    OpenAIRE

    Stevens, June; Ou, Fang-Shu; Truesdale, Kimberly P.; Zeng, Donglin; Vaughn, Amber E.; Pratt, Charlotte; Ward, Dianne S.

    2015-01-01

    Background: Parent-reported 24-h diet recalls are an accepted method of estimating intake in young children. However, many children eat while at childcare, making accurate proxy reports by parents difficult. Objective: The goal of this study was to demonstrate a method to impute missing weekday lunch and daytime snack nutrient data for daycare children and to explore the concurrent predictive and criterion validity of the method. Design: Data were from children aged 2-5 years in the My Parenting...

  7. 5 CFR 919.630 - May the OPM impute conduct of one person to another?

    Science.gov (United States)

    2010-01-01

    ...) General Principles Relating to Suspension and Debarment Actions § 919.630 May the OPM impute conduct of one person to another? For purposes of actions taken under this rule, we may impute conduct as follows...

  8. Multiple imputation of missing passenger boarding data in the national census of ferry operators

    Science.gov (United States)

    2008-08-01

    This report presents findings from the 2006 National Census of Ferry Operators (NCFO) augmented with imputed values for passengers and passenger miles. Due to the imputation procedures used to calculate missing data, totals in Table 1 may not corresp...

  9. An alternative method for restoring single-tooth implants.

    Science.gov (United States)

    McArdle, B F; Clarizio, L F

    2001-09-01

    Having laboratory technicians prepare soft-tissue casts and implant abutments with or without concomitant removable temporary prostheses during the restorative phase of single-tooth replacement is an accepted practice. It can, however, result in functional and esthetic intraoral discrepancies. Single-tooth implants can be restored with crowns (like those for natural teeth) fabricated at a dental laboratory on casts obtained from final impressions of prepared implant abutments. In the case reported, the restorative dentist restored the patient's single-tooth implant after taking a transfer impression. He constructed a cast simulating the peri-implant soft tissue with final impression material and prepared the abutment on this model. His dental assistant then fabricated a fixed provisional restoration on the prepared abutment. At the patient's next visit, the dentist torqued the prepared abutment onto the implant, took a final impression and inserted the provisional restoration. A crown was made conventionally at the dental laboratory and cemented in place at the following visit. This alternative method for restoring single-tooth implants enhances esthetics by more accurately simulating marginal gingival architecture. It also improves function by preloading the implant through fixed temporization after the dentist, rather than the laboratory technician, prepares the abutment to the dentist's preferred contours.

  10. Assessment of Consequences of Replacement of System of the Uniform Tax on Imputed Income Patent System of the Taxation

    Directory of Open Access Journals (Sweden)

    Galina A. Manokhina

    2012-11-01

    Full Text Available The article highlights the main questions concerning the possible consequences of replacing the currently operating taxation system in the form of a single tax on imputed income with the patent system of taxation. The main advantages and drawbacks of the new taxation system are shown, including the opinion that the more effective measure is not the replacement of one special taxation regime with another, but the introduction of the patent taxation system as an auxiliary system.

  11. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Directory of Open Access Journals (Sweden)

    Hardt Jochen

    2012-12-01

    Full Text Available Background: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods: A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100, and 200 using complete cases or multiple imputation with 0, 10, 20, 40, and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r = .10) vs. moderate (r = .50) correlations with the X's and Y. Results: The inclusion of auxiliary variables can improve a multiple imputation model, but inclusion of too many variables leads to downward bias of regression coefficients and decreased precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion: More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1:3.

  12. Age at menopause: imputing age at menopause for women with a hysterectomy with application to risk of postmenopausal breast cancer

    Science.gov (United States)

    Rosner, Bernard; Colditz, Graham A.

    2011-01-01

    Purpose: Age at menopause, a major marker in the reproductive life, may bias results for evaluation of breast cancer risk after menopause. Methods: We followed 38,948 premenopausal women in 1980 and identified 2,586 who reported hysterectomy without bilateral oophorectomy, and 31,626 who reported natural menopause during 22 years of follow-up. We evaluated risk factors for natural menopause, imputed age at natural menopause for women reporting hysterectomy without bilateral oophorectomy, and estimated the hazard of reaching natural menopause in the next 2 years. We applied this imputed age at menopause both to increase sample size and to evaluate the relation between postmenopausal exposures and risk of breast cancer. Results: Age, cigarette smoking, age at menarche, pregnancy history, body mass index, history of benign breast disease, and history of breast cancer were each significantly related to age at natural menopause; duration of oral contraceptive use and family history of breast cancer were not. The imputation increased the sample size substantially, and although some risk factors after menopause were weaker in the expanded model (height and alcohol use), use of hormone therapy was less biased. Conclusions: Imputing age at menopause increases sample size, broadens generalizability by making results applicable to women with hysterectomy, and reduces bias. PMID:21441037

  13. On the multiple imputation variance estimator for control-based and delta-adjusted pattern mixture models.

    Science.gov (United States)

    Tang, Yongqiang

    2017-12-01

    Control-based pattern mixture models (PMM) and delta-adjusted PMMs are commonly used as sensitivity analyses in clinical trials with non-ignorable dropout. These PMMs assume that the statistical behavior of outcomes varies by pattern in the experimental arm in the imputation procedure, but the imputed data are typically analyzed by a standard method such as the primary analysis model. In the multiple imputation (MI) inference, Rubin's variance estimator is generally biased when the imputation and analysis models are uncongenial. One objective of the article is to quantify the bias of Rubin's variance estimator in the control-based and delta-adjusted PMMs for longitudinal continuous outcomes. These PMMs assume the same observed data distribution as the mixed effects model for repeated measures (MMRM). We derive analytic expressions for the MI treatment effect estimator and the associated Rubin's variance in these PMMs and MMRM as functions of the maximum likelihood estimator from the MMRM analysis and the observed proportion of subjects in each dropout pattern when the number of imputations is infinite. The asymptotic bias is generally small or negligible in the delta-adjusted PMM, but can be sizable in the control-based PMM. This indicates that the inference based on Rubin's rule is approximately valid in the delta-adjusted PMM. A simple variance estimator is proposed to ensure asymptotically valid MI inferences in these PMMs, and compared with the bootstrap variance. The proposed method is illustrated by the analysis of an antidepressant trial, and its performance is further evaluated via a simulation study. © 2017, The International Biometric Society.
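
    For orientation, Rubin's combining rules in standard notation, with \hat{\theta}_m the estimate and \hat{U}_m its estimated variance from the m-th of M imputed data sets:

        \bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m, \qquad
        \bar{U} = \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, \qquad
        B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{\theta}_m-\bar{\theta}\bigr)^2, \qquad
        T = \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B.

    The article's concern is that T can misestimate the true repeated-sampling variance when the imputation and analysis models are uncongenial, which is why an alternative variance estimator is proposed for the control-based PMM.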

  14. GNSS Single Frequency, Single Epoch Reliable Attitude Determination Method with Baseline Vector Constraint

    Directory of Open Access Journals (Sweden)

    Ang Gong

    2015-12-01

    Full Text Available For Global Navigation Satellite System (GNSS) single frequency, single epoch attitude determination, this paper proposes a new reliable method with a baseline vector constraint. First, prior knowledge of baseline length, heading, and pitch obtained from other navigation equipment or sensors is used to rigorously reconstruct the objective function. Then, the searching strategy is improved: a gradually enlarged ellipsoidal search space is substituted for the non-ellipsoidal search space to ensure that the correct ambiguity candidates are within it, and the searching process is carried out directly by the least squares ambiguity decorrelation algorithm (LAMBDA) method. Among all vector candidates, some are further eliminated by a derived approximate inequality, which accelerates the searching process. Experimental results show that, compared to the traditional method with only a baseline length constraint, the new method can utilize a priori three-dimensional baseline knowledge to fix ambiguities reliably and achieve a high success rate. Experimental tests also verify that it is not very sensitive to baseline vector error and can perform robustly when the angular error is not great.

  15. GNSS Single Frequency, Single Epoch Reliable Attitude Determination Method with Baseline Vector Constraint.

    Science.gov (United States)

    Gong, Ang; Zhao, Xiubin; Pang, Chunlei; Duan, Rong; Wang, Yong

    2015-12-02

    For Global Navigation Satellite System (GNSS) single frequency, single epoch attitude determination, this paper proposes a new reliable method with a baseline vector constraint. First, prior knowledge of baseline length, heading, and pitch obtained from other navigation equipment or sensors is used to rigorously reconstruct the objective function. Then, the searching strategy is improved: a gradually enlarged ellipsoidal search space is substituted for the non-ellipsoidal search space to ensure that the correct ambiguity candidates are within it, and the searching process is carried out directly by the least squares ambiguity decorrelation algorithm (LAMBDA) method. Among all vector candidates, some are further eliminated by a derived approximate inequality, which accelerates the searching process. Experimental results show that, compared to the traditional method with only a baseline length constraint, the new method can utilize a priori three-dimensional baseline knowledge to fix ambiguities reliably and achieve a high success rate. Experimental tests also verify that it is not very sensitive to baseline vector error and can perform robustly when the angular error is not great.

  16. A Comparison of Item-Level and Scale-Level Multiple Imputation for Questionnaire Batteries

    Science.gov (United States)

    Gottschall, Amanda C.; West, Stephen G.; Enders, Craig K.

    2012-01-01

    Behavioral science researchers routinely use scale scores that sum or average a set of questionnaire items to address their substantive questions. A researcher applying multiple imputation to incomplete questionnaire data can either impute the incomplete items prior to computing scale scores or impute the scale scores directly from other scale…

  17. Multiple imputation: an application to income nonresponse in the National Survey on Recreation and the Environment

    Science.gov (United States)

    Stanley J. Zarnoch; H. Ken Cordell; Carter J. Betz; John C. Bergstrom

    2010-01-01

    Multiple imputation is used to create values for missing family income data in the National Survey on Recreation and the Environment. We present an overview of the survey and a description of the missingness pattern for family income and other key variables. We create a logistic model for the multiple imputation process and impute data sets for family income. We...

  18. Sensitivity analysis in multiple imputation in effectiveness studies of psychotherapy.

    Science.gov (United States)

    Crameri, Aureliano; von Wyl, Agnes; Koemeda, Margit; Schulthess, Peter; Tschuschke, Volker

    2015-01-01

    The importance of preventing and treating incomplete data in effectiveness studies is nowadays emphasized. However, most of the publications focus on randomized clinical trials (RCT). One flexible technique for statistical inference with missing data is multiple imputation (MI). Since methods such as MI rely on the assumption of data being missing at random (MAR), a sensitivity analysis for testing the robustness against departures from this assumption is required. In this paper we present a sensitivity analysis technique based on posterior predictive checking, which takes into consideration the concept of clinical significance used in the evaluation of intra-individual changes. We demonstrate the possibilities this technique can offer with the example of irregular longitudinal data collected with the Outcome Questionnaire-45 (OQ-45) and the Helping Alliance Questionnaire (HAQ) in a sample of 260 outpatients. The sensitivity analysis can be used to (1) quantify the degree of bias introduced by missing not at random (MNAR) data in a worst reasonable case scenario, (2) compare the performance of different analysis methods for dealing with missing data, or (3) detect the influence of possible violations of the model assumptions (e.g., lack of normality). Moreover, our analysis showed that ratings from the patient's and therapist's versions of the HAQ could significantly improve the predictive value of routine outcome monitoring based on the OQ-45. Since analysis dropouts always occur, repeated measurements with the OQ-45 and the HAQ analyzed with MI are useful to improve the accuracy of outcome estimates in quality assurance assessments and non-randomized effectiveness studies in the field of outpatient psychotherapy.

  19. Multiple Imputation Strategies for Multiple Group Structural Equation Models

    Science.gov (United States)

    Enders, Craig K.; Gottschall, Amanda C.

    2011-01-01

    Although structural equation modeling software packages use maximum likelihood estimation by default, there are situations where one might prefer to use multiple imputation to handle missing data rather than maximum likelihood estimation (e.g., when incorporating auxiliary variables). The selection of variables is one of the nuances associated…

  20. Investigation of Multiple Imputation in Low-Quality Questionnaire Data

    Science.gov (United States)

    Van Ginkel, Joost R.

    2010-01-01

    The performance of multiple imputation in questionnaire data has been studied in various simulation studies. However, in practice, questionnaire data are usually more complex than simulated data. For example, items may be counterindicative or may have unacceptably low factor loadings on every subscale, or completely missing subscales may…

  1. A Multiple-Imputation "Forward Bridging" Approach to Address Changes in the Classification of Asian Race/Ethnicity on the US Death Certificate.

    Science.gov (United States)

    Thompson, Caroline A; Boothroyd, Derek B; Hastings, Katherine G; Cullen, Mark R; Palaniappan, Latha P; Rehkopf, David H

    2018-02-01

    The incomparability of old and new classification systems for describing the same data can be seen as a missing-data problem, and, under certain assumptions, multiple imputation may be used to "bridge" 2 classification systems. One example of such a change is the introduction of detailed Asian-American race/ethnicity classifications on the 2003 version of the US national death certificate, which was adopted for use by 38 states between 2003 and 2011. Using county- and decedent-level data from 3 different national sources for pre- and postadoption years, we fitted within-state multiple-imputation models to impute ethnicities for decedents classified as "other Asian" during preadoption years. We present mortality rates derived using 3 different methods of calculation: 1) including all states but ignoring the gradual adoption of the new death certificate over time, 2) including only the 7 states with complete reporting of all ethnicities, and 3) including all states and applying multiple imputation. Estimates from our imputation model were consistently in the middle of the other 2 estimates, and trend results demonstrated that the year-by-year estimates of the imputation model were more similar to those of the 7-state model. This work demonstrates how multiple imputation can provide a "forward bridging" approach to make more accurate estimates over time in newly categorized populations. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Underwater Environment SDAP Method Using Multi Single-Beam Sonars

    Directory of Open Access Journals (Sweden)

    Zheping Yan

    2013-01-01

    Full Text Available A new autopilot system for an unmanned underwater vehicle (UUV) using multiple single-beam sonars is proposed for environmental exploration. The proposed autopilot system is known as simultaneous detection and patrolling (SDAP), which addresses two fundamental challenges: autonomous guidance and control. Autonomous guidance, autonomous path planning, and target tracking are based on the desired reference path, which is reconstructed from the sonar data collected from the environmental contour with a predefined safety distance. The reference path is first estimated using a support vector clustering inertia method and then refined by Bézier curves in order to satisfy the inertia property of the UUV. A differential geometry feedback linearization method is used to guide the vehicle onto the predefined path, while a finite predictive stable inversion control algorithm is employed for autonomous target approaching. The experimental results from sea trials have demonstrated that the proposed system can provide satisfactory performance, implying its great potential for future underwater exploration tasks.

  3. Single well tracer method to evaluate enhanced recovery

    Science.gov (United States)

    Sheely, Jr., Clyde Q.; Baldwin, Jr., David E.

    1978-01-01

    Data useful to evaluate the effectiveness of, or to design, an enhanced recovery process (a recovery process involving mobilizing and moving hydrocarbons through a hydrocarbon-bearing subterranean formation from an injection well to a production well by injecting a mobilizing fluid into the injection well) are obtained by a process which comprises, sequentially: determining hydrocarbon saturation in the formation in a volume near a well bore penetrating the formation, injecting sufficient mobilizing fluid to mobilize and move hydrocarbons from a volume in the formation near the well bore, and determining by the single well tracer method a hydrocarbon saturation profile in the volume from which hydrocarbons are moved. The single well tracer method employed is disclosed by U.S. Pat. No. 3,623,842. The process is useful to evaluate surfactant floods, water floods, polymer floods, CO₂ floods, caustic floods, micellar floods, and the like in the reservoir in much less time and at greatly reduced cost compared to conventional multi-well pilot tests.

  4. Imputing amino acid polymorphisms in human leukocyte antigens.

    Directory of Open Access Journals (Sweden)

    Xiaoming Jia

    Full Text Available DNA sequence variation within human leukocyte antigen (HLA) genes mediates susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize the performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N = 918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than the SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.

  5. Correcting hazard ratio estimates for outcome misclassification using multiple imputation with internal validation data.

    Science.gov (United States)

    Ni, Jiayi; Leong, Aaron; Dasgupta, Kaberi; Rahme, Elham

    2017-08-01

    Outcome misclassification may occur in observational studies using administrative databases. We evaluated a two-step multiple imputation approach based on complementary internal validation data obtained from two subsamples of study participants to reduce bias in hazard ratio (HR) estimates in Cox regressions. We illustrated this approach using data from a surveyed sample of 6247 individuals in a study of statin-diabetes association in Quebec. We corrected diabetes status and onset assessed from health administrative data against self-reported diabetes and/or elevated fasting blood glucose (FBG) assessed in subsamples. The association between statin use and new onset diabetes was evaluated using administrative data and the corrected data. By simulation, we assessed the performance of this method varying the true HR, sensitivity, specificity, and the size of validation subsamples. The adjusted HR of new onset diabetes among statin users versus non-users was 1.61 (95% confidence interval: 1.09-2.38) using administrative data only, 1.49 (0.95-2.34) when diabetes status and onset were corrected based on self-report and undiagnosed diabetes (FBG ≥ 7 mmol/L), and 1.36 (0.92-2.01) when corrected for self-report and undiagnosed diabetes/impaired FBG (≥ 6 mmol/L). In simulations, the multiple imputation approach yielded less biased HR estimates and appropriate coverage for both non-differential and differential misclassification. Large variations in the corrected HR estimates were observed using validation subsamples with low participation proportion. The bias correction was sometimes outweighed by the uncertainty introduced by the unknown time of event occurrence. Multiple imputation is useful to correct for outcome misclassification in time-to-event analyses if complementary validation data are available from subsamples. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Multiple Imputation to Account for Measurement Error in Marginal Structural Models.

    Science.gov (United States)

    Edwards, Jessie K; Cole, Stephen R; Westreich, Daniel; Crane, Heidi; Eron, Joseph J; Mathews, W Christopher; Moore, Richard; Boswell, Stephen L; Lesko, Catherine R; Mugavero, Michael J

    2015-09-01

    Marginal structural models are an important tool for observational studies. These models typically assume that variables are measured without error. We describe a method to account for differential and nondifferential measurement error in a marginal structural model. We illustrate the method estimating the joint effects of antiretroviral therapy initiation and current smoking on all-cause mortality in a United States cohort of 12,290 patients with HIV followed for up to 5 years between 1998 and 2011. Smoking status was likely measured with error, but a subset of 3,686 patients who reported smoking status on separate questionnaires composed an internal validation subgroup. We compared a standard joint marginal structural model fit using inverse probability weights to a model that also accounted for misclassification of smoking status using multiple imputation. In the standard analysis, current smoking was not associated with increased risk of mortality. After accounting for misclassification, current smoking without therapy was associated with increased mortality (hazard ratio [HR]: 1.2 [95% confidence interval [CI] = 0.6, 2.3]). The HR for current smoking and therapy [0.4 (95% CI = 0.2, 0.7)] was similar to the HR for no smoking and therapy (0.4; 95% CI = 0.2, 0.6). Multiple imputation can be used to account for measurement error in concert with methods for causal inference to strengthen results from observational studies.
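
    A loose sketch of the validation-subsample idea: model the true status given the error-prone report where both are observed, then multiply impute the true status for everyone else (data simulated, names illustrative; the study itself embedded this in weighted marginal structural Cox models):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        n = 2000
        true = rng.integers(0, 2, n)                              # true smoking status
        noisy = np.where(rng.random(n) < 0.85, true, 1 - true)    # error-prone report
        in_valid = rng.random(n) < 0.3                            # validation subgroup

        # P(true | noisy), fitted only where the reference measure exists
        fit = sm.GLM(true[in_valid],
                     sm.add_constant(noisy[in_valid].astype(float)),
                     family=sm.families.Binomial()).fit()
        p_true = np.asarray(fit.predict(sm.add_constant(noisy.astype(float))))

        imputes = [np.where(in_valid, true, rng.random(n) < p_true).astype(int)
                   for _ in range(10)]                            # multiple imputations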

  7. An Imputation Model for Dropouts in Unemployment Data

    Directory of Open Access Journals (Sweden)

    Nilsson Petra

    2016-09-01

    Full Text Available Incomplete unemployment data is a fundamental problem when evaluating labour market policies in several countries. Many unemployment spells end for unknown reasons; in the Swedish Public Employment Service's register, as many as 20 percent. This leads to ambiguity regarding destination states (employment, unemployment, retired, etc.). According to complete combined administrative data, the employment rate among dropouts was close to 50 percent for the years 1992 to 2006, but from 2007 the employment rate dropped to 40 percent or less. This article explores an imputation approach. We investigate imputation models estimated both on survey data from 2005/2006 and on complete combined administrative data from 2005/2006 and 2011/2012. The models are evaluated in terms of their ability to make correct predictions. The models have relatively high predictive power.

  8. Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans

    Directory of Open Access Journals (Sweden)

    Assaf Gottlieb

    2017-11-01

    Full Text Available Background: Genome-wide association studies are useful for discovering genotype–phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Methods: Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression—on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. Results: We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Conclusions: Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort

  9. Evaluation of Multiple Imputation in Missing Data Analysis: An Application on Repeated Measurement Data in Animal Science

    Directory of Open Access Journals (Sweden)

    Gazel Ser

    2015-12-01

    Full Text Available The purpose of this study was to evaluate the performance of the multiple imputation method, from the perspective of the general linear mixed model, when the missing observation structure is missing at random or missing completely at random. The application data of the study consisted of a total of 77 Norduz ram lambs at 7 months of age. After slaughtering, pH values measured at five different time points were taken as the dependent variable. In addition, hot carcass weight, muscle glycogen level, and fasting duration were included as independent variables in the model. Starting from the dependent variable without missing observations, two missing observation structures, Missing Completely at Random (MCAR) and Missing at Random (MAR), were created by deleting observations at certain rates (10% and 25%). After that, complete data sets were obtained from the data sets with missing observations using MI (multiple imputation). The results obtained by applying the general linear mixed model to the data sets completed using MI were compared to the results for the complete data. In the mixed model applied to the complete data and the MI data sets, the same covariance structures were selected, and parameter estimates and their standard errors were rather close to those of the complete data. As a result, this study showed that reliable information can be obtained from the mixed model when MI is chosen as the imputation method for both missing observation structures and rates.

  10. Missing data in a multi-item instrument were best handled by multiple imputation at the item score level.

    Science.gov (United States)

    Eekhout, Iris; de Vet, Henrica C W; Twisk, Jos W R; Brand, Jaap P L; de Boer, Michiel R; Heymans, Martijn W

    2014-03-01

    Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument. Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the most optimal technique for each scenario. Fitted regression coefficients were compared using the bias and coverage as performance parameters. Mean imputation caused biased estimates in every missing data scenario when data are missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score. We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise not to use any form of mean imputation to handle missing data. Copyright © 2014 Elsevier Inc. All rights reserved.
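
    A minimal sketch of the recommended item-level strategy on simulated data (scikit-learn's IterativeImputer stands in for a full MI routine; item counts and missingness rates are illustrative): impute the item scores first, then compute the scale score from the completed items.

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(4)
        n, k = 300, 5
        latent = rng.normal(size=(n, 1))
        items = latent + 0.5 * rng.normal(size=(n, k))    # correlated item scores
        items[rng.random((n, k)) < 0.15] = np.nan         # ~15% item missingness

        imp = IterativeImputer(sample_posterior=True, random_state=0)
        items_completed = imp.fit_transform(items)
        scale_score = items_completed.sum(axis=1)         # scale = item total
        # repeat with several random_state values to obtain multiple imputes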

  11. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort.

    Directory of Open Access Journals (Sweden)

    Thomas J Hoffmann

    2015-01-01

    Full Text Available An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10⁻⁴) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting

  12. Bootstrap imputation minimized misclassification bias when measuring Colles' fracture prevalence and its associations using health administrative data.

    Science.gov (United States)

    van Walraven, Carl

    2018-04-01

    Misclassification bias can result from the incorrect assignment of disease status using inaccurate diagnostic codes in health administrative data. This study quantified misclassification bias in the study of Colles' fracture. Colles' fracture status was determined in all patients >50 years old seen in the emergency room at a single teaching hospital between 2006 and 2014 by manually reviewing all forearm radiographs. This data set was linked to population-based data capturing all emergency room visits. Reference disease prevalence and its association with covariates were measured. A multivariate model using covariates derived from administrative data was used to impute Colles' fracture status and measure its prevalence and associations using bootstrapping methods. These values were compared with reference values to measure misclassification bias. This was repeated using diagnostic codes to determine Colles' fracture status. Five hundred eighteen thousand, seven hundred forty-four emergency visits were included, with 3,538 (0.7%) having a Colles' fracture. Determining disease status using the diagnostic code (sensitivity 69.4%, positive predictive value 79.9%) resulted in a significant underestimate of Colles' fracture prevalence (relative difference -13.3%) and biased associations with covariates. The Colles' fracture model accurately determined disease probability (c-statistic 98.9 [95% confidence interval (CI) 98.7-99.1], calibration slope 1.009 [95% CI 1.004-1.013], Nagelkerke's R² 0.71 [95% CI 0.70-0.72]). Using disease probability estimates from this model, bootstrap imputation (BI) resulted in minimal misclassification bias (relative difference in disease prevalence -0.01%). The statistical significance of the association between Colles' fracture and age was accurate in 32.4% and 70.4% of samples when using the code or BI, respectively. Misclassification bias in estimating disease prevalence and its associations can be minimized with BI using accurate disease
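
    The bootstrap imputation step can be sketched as follows, assuming the disease-probability model has already been fitted and validated; here the predicted probabilities are simulated and only prevalence is estimated:

        import numpy as np

        rng = np.random.default_rng(5)
        n, B = 10000, 500
        p_hat = rng.beta(0.5, 70, size=n)          # toy predicted disease probabilities

        prevalences = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, n)            # bootstrap resample of visits
            status = rng.random(n) < p_hat[idx]    # impute disease status by Bernoulli draw
            prevalences[b] = status.mean()
        print(prevalences.mean(), np.percentile(prevalences, [2.5, 97.5]))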

  13. Defining, evaluating, and removing bias induced by linear imputation in longitudinal clinical trials with MNAR missing data.

    Science.gov (United States)

    Helms, Ronald W; Reece, Laura Helms; Helms, Russell W; Helms, Mary W

    2011-03-01

    Missing not at random (MNAR) post-dropout missing data from a longitudinal clinical trial result in the collection of "biased data," which leads to biased estimators and tests of corrupted hypotheses. In a full rank linear model analysis the model equation, E[Y] = Xβ, leads to the definition of the primary parameter β = (X'X)⁻¹X'E[Y], and the definition of linear secondary parameters of the form θ = Lβ = L(X'X)⁻¹X'E[Y], including, for example, a parameter representing a "treatment effect." These parameters depend explicitly on E[Y], which raises the questions: What is E[Y] when some elements of the incomplete random vector Y are not observed and MNAR, or when such a Y is "completed" via imputation? We develop a rigorous, readily interpretable definition of E[Y] in this context that leads directly to definitions of β̂, Bias(β̂) = E[β̂] − β, Bias(θ̂) = E[θ̂] − Lβ, and the extent of hypothesis corruption. These definitions provide a basis for evaluating, comparing, and removing biases induced by various linear imputation methods for MNAR incomplete data from longitudinal clinical trials. Linear imputation methods use earlier data from a subject to impute values for post-dropout missing values and include "Last Observation Carried Forward" (LOCF) and "Baseline Observation Carried Forward" (BOCF), among others. We illustrate the methods of evaluating, comparing, and removing biases and the effects of testing corresponding corrupted hypotheses via a hypothetical but very realistic longitudinal analgesic clinical trial.
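
    For concreteness, the two carried-forward schemes named above on a toy long-format trial data set (column names illustrative), sketched with pandas:

        import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            "subject": [1, 1, 1, 2, 2, 2],
            "visit":   [0, 1, 2, 0, 1, 2],
            "y":       [5.0, 4.0, np.nan, 6.0, np.nan, np.nan],
        })

        # LOCF: carry each subject's last observed value forward
        df["y_locf"] = df.groupby("subject")["y"].ffill()

        # BOCF: replace post-dropout missing values with the visit-0 baseline
        baseline = df[df["visit"] == 0].set_index("subject")["y"]
        df["y_bocf"] = df["y"].fillna(df["subject"].map(baseline))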

  14. Determination of heterogeneous medium parameters by single fuel element method

    International Nuclear Information System (INIS)

    Veloso, M.A.F.

    1985-01-01

    The neutron pulse propagation technique was employed to study a heterogeneous system consisting of a single fuel element placed at the symmetry axis of a large cylindrical D₂O tank. The response of the system to the pulse propagation technique is related to the inverse complex relaxation length of the neutron waves, also known as the system dispersion law ρ(ω). Experimental values of ρ(ω) were compared with the ones derived from Fermi age-diffusion theory. The main purpose of the experiment was to obtain the Feinberg-Galanin thermal constant (γ), which is the logarithmic derivative of the neutron flux at the fuel-moderator interface and as such a main input for heterogeneous reactor theory calculations. The γ thermal constant was determined as the number giving the best agreement between the theoretical and experimental values of ρ(ω). The simultaneous determination of two among the four parameters η, ρ, τ, and Lₛ is possible through the intersection of the dispersion laws of the pure-moderator system and the fuel-moderator system. The parameters τ and η were determined by this method. It was shown that the thermal constant γ and the product ηρ can be computed from the real and imaginary parts of the fuel-moderator dispersion law. The results of this evaluation scheme show an unstable behavior of γ as a function of frequency, a result not foreseen by the theoretical model. (author)

  15. Application of Multiple Imputation Method for Missing Data Estimation

    OpenAIRE

    SER, Gazel

    2011-01-01

    The existence of missing observations in data collected in different fields of study causes researchers to make incorrect decisions at the analysis stage and in generalizations of the results. Problems and solutions that are likely to be encountered at the estimation stage of missing observations were emphasized in this study. In estimating the missing observations, missing observations were assumed to be missing at random, and the Markov Chain Monte Carlo technique and mul...

  16. Single-photon source engineering using a Modal Method

    DEFF Research Database (Denmark)

    Gregersen, Niels

    Solid-state sources of single indistinguishable photons are of great interest for quantum information applications. The semiconductor quantum dot embedded in a host material represents an attractive platform to realize such a single-photon source (SPS). A near-unity efficiency, defined as the num...... nanowire SPSs...

  17. Single molecule force spectroscopy: methods and applications in biology

    International Nuclear Information System (INIS)

    Shen Yi; Hu Jun

    2012-01-01

    Single molecule measurements have transformed our view of biomolecules. Owing to the ability to monitor the activity of individual molecules, we now see them as uniquely structured, fluctuating molecules that stochastically transition between what are frequently many substates, since no two molecules follow precisely the same trajectory. Indeed, it is this discovery of critical yet short-lived substates, often missed in ensemble measurements, that has perhaps contributed most to the better understanding of biomolecular functioning resulting from single molecule experiments. In this paper, we review the three major techniques of single molecule force spectroscopy and their applications, especially in biology. The single-molecule study of biotin-streptavidin interactions is introduced as a successful example. The problems and prospects of single molecule force spectroscopy are also discussed. (authors)

  18. Forecasting Forest Inventory Using Imputed Tree Lists for LiDAR Grid Cells and a Tree-List Growth Model

    Directory of Open Access Journals (Sweden)

    Sean M. Lamb

    2018-03-01

    Full Text Available A method to forecast forest inventory variables derived from light detection and ranging (LiDAR) would increase the usefulness of such data in future forest management. We evaluated the accuracy of forecasted inventory from imputed tree lists for LiDAR grid cells (20 × 20 m) in spruce (Picea sp.) plantations and tree growth predicted using a locally calibrated tree-list growth model. Tree lists were imputed by matching measurements from a library of sample plots with grid cells based on planted species and the smallest sum of squared differences between six inventory variables. Total and merchantable basal area, total and merchantable volume, Lorey's height, and quadratic mean diameter increments predicted using imputed tree lists were highly correlated (0.75–0.86) with those from measured tree lists in 98 validation plots. Percent root mean squared error ranged from 12.8–49.0% but was much lower (4.9–13.5%) for plots with ≤10% LiDAR-derived error for all plot-matched variables. When compared with volumes from 15 blocks harvested 3–5 years after LiDAR acquisition, average forecasted volume differed by only 1.5%. To demonstrate the novel application of this method for operational management decisions, annual commercial thinning was planned at grid-cell resolution from 2018–2020 using forecasted inventory variables and commercial thinning eligibility rules.
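
    The plot-matching step lends itself to a compact sketch. The following hypothetical Python fragment picks, for one grid cell, the library plot with the smallest sum of squared differences across standardized inventory variables; the numbers, and the choice to standardize, are illustrative assumptions rather than the paper's exact procedure (which also filters by planted species):

        import numpy as np

        def impute_plot_index(cell_vars, library_vars):
            # cell_vars: (n_vars,); library_vars: (n_plots, n_vars).
            mu = library_vars.mean(axis=0)
            sd = library_vars.std(axis=0)
            z_cell = (cell_vars - mu) / sd          # put variables on one scale
            z_lib = (library_vars - mu) / sd
            ssd = ((z_lib - z_cell) ** 2).sum(axis=1)
            return int(np.argmin(ssd))              # index of best-matching plot

        library = np.array([[25.0, 180.0, 14.2],
                            [31.0, 240.0, 16.8],
                            [18.0, 120.0, 11.5]])   # e.g. basal area, volume, height
        cell = np.array([30.0, 230.0, 16.0])
        print(impute_plot_index(cell, library))     # -> 1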

  19. Treatments of Missing Values in Large National Data Affect Conclusions: The Impact of Multiple Imputation on Arthroplasty Research.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Su, Edwin P; Grauer, Jonathan N

    2018-03-01

    Despite the advantages of large, national datasets, one continuing concern is missing data values. Complete case analysis, where only cases with complete data are analyzed, is commonly used rather than more statistically rigorous approaches such as multiple imputation. This study characterizes the potential selection bias introduced by complete case analysis and compares the results of common regressions using both techniques following unicompartmental knee arthroplasty. Patients undergoing unicompartmental knee arthroplasty were extracted from the 2005 to 2015 National Surgical Quality Improvement Program. As examples, the demographics of patients with and without missing preoperative albumin and hematocrit values were compared. Missing data were then treated with both complete case analysis and multiple imputation (an approach that reproduces the variation and associations that would have been present in a full dataset), and the conclusions of common regressions for adverse outcomes were compared. A total of 6117 patients were included, of which 56.7% were missing at least one value. Younger, female, and healthier patients were more likely to have missing preoperative albumin and hematocrit values. The use of complete case analysis removed 3467 patients from the study, in comparison with multiple imputation, which included all 6117 patients. The two methods of handling missing values led to differing associations of low preoperative laboratory values with commonly studied adverse outcomes. The use of complete case analysis can introduce selection bias and may lead to different conclusions in comparison with the statistically rigorous multiple imputation approach. Joint surgeons should consider the methods of handling missing values when interpreting arthroplasty research. Copyright © 2017 Elsevier Inc. All rights reserved.
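
    To make the contrast concrete, here is a small hypothetical Python sketch of the two strategies, listwise deletion versus stochastic iterative imputation with scikit-learn; the variables and missingness mechanism are invented, and this is not the NSQIP analysis:

        import numpy as np
        import pandas as pd
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(0)

        # Toy data: albumin is missing more often for rows with higher "health"
        # (missing at random), mimicking the selection problem described above.
        n = 2000
        health = rng.normal(0, 1, n)
        albumin = 4.0 + 0.3 * health + rng.normal(0, 0.2, n)
        df = pd.DataFrame({"health": health, "albumin": albumin})
        df.loc[rng.random(n) < 1 / (1 + np.exp(-health)), "albumin"] = np.nan

        complete_case = df.dropna()            # rows kept under listwise deletion
        print(f"complete cases: {len(complete_case)} of {n}")

        # One stochastic draw; repeating with different random_state values and
        # pooling the resulting analyses gives a basic multiple imputation.
        imp = IterativeImputer(sample_posterior=True, random_state=1)
        df_imp = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
        print(df_imp["albumin"].mean(), complete_case["albumin"].mean())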

  20. Using Beta Coefficients to Impute Missing Correlations in Meta-Analysis Research: Reasons for Caution.

    Science.gov (United States)

    Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip

    2018-01-25

    Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
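
    The core of the concern is visible in the algebra of a two-predictor regression: the standardized beta is not the zero-order correlation a meta-analysis needs. A tiny sketch with made-up correlation values:

        # Standardized beta for x1 in a two-predictor regression, from the usual
        # normal-equations formula; all correlation values here are hypothetical.
        r_12 = 0.5             # correlation between the two predictors
        r_y1, r_y2 = 0.4, 0.3  # zero-order correlations with the outcome
        beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
        print(r_y1, round(beta_1, 3))  # 0.4 vs 0.333: beta is not r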

  1. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    Science.gov (United States)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning to directing clinical research toward possible hidden knowledge is becoming greatly influential in medical areas. Heart disease is a killer disease around the world, and early prevention through efficient methods can help to reduce mortality. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Imprecise feature data, such as absent or missing values, can affect the quality of classification results; nevertheless, the remaining complete features are still capable of providing information. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in the preprocessing stage to help fill in the missing values. Then, the complete dataset is trained with a classification algorithm, Decision Tree. The experiment was conducted on the Heart Disease dataset and the performance was analysed using accuracy, precision, and ROC values. Results show that the performance of Decision Tree increased after the application of FCMPSO for imputation.
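
    For flavor, here is a much-simplified sketch of the fuzzy-c-means half of the idea: cluster on complete features, then fill a missing value with a membership-weighted combination of per-cluster values. The PSO step the paper uses to tune FCM is omitted, and every detail below is an illustrative assumption:

        import numpy as np

        rng = np.random.default_rng(2)

        def fcm(X, c=2, m=2.0, iters=100):
            # Minimal fuzzy c-means: returns cluster centers and memberships U.
            U = rng.dirichlet(np.ones(c), size=len(X))
            for _ in range(iters):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                U = 1.0 / d ** (2.0 / (m - 1.0))
                U /= U.sum(axis=1, keepdims=True)
            return centers, U

        # Two toy clusters in the complete features; one feature has gaps.
        X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
        target = X.sum(axis=1) + rng.normal(0, 0.3, 100)
        target[::10] = np.nan

        centers, U = fcm(X)
        hard = np.argmax(U, axis=1)
        cluster_means = np.array([np.nanmean(target[hard == k]) for k in range(2)])
        missing = np.isnan(target)
        target[missing] = U[missing] @ cluster_means  # membership-weighted fill-in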

  2. Imputation of genotypes in Danish two-way crossbred pigs using low density panels

    DEFF Research Database (Denmark)

    Xiang, Tao; Christensen, Ole Fredslund; Legarra, Andres

    Genotype imputation is commonly used as an initial step of genomic selection. Studies on humans, plants and ruminants suggested many factors would affect the performance of imputation. However, studies rarely investigated pigs, especially crossbred pigs. In this study, different scenarios of imputation from 5K SNPs to 7K SNPs on Danish Landrace, Yorkshire, and crossbred Landrace-Yorkshire were compared. In conclusion, genotype imputation on crossbreds performs equally well as in purebreds, when parental breeds are used as the reference panel. When the size of the reference is considerably large...... SNPs. This dataset will be analyzed for genomic selection in a future study......

  3. Multiple imputation of missing data in multilevel designs: A comparison of different strategies.

    Science.gov (United States)

    Lüdtke, Oliver; Robitzsch, Alexander; Grund, Simon

    2017-03-01

    Multiple imputation is a widely recommended means of addressing the problem of missing data in psychological research. An often-neglected requirement of this approach is that the imputation model used to generate the imputed values must be at least as general as the analysis model. For multilevel designs in which lower level units (e.g., students) are nested within higher level units (e.g., classrooms), this means that the multilevel structure must be taken into account in the imputation model. In the present article, we compare different strategies for multiply imputing incomplete multilevel data using mathematical derivations and computer simulations. We show that ignoring the multilevel structure in the imputation may lead to substantial negative bias in estimates of intraclass correlations as well as biased estimates of regression coefficients in multilevel models. We also demonstrate that an ad hoc strategy that includes dummy indicators in the imputation model to represent the multilevel structure may be problematic under certain conditions (e.g., small groups, low intraclass correlations). Imputation based on a multivariate linear mixed effects model was the only strategy to produce valid inferences under most of the conditions investigated in the simulation study. Data from an educational psychology research project are also used to illustrate the impact of the various multiple imputation strategies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
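
    A quick simulation makes the flavor of the problem visible: imputing a clustered outcome from a single-level (pooled) model dilutes the between-cluster signal and shrinks the intraclass correlation. This is an illustrative sketch, not the article's derivation:

        import numpy as np

        rng = np.random.default_rng(3)

        n_clusters, n_per = 50, 20
        cluster_effect = np.repeat(rng.normal(0, 1.0, n_clusters), n_per)
        y = cluster_effect + rng.normal(0, 1.0, n_clusters * n_per)  # true ICC 0.5

        def icc(values):
            groups = values.reshape(n_clusters, n_per)
            between = groups.mean(axis=1).var(ddof=1)
            within = groups.var(axis=1, ddof=1).mean()
            return between / (between + within)

        y_imp = y.copy()
        mask = rng.random(y.size) < 0.4
        # Single-level imputation: draw from the pooled mean and variance,
        # ignoring which cluster each missing unit belongs to.
        y_imp[mask] = rng.normal(y[~mask].mean(), y[~mask].std(), mask.sum())
        print(round(icc(y), 2), round(icc(y_imp), 2))  # ICC shrinks after imputation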

  4. Imputation and quality control steps for combining multiple genome-wide datasets.

    Science.gov (United States)

    Verma, Shefali S; de Andrade, Mariza; Tromp, Gerard; Kuivaniemi, Helena; Pugh, Elizabeth; Namjou-Khales, Bahram; Mukherjee, Shubhabrata; Jarvik, Gail P; Kottyan, Leah C; Burt, Amber; Bradford, Yuki; Armstrong, Gretta D; Derr, Kimberly; Crawford, Dana C; Haines, Jonathan L; Li, Rongling; Crosslin, David; Ritchie, Marylyn D

    2014-01-01

    The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
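
    For readers unfamiliar with the metric, allelic R² here is essentially the squared correlation between imputed dosages and the true allele counts at a variant whose genotypes are known (e.g., masked for validation). A minimal sketch with invented numbers:

        import numpy as np

        # Squared correlation between imputed dosages and true allele counts
        # at a validation variant; the values here are made up.
        true_genotypes = np.array([0, 1, 2, 1, 0, 2, 1, 0])
        imputed_dosage = np.array([0.1, 0.9, 1.8, 1.2, 0.2, 1.7, 0.8, 0.3])
        r = np.corrcoef(true_genotypes, imputed_dosage)[0, 1]
        print(round(r ** 2, 3))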

  5. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Science.gov (United States)

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  6. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    Full Text Available The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.

  7. Method for thermal processing alumina-enriched spinel single crystals

    Science.gov (United States)

    Jantzen, Carol M.

    1995-01-01

    A process for age-hardening alumina-rich magnesium aluminum spinel to obtain the desired combination of hardness, clarity, flexural strength and toughness comprises selection of a time-temperature pair for isothermal heating followed by quenching. The time-temperature pair is selected from the region wherein the precipitate groups have the characteristics sought. The single crystal spinel is isothermally heated and will, if heated long enough, pass from its single phase through two pre-precipitates and two metastable precipitates to a stable secondary-phase precipitate within the spinel matrix. Quenching is done slowly at first to avoid thermal shock, then rapidly.

  8. Method: a single nucleotide polymorphism genotyping method for Wheat streak mosaic virus

    Science.gov (United States)

    2012-01-01

    Background The September 11, 2001 attacks on the World Trade Center and the Pentagon increased the concern about the potential for terrorist attacks on many vulnerable sectors of the US, including agriculture. The concentrated nature of crops, easily obtainable biological agents, and highly detrimental impacts make agroterrorism a potential threat. Although procedures for an effective criminal investigation and attribution following such an attack are available, important enhancements are still needed, one of which is the capability for fine discrimination among pathogen strains. The purpose of this study was to develop a molecular typing assay for use in a forensic investigation, using Wheat streak mosaic virus (WSMV) as a model plant virus. Method This genotyping technique utilizes single base primer extension to generate a genetic fingerprint. Fifteen single nucleotide polymorphisms (SNPs) within the coat protein and helper component-protease genes were selected as the genetic markers for this assay. Assay optimization and sensitivity testing were conducted using synthetic targets. WSMV strains and field isolates were collected from regions around the world and used to evaluate the assay for discrimination. The assay specificity was tested against a panel of near-neighbors consisting of genetic and environmental near-neighbors. Result Each WSMV strain or field isolate tested produced a unique SNP fingerprint, with the exception of three isolates collected within the same geographic location that produced indistinguishable fingerprints. The results were consistent among replicates, demonstrating the reproducibility of the assay. No SNP fingerprints were generated from organisms included in the near-neighbor panel, suggesting the assay is specific for WSMV. Using synthetic targets, a complete profile could be generated from as low as 7.15 fmoles of cDNA. Conclusion The molecular typing method presented is one tool that could be incorporated into the forensic

  9. Single-sheet identification method of heavy charged particles using ...

    Indian Academy of Sciences (India)

    of the single-sheet particle identification technique in CR-39 and CN-85 polycarbonate by plotting track cone length ... in neutron dosimetry, gamma and cosmic rays detection, heavy ion and nuclear physics and corpuscular ...

  10. Effects of Single Film, Packaging Methods and Relative Humidity on ...

    African Journals Online (AJOL)

    This study was carried out to determine the effect of single film packaging and relative humidity (RH) on the moisture content and water activity of Kilishi during storage. Polypropylene (PP) and high-density polyethylene (HDPE) films were used for the packaging. Kilishi was prepared by trimming off blood vessels, fat and ...

  11. High sensitivity fluorescent single particle and single molecule detection apparatus and method

    Science.gov (United States)

    Mathies, Richard A.; Peck, Konan; Stryer, Lubert

    1990-01-01

    Apparatus is described for ultrasensitive detection of single fluorescent particles, down to the single fluorescent molecule limit, in a fluid or on a substrate, comprising means for illuminating a predetermined volume of the fluid or area of the substrate so as to emit light, including background light from the fluid and bursts of photons from particles residing in the area. The photon bursts are detected in real time to generate a representative output signal. The signal is received and the burst of energy from the fluorescent particles is distinguished from the background energy to provide an indication of the number, location or concentration of the particles or molecules.

  12. The Choice Method of Selected Material has Influence on the Single Evaporation Flash Method

    International Nuclear Information System (INIS)

    Sunaryo, Geni Rina; Sumijanto; Nurul L, Siti

    2000-01-01

    The final objective of this research is to design a mini-scale desalination installation. It was started in 1997/1998 and has been under way for three years. A study assessing various desalination systems was done in the first year and thermodynamics in the second year. In this third year, a literature study on material resistance to external pressure was done. The pressure for the single evaporator flashing method depends mainly on the temperature applied in the system. In this paper, the configuration stage and the method of selecting materials for the main evaporator vessel, tubes, tube plates, water boxes, pipework, and valves for multistage flash distillation are described. The selection of materials for MSF is based on economic considerations: cheap, highly resistant, and easy to maintain.

  13. An Asymmetrical Space Vector Method for Single Phase Induction Motor

    DEFF Research Database (Denmark)

    Cui, Yuanhai; Blaabjerg, Frede; Andersen, Gert Karmisholt

    2002-01-01

    the motor torque performance is not good enough. This paper addresses a new control method, an asymmetrical space vector method with PWM modulation, in which a three-phase inverter is used for the main winding and the auxiliary winding. This method with PWM modulation is implemented to control the motor speed...

  14. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    easily implemented and is suitable for large data sets, like those in data mining applications. Experimental results show that, with a small loss of quality, the proposed method can significantly reduce the time taken compared with the conventional kernel k-means clustering method. The proposed method is also compared with other ...

  15. New library construction method for single-cell genomes.

    Directory of Open Access Journals (Sweden)

    Larry Xi

    Full Text Available A central challenge in sequencing single-cell genomes is the accurate determination of point mutations, phasing of these mutations, and identifying copy number variations with few assumptions. Ideally, this is accomplished under as low sequencing coverage as possible. Here we report our attempt to meet these goals with a novel library construction and library amplification methodology. In our approach, single-cell genomic DNA is first fragmented with saturated transposition to make a primary library that uniformly covers the whole genome by short fragments. The library is then amplified by a carefully optimized PCR protocol in a uniform and synchronized fashion for next-generation sequencing. Each step of the protocol can be quantitatively characterized. Our shallow sequencing data show that the library is tightly distributed and is useful for the determination of copy number variations.

  16. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population

    Directory of Open Access Journals (Sweden)

    Gao Hongding

    2012-07-01

    Full Text Available Abstract Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16 traits in the Nordic Holstein population. Methods The data consisted of de-regressed proofs (DRP) for 5 214 genotyped and 9 374 non-genotyped bulls. The bulls were divided into a training and a validation population by birth date, October 1, 2001. Five approaches for genomic prediction were used: (1) a simple GBLUP method, (2) a GBLUP method with a polygenic effect, (3) an adjusted GBLUP method with a polygenic effect, (4) a single-step blending method, and (5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted for the difference of scale between the genomic and the pedigree relationship matrices. A set of weights on the pedigree relationship matrix (ranging from 0.05 to 0.40) was used to build the combined relationship matrix in the single-step blending method and the GBLUP method with a polygenic effect. Results Averaged over the 16 traits, reliabilities of genomic breeding values predicted using the GBLUP method with a polygenic effect (relative weight of 0.20) were 0.3% higher than reliabilities from the simple GBLUP method (without a polygenic effect). The adjusted single-step blending and original single-step blending methods (relative weight of 0.20) had average reliabilities that were 2.1% and 1.8% higher than the simple GBLUP method, respectively. In

  17. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti......

  18. Development of nondestructive screening methods for single kernel characterization of wheat

    DEFF Research Database (Denmark)

    Nielsen, J.P.; Pedersen, D.K.; Munck, L.

    2003-01-01

    The development of nondestructive screening methods for single seed protein, vitreousness, density, and hardness index has been studied for single kernels of European wheat. A single kernel procedure was applied involving image analysis, near-infrared transmittance (NIT) spectroscopy, laboratory...... predictability. However, by applying an averaging approach, in which single seed replicate measurements are mathematically simulated, a very good NIT prediction model was achieved. This suggests that the single seed NIT spectra contain hardness information, but that a single seed hardness method with higher...

  19. Evaluation of Multi-parameter Test Statistics for Multiple Imputation.

    Science.gov (United States)

    Liu, Yu; Enders, Craig K

    2017-01-01

    In Ordinary Least Squares regression, researchers often are interested in knowing whether a set of parameters is different from zero. With complete data, this could be achieved using the gain in prediction test, hierarchical multiple regression, or an omnibus F test. However, in substantive research scenarios, missing data often exist. In the context of multiple imputation, one of the current state-of-the-art missing data strategies, there are several different analogous multi-parameter tests of the joint significance of a set of parameters, and these multi-parameter test statistics can be referenced to various distributions to make statistical inferences. However, little is known about the performance of these tests, and virtually no research study has compared the Type 1 error rates and statistical power of these tests in scenarios that are typical of behavioral science data (e.g., small to moderate samples, etc.). This paper uses Monte Carlo simulation techniques to examine the performance of these multi-parameter test statistics for multiple imputation under a variety of realistic conditions. We provide a number of practical recommendations for substantive researchers based on the simulation results, and illustrate the calculation of these test statistics with an empirical example.
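
    One statistic in this family is the D1 pooled Wald test of Li, Raghunathan and Rubin (1991). The sketch below computes D1 from m sets of estimates and within-imputation covariance matrices; the inputs are invented, and the denominator degrees of freedom of the reference F distribution are omitted for brevity:

        import numpy as np

        def d1_statistic(Q, U):
            # Q: (m, k) parameter estimates from m imputations; U: (m, k, k).
            m, k = Q.shape
            Q_bar = Q.mean(axis=0)            # pooled point estimate
            U_bar = U.mean(axis=0)            # average within-imputation covariance
            B = np.cov(Q, rowvar=False)       # between-imputation covariance
            U_inv = np.linalg.inv(U_bar)
            r1 = (1 + 1 / m) * np.trace(B @ U_inv) / k  # relative variance increase
            return (Q_bar @ U_inv @ Q_bar) / (k * (1 + r1))

        rng = np.random.default_rng(4)
        Q = rng.normal([0.5, 0.3], 0.05, size=(5, 2))   # toy estimates, m=5, k=2
        U = np.broadcast_to(0.01 * np.eye(2), (5, 2, 2)).copy()
        print(round(d1_statistic(Q, U), 2))   # compare to an F(k, v) reference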

  20. 48 CFR 1830.7002-4 - Determining imputed cost of money.

    Science.gov (United States)

    2010-10-01

    ... money. 1830.7002-4 Section 1830.7002-4 Federal Acquisition Regulations System NATIONAL AERONAUTICS AND... Determining imputed cost of money. (a) Determine the imputed cost of money for an asset under construction, fabrication, or development by applying a cost of money rate (see 1830.7002-2) to the representative...

  1. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  2. Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates.

    Science.gov (United States)

    Quartagno, M; Carpenter, J R

    2016-07-30

    Recently, multiple imputation has been proposed as a tool for individual patient data meta-analysis with sporadically missing observations, and it has been suggested that within-study imputation is usually preferable. However, such within-study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta-analysis, with an across-study probability distribution for the study specific covariance matrices. This retains the flexibility to allow for between-study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within-study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non-negligible between-study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta-analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  3. A Simplified Framework for Using Multiple Imputation in Social Work Research

    Science.gov (United States)

    Rose, Roderick A.; Fraser, Mark W.

    2008-01-01

    Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from…

  4. Correcting for Selective Nonresponse in the National Longitudinal Survey of Youth Using Multiple Imputation.

    Science.gov (United States)

    Davey, Adam; Shanahan, Michael J.; Schafer, Joseph L.

    2001-01-01

    Principal components analysis revealed four patterns of nonresponse on children's psychosocial adjustment, lifetime poverty experiences, and family history. Results from examining latent growth curve models using listwise deletion and multiple imputation indicated that multiple imputation corrected for selective nonresponse, providing less-biased…

  5. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model.

    Science.gov (United States)

    Bartlett, Jonathan W; Seaman, Shaun R; White, Ian R; Carpenter, James R

    2015-08-01

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available. © The Author(s) 2014.

  6. System and method for single-phase, single-stage grid-interactive inverter

    Science.gov (United States)

    Liu, Liming; Li, Hui

    2015-09-01

    The present invention provides for the integration of distributed renewable energy sources/storages utilizing a cascaded DC-AC inverter, thereby eliminating the need for a DC-DC converter. The ability to segment the energy sources and energy storages improves the maintenance capability and system reliability of the distributed generation system, as well as achieving wide-range reactive power compensation. In the absence of a DC-DC converter, single-stage energy conversion can be achieved to enhance energy conversion efficiency.

  7. Imputing data that are missing at high rates using a boosting algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Cauthen, Katherine Regina [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lambert, Gregory [Apple Inc., Cupertino, CA (United States); Ray, Jaideep [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Lefantzi, Sophia [Sandia National Lab. (SNL-CA), Livermore, CA (United States)

    2016-09-01

    Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number of imputations, m, is used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
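
    As a loose illustration of the idea (not the Sandia pipeline: the paper boosts a Bayesian spatiotemporal CAR model, and the dataset details differ), one can fit a boosted learner on the observed rows and predict the entries missing at a high rate:

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(5)

        # Toy stand-in for the real problem: predict a variable that is 70%
        # missing from fully observed covariates using a boosted model.
        n = 1000
        X = rng.normal(size=(n, 3))                  # e.g. meteorological covariates
        y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, n)

        mask = rng.random(n) < 0.7                   # high missingness rate
        model = GradientBoostingRegressor().fit(X[~mask], y[~mask])
        y_imputed = y.copy()
        y_imputed[mask] = model.predict(X[mask])

        rmse = np.sqrt(np.mean((y_imputed[mask] - y[mask]) ** 2))
        print(round(rmse, 3))                        # accuracy against held-out truth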

  8. [Imputing missing data in public health: general concepts and application to dichotomous variables].

    Science.gov (United States)

    Hernández, Gilma; Moriña, David; Navarro, Albert

    The presence of missing data in collected variables is common in health surveys, but the subsequent imputation thereof at the time of analysis is not. Working with imputed data may have certain benefits regarding the precision of the estimators and the unbiased identification of associations between variables. The imputation process is probably still little understood by many non-statisticians, who view this process as highly complex and with an uncertain goal. To clarify these questions, this note aims to provide a straightforward, non-exhaustive overview of the imputation process to enable public health researchers to ascertain its strengths. All this is done in the context of dichotomous variables, which are commonplace in public health. To illustrate these concepts, an example in which missing data are handled by means of simple and multiple imputation is introduced. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.

  9. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in ...

  10. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    This approach has reduced both time complexity and memory requirements. However, the clustering result of this method can deviate considerably from that obtained using the conventional kernel k-means method. This is because pseudo cluster centers in the input space may not represent the exact cluster ...

  11. Anti-dynamic-crosstalk method for single photon LIDAR detection

    Science.gov (United States)

    Zhang, Fan; Liu, Qiang; Gong, Mali; Fu, Xing

    2017-11-01

    With an increasing number of vehicles equipped with light detection and ranging (LIDAR), crosstalk is identified as a critical and urgent issue in range detection for active collision avoidance. Chaotic pulse position modulation (CPPM) applied to the transmitted pulse train has been shown to prevent crosstalk as well as range ambiguity. However, a static and unified strategy for the discrimination threshold and the number of accumulated pulses is not valid against crosstalk with a varying number of sources and varying intensity of each source. This paper presents an adaptive algorithm to distinguish the target echo from crosstalk with a dynamic and unknown level of intensity in the context of intelligent vehicles. A new strategy is given based on receiver operating characteristic (ROC) curves that considers the detection requirements on the probability of detection and false alarm for the scenario with varying crosstalk. In the adaptive algorithm, the detected results are compared by the new strategy with both the number of accumulated pulses and the threshold being raised step by step, so that the target echo can be exactly identified from crosstalk with a dynamic and unknown level of intensity. The validity of the algorithm has been verified through experiments with a single photon detector and the time-correlated single photon counting (TCSPC) technique, demonstrating a marked drop in the number of shots required to identify the target compared with the static and unified strategy.

  12. Optimal sampling strategies to assess inulin clearance in children by the inulin single-injection method

    NARCIS (Netherlands)

    van Rossum, Lyonne K.; Mathot, Ron A. A.; Cransberg, Karlien; Vulto, Arnold G.

    2003-01-01

    Glomerular filtration rate in patients can be determined by estimating the plasma clearance of inulin with the single-injection method. In this method, a single bolus injection of inulin is administered and several blood samples are collected. For practical and convenient application of this method

  13. Double-bootstrap methods that use a single double-bootstrap simulation

    OpenAIRE

    Chang, Jinyuan; Hall, Peter

    2014-01-01

    We show that, when the double bootstrap is used to improve performance of bootstrap methods for bias correction, techniques based on using a single double-bootstrap sample for each single-bootstrap sample can be particularly effective. In particular, they produce third-order accuracy for much less computational expense than is required by conventional double-bootstrap methods. However, this improved level of performance is not available for the single double-bootstrap methods that have been s...
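
    A sketch of the device under discussion, as we understand it: estimate the first-level bootstrap bias, then correct that estimate using just one second-level resample per first-level resample. The estimator and all settings below are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(6)

        def estimator(x):
            return x.var(ddof=0)    # biased variance estimator; true bias is known

        x = rng.normal(0, 1, 30)
        theta_hat = estimator(x)
        B = 2000

        outer = np.empty(B)
        inner = np.empty(B)
        for b in range(B):
            xb = rng.choice(x, size=x.size, replace=True)    # outer resample
            outer[b] = estimator(xb)
            xbb = rng.choice(xb, size=x.size, replace=True)  # single inner resample
            inner[b] = estimator(xbb)

        bias1 = outer.mean() - theta_hat     # first-level bias estimate
        bias2 = inner.mean() - outer.mean()  # its own estimated bias
        corrected = theta_hat - (2 * bias1 - bias2)
        print(round(theta_hat, 3), round(corrected, 3))

    Each inner estimate is noisy on its own, but averaging across the B outer resamples recovers the second-level correction, which is the computational saving the abstract describes.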

  14. Comparing results from multiple imputation and dynamic marginal structural models for estimating when to start antiretroviral therapy.

    Science.gov (United States)

    Shepherd, Bryan E; Liu, Qi; Mercaldo, Nathaniel; Jenkins, Cathy A; Lau, Bryan; Cole, Stephen R; Saag, Michael S; Sterling, Timothy R

    2016-10-30

    Optimal timing of initiating antiretroviral therapy has been a controversial topic in HIV research. Two highly publicized studies applied different analytical approaches, a dynamic marginal structural model and a multiple imputation method, to different observational databases and came up with different conclusions. Discrepancies between the two studies' results could be due to differences between patient populations, fundamental differences between statistical methods, or differences between implementation details. For example, the two studies adjusted for different covariates, compared different thresholds, and had different criteria for qualifying measurements. If both analytical approaches were applied to the same cohort holding technical details constant, would their results be similar? In this study, we applied both statistical approaches using observational data from 12,708 HIV-infected persons throughout the USA. We held technical details constant between the two methods and then repeated analyses varying technical details to understand what impact they had on findings. We also present results applying both approaches to simulated data. Results were similar, although not identical, when technical details were held constant between the two statistical methods. Confidence intervals for the dynamic marginal structural model tended to be wider than those from the imputation approach, although this may have been due in part to additional external data used in the imputation analysis. We also consider differences in the estimands, required data, and assumptions of the two statistical methods. Our study provides insights into assessing optimal dynamic treatment regimes in the context of starting antiretroviral therapy and in more general settings. Copyright © 2016 John Wiley & Sons, Ltd.

  15. Single Station System and Method of Locating Lightning Strikes

    Science.gov (United States)

    Medelius, Pedro J. (Inventor); Starr, Stanley O. (Inventor)

    2003-01-01

    An embodiment of the present invention uses a single detection system to approximate the location of lightning strikes. The system is triggered by a broadband RF detector and measures the time until the arrival of the leading edge of the thunder acoustic pulse. This time difference is used to determine a slant range R from the detector to the closest approach of the lightning. The azimuth and elevation are determined by an array of acoustic sensors. The leading edge of the thunder waveform is cross-correlated between the various acoustic sensors in the array to determine the difference in time of arrival, ΔT. A set of ΔTs is used to determine the direction of arrival, AZ and EL. The three estimated variables (R, AZ, EL) are used to locate a probable point of the lightning strike.

  16. Large pyramid shaped single crystals of BiFeO₃ by solvothermal synthesis method

    Energy Technology Data Exchange (ETDEWEB)

    Sornadurai, D.; Ravindran, T. R.; Paul, V. Thomas; Sastry, V. Sankara [Condensed Matter Physics Division, Materials Science Group, Physical Metallurgy Division, Metallurgy and Materials Group, Indira Gandhi Centre for Atomic Research, Kalpakkam, Tamil Nadu (India); Condensed Matter Physics Division, Materials Science Group (India)

    2012-06-05

    Synthesis parameters are optimized in order to grow single crystals of multiferroic BiFeO₃. Pyramid (tetrahedron) shaped single crystals 2 to 3 mm in size were successfully obtained by the solvothermal method. Scanning electron microscopy with EDAX confirmed the phase formation. Raman scattering spectra of bulk BiFeO₃ single crystals have been measured, and they match well with reported spectra.

  17. Quantitative Single-letter Sequencing: a method for simultaneously monitoring numerous known allelic variants in single DNA samples

    Directory of Open Access Journals (Sweden)

    Duborjal Hervé

    2008-02-01

    Full Text Available Abstract Background Pathogens such as fungi, bacteria and especially viruses are highly variable even within an individual host, intensifying the difficulty of distinguishing and accurately quantifying numerous allelic variants co-existing in a single nucleic acid sample. The majority of currently available techniques are based on real-time PCR or primer extension and often require multiplexing adjustments that impose a practical limitation on the number of alleles that can be monitored simultaneously at a single locus. Results Here, we describe a novel method that allows the simultaneous quantification of numerous allelic variants in a single reaction tube and without multiplexing. Quantitative Single-letter Sequencing (QSS) begins with a single PCR amplification step using a pair of primers flanking the polymorphic region of interest. Next, PCR products are submitted to single-letter sequencing with a fluorescently-labelled primer located upstream of the polymorphic region. The resulting monochromatic electropherogram shows numerous specific diagnostic peaks, attributable to specific variants, signifying their presence/absence in the DNA sample. Moreover, peak fluorescence can be quantified and used to estimate the frequency of the corresponding variant in the DNA population. Using engineered allelic markers in the genome of Cauliflower mosaic virus, we reliably monitored six different viral genotypes in DNA extracted from infected plants. Evaluation of the intrinsic variance of this method, as applied to both artificial plasmid DNA mixes and viral genome populations, demonstrates that QSS is a robust and reliable method of detection and quantification for variants with a relative frequency of between 0.05 and 1. Conclusion This simple method is easily transferable to many other biological systems and questions, including those involving high throughput analysis, and can be performed in any laboratory since it does not require specialized

  18. A Method for Turbocharging Four-Stroke Single Cylinder Engines

    Science.gov (United States)

    Buchman, Michael; Winter, Amos

    2014-11-01

    Turbocharging is not conventionally used with single cylinder engines due to the timing mismatch between when the turbo is powered and when it can deliver air to the cylinder. The proposed solution involves a fixed, pressurized volume - which we call an air capacitor - on the intake side of the engine between the turbocharger and intake valves. The capacitor acts as a buffer and would be implemented as a new style of intake manifold with a larger volume than traditional systems. This talk will present the flow analysis used to determine the optimal size for the capacitor, which was found to be four to five times the engine capacity, as well as its anticipated contributions to engine performance. For a capacitor sized for a one-liter engine, the time to reach operating pressure was found to be approximately two seconds, which would be acceptable for slowly accelerating applications and steady state applications. The air density increase that could be achieved, compared to ambient air, was found to vary between fifty percent for adiabatic compression and no heat transfer from the capacitor, to eighty percent for perfect heat transfer. These increases in density are proportional to, to first order, the anticipated power increases that could be realized. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.

  19. Polymorphic transitions in single crystals: A new molecular dynamics method

    Energy Technology Data Exchange (ETDEWEB)

    Parrinello, M.; Rahman, A.

    1981-12-01

    A new Lagrangian formulation is introduced. It can be used to make molecular dynamics (MD) calculations on systems under the most general, externally applied, conditions of stress. In this formulation the MD cell shape and size can change according to dynamical equations given by this Lagrangian. This new MD technique is well suited to the study of structural transformations in solids under external stress and at finite temperature. As an example of the use of this technique we show how a single crystal of Ni behaves under uniform uniaxial compressive and tensile loads. This work confirms some of the results of static (i.e., zero temperature) calculations reported in the literature. We also show that some results regarding the stress-strain relation obtained by static calculations are invalid at finite temperature. We find that, under compressive loading, our model of Ni shows a bifurcation in its stress-strain relation; this bifurcation provides a link in configuration space between cubic and hexagonal close packing. It is suggested that such a transformation could perhaps be observed experimentally under extreme conditions of shock.

  20. Power decoupling method for single phase differential buck converter

    DEFF Research Database (Denmark)

    Yao, Wenli; Tang, Yi; Zhang, Xiaobin

    2015-01-01

    inverter to improve the dc link power quality, and an improved active power decoupling method is proposed to achieve ripple power reduction for both AC-DC and DC-AC conversions. The ripple energy storage is realized by the filter capacitors, which are connected between the output terminal and the negative...... generation technique is proposed to provide accurate ripple power compensation, and closed-loop controllers are also designed based on small signal models. The effectiveness of this power decoupling method is verified by detailed simulation studies as well as laboratory prototype experimental results....... dc bus. By properly controlling the differential mode voltage of the capacitors, it is possible to transfer desired energy between the DC port and AC port. The common mode voltage is controlled in such a way that the ripple power on the dc side will be reduced. Furthermore, an autonomous reference...

  1. Heuristic methods for single link shared backup path protection

    DEFF Research Database (Denmark)

    Haahr, Jørgen Thorlund; Stidsen, Thomas Riis; Zachariasen, Martin

    2014-01-01

    schemes are employed. In contrast to manual intervention, automatic protection schemes such as shared backup path protection (SBPP) can recover from failure quickly and efficiently. SBPP is a simple but efficient protection scheme that can be implemented in backbone networks with technology available...... today. In SBPP backup paths are planned in advance for every failure scenario in order to recover from failures quickly and efficiently. Planning SBPP is an NP-hard optimization problem, and previous work confirms that it is time-consuming to solve the problem in practice using exact methods. We present...... heuristic algorithms and lower bound methods for the SBPP planning problem. Experimental results show that the heuristic algorithms are able to find good quality solutions in minutes. A solution gap of less than 3.5 % was achieved for 5 of 7 benchmark instances (and a gap of less than 11 % for the remaining...

  2. Single photon imaging and timing array sensor apparatus and method

    Science.gov (United States)

    Smith, R. Clayton

    2003-06-24

    An apparatus and method are disclosed for generating a three-dimensional image of an object or target. The apparatus comprises a photon source for emitting photons at a target. The emitted photons are received by a photon receiver when reflected from the target. The photon receiver determines the reflection time of each photon and further determines its arrival position on the receiver. An analyzer is communicatively coupled to the photon receiver and generates a three-dimensional image of the object based upon the reflection time and the arrival position.

  3. Methods for producing single crystal mixed halide perovskites

    Energy Technology Data Exchange (ETDEWEB)

    Zhu, Kai; Zhao, Yixin

    2017-07-11

    An aspect of the present invention is a method that includes contacting a metal halide and a first alkylammonium halide in a solvent to form a solution and maintaining the solution at a first temperature, resulting in the formation of at least one alkylammonium halide perovskite crystal, where the metal halide includes a first halogen and a metal, the first alkylammonium halide includes the first halogen, the at least one alkylammonium halide perovskite crystal includes the metal and the first halogen, and the first temperature is above about 21 °C.

  4. Multiple imputation of missing values was not necessary before performing a longitudinal mixed-model analysis.

    Science.gov (United States)

    Twisk, Jos; de Boer, Michiel; de Vente, Wieke; Heymans, Martijn

    2013-09-01

    As a result of the development of sophisticated techniques, such as multiple imputation, the interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is a current debate on whether it is necessary to use multiple imputation before performing a mixed-model analysis to analyze the longitudinal data. In the current study this necessity is evaluated. The results of mixed-model analyses with and without multiple imputation were compared with each other. Four data sets with missing values were created: one data set with values missing completely at random, two data sets with values missing at random, and one data set with values missing not at random. In all data sets, the relationship between a continuous outcome variable and two different covariates was analyzed: a time-independent dichotomous covariate and a time-dependent continuous covariate. Although for all types of missing data the results of the mixed-model analysis with or without multiple imputation were slightly different, they were not in favor of one of the two approaches. In addition, repeating the multiple imputation 100 times showed that the results of the mixed-model analysis with multiple imputation were quite unstable. It is not necessary to handle missing data using multiple imputation before performing a mixed-model analysis on longitudinal data. Copyright © 2013 Elsevier Inc. All rights reserved.
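
    To see why imputation can be redundant here, note that a likelihood-based mixed model already uses every observed row of the long-format data. A small hypothetical sketch with statsmodels (all variable names and values invented):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(7)

        n_subj, n_time = 100, 4
        subj = np.repeat(np.arange(n_subj), n_time)
        time = np.tile(np.arange(n_time), n_subj)
        group = np.repeat(rng.integers(0, 2, n_subj), n_time)  # time-independent
        u = np.repeat(rng.normal(0, 1, n_subj), n_time)        # random intercepts
        y = 2.0 + 0.5 * time + 1.0 * group + u + rng.normal(0, 1, subj.size)

        df = pd.DataFrame({"y": y, "subj": subj, "time": time, "group": group})
        df.loc[rng.random(len(df)) < 0.25, "y"] = np.nan       # drop ~25% of outcomes

        # The mixed model simply uses all available rows; no imputation step.
        fit = smf.mixedlm("y ~ time + group", data=df.dropna(), groups="subj").fit()
        print(fit.params[["time", "group"]])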

  5. Multiple imputation of missing covariates for the Cox proportional hazards cure model.

    Science.gov (United States)

    Beesley, Lauren J; Bartlett, Jonathan W; Wolf, Gregory T; Taylor, Jeremy M G

    2016-11-20

    We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects 'cured', and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification. We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. Copyright © 2016 John Wiley & Sons, Ltd.

  6. Biases in multilevel analyses caused by cluster-specific fixed-effects imputation.

    Science.gov (United States)

    Speidel, Matthias; Drechsler, Jörg; Sakshaug, Joseph W

    2017-08-24

    When datasets are affected by nonresponse, imputation of the missing values is a viable solution. However, most imputation routines implemented in commonly used statistical software packages do not accommodate multilevel models that are popular in education research and other settings involving clustering of units. A common strategy to take the hierarchical structure of the data into account is to include cluster-specific fixed effects in the imputation model. Still, this ad hoc approach has never been compared analytically to the congenial multilevel imputation in a random slopes setting. In this paper, we evaluate the impact of the cluster-specific fixed-effects imputation model on multilevel inference. We show analytically that the cluster-specific fixed-effects imputation strategy will generally bias inferences obtained from random coefficient models. The bias of random-effects variances and global fixed-effects confidence intervals depends on the cluster size, the relation of within- and between-cluster variance, and the missing data mechanism. We illustrate the negative implications of cluster-specific fixed-effects imputation using simulation studies and an application based on data from the National Educational Panel Study (NEPS) in Germany.

  7. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Directory of Open Access Journals (Sweden)

    Xiaoyi Gao

    2012-06-01

    Full Text Available Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are at present rather limited compared with those for individuals of European ancestry, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  8. Statistical Methods for Single-Particle Electron Cryomicroscopy

    DEFF Research Database (Denmark)

    Jensen, Katrine Hommelhoff

    , several randomly oriented copies of the protein are available, each representing a certain viewing direction of the structure. This implies two main computational problems: (1) to determine the angular relationship between the individual projection images, i.e. determine the protein pose in each view...... from the noisy, randomly oriented projection images. Many statistical approaches to SPR have been proposed in the past. Typically, due to the computation time complexity, they rely on approximated maximum likelihood (ML) or maximum a posteriori (MAP) estimate of the structure. All methods presented...... statistical inversion to optimally cope with the high amount of noise, as well as to incorporate prior information to obtain more reliable estimates. For the first problem, we investigate the statistical recovery of the geometry between a set of projection images. In more detail, we show the equivalence...

  9. Increasing imputation and prediction accuracy for Chinese Holsteins using joint Chinese-Nordic reference population

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Ding, X

    2015-01-01

    This study investigated the effect of including Nordic Holsteins in the reference population on the imputation accuracy and prediction accuracy for Chinese Holsteins. The data used in this study include 85 Chinese Holstein bulls genotyped with both 54K chip and 777K (HD) chip, 2862 Chinese cows...... was improved slightly when using the marker data imputed based on the combined HD reference data, compared with using the marker data imputed based on the Chinese HD reference data only. On the other hand, when using the combined reference population including 4398 Nordic Holstein bulls, the accuracy...... to increase reference population rather than increasing marker density...

  10. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Science.gov (United States)

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped with the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were somewhat smaller (0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals available for training in genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and these accuracies were slightly lower than the GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.
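
    As a rough illustration of the leave-one-out scheme mentioned above, which maximizes the training set at each step, the sketch below scores a generic whole-genome regression with LOO. Ridge regression and the correlation-based accuracy are stand-in assumptions, not the five models compared in the study.

```python
# Leave-one-out cross-validation for a generic genomic prediction model.
# X: (n_animals, n_snps) genotype matrix; y: (de-regressed) EBVs.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> float:
    preds = np.empty_like(y, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Every animal but one is used for training at each step.
        model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    # Prediction accuracy as the correlation between observed and
    # predicted values, a common convention in genomic prediction.
    return float(np.corrcoef(y, preds)[0, 1])
```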

  11. Multiple imputation of rainfall missing data in the Iberian Mediterranean context

    Science.gov (United States)

    Miró, Juan Javier; Caselles, Vicente; Estrela, María José

    2017-11-01

    Given the increasing need for complete rainfall data networks, diverse methods for filling gaps in observed precipitation series have been proposed in recent years, progressively more advanced than traditional approaches. The present study validated 10 methods (6 linear, 2 non-linear and 2 hybrid) that allow multiple imputation, i.e., filling in missing data of multiple incomplete series at the same time in a dense network of neighboring stations. These were applied to daily and monthly rainfall in two sectors of the Júcar River Basin Authority (east Iberian Peninsula), an area characterized by high spatial irregularity and difficulty of rainfall estimation. A classification of precipitation according to its genetic origin was applied as a pre-processing step, and a quantile-mapping adjustment as a post-processing technique. The results showed, in general, a better performance for the non-linear and hybrid methods; notably, the non-linear PCA (NLPCA) method considerably outperforms the Self Organizing Maps (SOM) method among the non-linear approaches. Among linear methods, the Regularized Expectation Maximization method (RegEM) was the best, but far behind NLPCA. Applying EOF filtering as post-processing of NLPCA (hybrid approach) yielded the best results.
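
    The quantile-mapping post-processing step can be illustrated with a small empirical sketch: each imputed value is located in the model's own distribution and replaced by the observed value at the same quantile level. The function below is a generic version that assumes a common calibration period; it is not the authors' implementation.

```python
# Empirical quantile mapping for post-processing imputed rainfall.
import numpy as np

def quantile_map(imputed, model_ref, obs_ref):
    """model_ref: model output in a calibration period;
    obs_ref: observations in the same period."""
    imputed = np.asarray(imputed, dtype=float)
    # Quantile level of each imputed value in the model's distribution...
    probs = np.interp(imputed, np.sort(model_ref),
                      np.linspace(0.0, 1.0, len(model_ref)))
    # ...mapped to the observed value at that same quantile level.
    return np.interp(probs, np.linspace(0.0, 1.0, len(obs_ref)),
                     np.sort(obs_ref))
```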

  12. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy.

    Science.gov (United States)

    Bouwman, Aniek C; Veerkamp, Roel F

    2014-10-03

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds were used, because relatively many individuals from several breeds have currently been sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on imputation accuracy of variants with low minor allele frequency and breed-specific variants. Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed was added, the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein improved the average imputation accuracy marginally to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein-specific variants did gain imputation accuracy from the multi-breed reference population. This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference
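
    For reference, the accuracy measure used in the study, the correlation between observed and imputed genotypes, is simple to compute per variant; the sketch below assumes 0/1/2 genotype coding (imputed values may be fractional dosages) and is illustrative only.

```python
# Per-variant imputation accuracy as the observed-imputed correlation.
import numpy as np

def per_variant_accuracy(observed: np.ndarray, imputed: np.ndarray) -> np.ndarray:
    """observed, imputed: (n_animals, n_variants) genotype matrices."""
    accs = np.empty(observed.shape[1])
    for j in range(observed.shape[1]):
        # Monomorphic variants (zero variance) yield NaN and should be
        # excluded before averaging over a chromosome.
        accs[j] = np.corrcoef(observed[:, j], imputed[:, j])[0, 1]
    return accs
```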

  13. Method for obtaining secondary amines from nitrobenzene in a single reactor

    OpenAIRE

    Corma, Avelino; Leyva, Antonio; Rubio Marqués, Paula

    2012-01-01

    [EN] The invention relates to a method for directly obtaining secondary amines, such as, for example, cyclohexylaniline or dicyclohexylaniline and substituted derivatives thereof, from nitrobenzene and derivatives in a single reactor, characterised in that: a nitrobenzene derivative, a solid catalyst, a solvent, an acid and hydrogen are introduced into the reactor, and it comprises a single step (one pot)

  14. Photovoltaic device using single wall carbon nanotubes and method of fabricating the same

    Science.gov (United States)

    Biris, Alexandru S.; Li, Zhongrui

    2012-11-06

    A photovoltaic device and methods for forming the same. In one embodiment, the photovoltaic device has a silicon substrate, and a film comprising a plurality of single wall carbon nanotubes disposed on the silicon substrate, wherein the plurality of single wall carbon nanotubes forms a plurality of heterojunctions with the silicon in the substrate.

  15. Single-cell epigenomics: powerful new methods for understanding gene regulation and cell identity.

    Science.gov (United States)

    Clark, Stephen J; Lee, Heather J; Smallwood, Sébastien A; Kelsey, Gavin; Reik, Wolf

    2016-04-18

    Emerging single-cell epigenomic methods are being developed with the exciting potential to transform our knowledge of gene regulation. Here we review available techniques and future possibilities, arguing that the full potential of single-cell epigenetic studies will be realized through parallel profiling of genomic, transcriptional, and epigenetic information.

  16. Predictors of clinical outcome in pediatric oligodendroglioma: meta-analysis of individual patient data and multiple imputation.

    Science.gov (United States)

    Wang, Kevin Yuqi; Vankov, Emilian R; Lin, Doris Da May

    2018-02-01

    OBJECTIVE Oligodendroglioma is a rare primary CNS neoplasm in the pediatric population, and only a limited number of studies in the literature have characterized this entity. Existing studies are limited by small sample sizes and discrepant interstudy findings in identified prognostic factors. In the present study, the authors aimed to increase the statistical power in evaluating for potential prognostic factors of pediatric oligodendrogliomas and sought to reconcile the discrepant findings present among existing studies by performing an individual-patient-data (IPD) meta-analysis and using multiple imputation to address data not directly available from existing studies. METHODS A systematic search was performed, and all studies found to be related to pediatric oligodendrogliomas and associated outcomes were screened for inclusion. Each study was searched for specific demographic and clinical characteristics of each patient and the duration of event-free survival (EFS) and overall survival (OS). Given that certain demographic and clinical information of each patient was not available within all studies, a multivariable imputation via chained equations model was used to impute missing data after the mechanism of missing data was determined. The primary end points of interest were hazard ratios for EFS and OS, as calculated by the Cox proportional-hazards model. Both univariate and multivariate analyses were performed. The multivariate model was adjusted for age, sex, tumor grade, mixed pathologies, extent of resection, chemotherapy, radiation therapy, tumor location, and initial presentation. A p value of less than 0.05 was considered statistically significant. RESULTS A systematic search identified 24 studies with both time-to-event and IPD characteristics available, and a total of 237 individual cases were available for analysis. A median of 19.4% of the values among clinical, demographic, and outcome variables in the compiled 237 cases were missing. Multivariate

  17. Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.

    Science.gov (United States)

    Jolani, Shahab

    2018-03-01

    In health and medical sciences, multiple imputation (MI) is now becoming popular to obtain valid inferences in the presence of missing data. However, MI of clustered data such as multicenter studies and individual participant data meta-analysis requires advanced imputation routines that preserve the hierarchical structure of the data. In clustered data, a specific challenge is the presence of systematically missing data, when a variable is completely missing in some clusters, and sporadically missing data, when it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approach based on chained equations methodology that simultaneously imputes systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. Here, we use a random effect imputation model and adopt a simplification over fully Bayesian techniques such as the Gibbs sampler to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
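
    For background, a minimal single-level version of the chained-equations idea can be run with scikit-learn's IterativeImputer, as sketched below. Note that this round-robin scheme has no random cluster effects, so it illustrates plain FCS only, not the hierarchical method proposed in the paper.

```python
# Single-level fully conditional specification (chained equations):
# each incomplete column is imputed in turn from the others, cycling
# for several iterations. Cluster structure is ignored here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan        # 20% sporadically missing

# sample_posterior=True draws imputations rather than using point
# predictions, which is what proper multiple imputation requires.
imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=0)
X_completed = imputer.fit_transform(X)
```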

  18. Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

    Directory of Open Access Journals (Sweden)

    Xiaobo Yan

    2015-01-01

    Full Text Available This paper addresses missing value imputation for the Internet of Things (IoT). Nowadays, the IoT is used widely and commonly across a variety of domains, such as transportation and logistics and healthcare. However, missing values are very common in the IoT for a variety of reasons, so experimental data are often incomplete. As a result, some work that depends on IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data itself and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on Gaussian mixture model (MGI). Experimental results showed that the three models can improve the accuracy, reliability, and stability of missing value imputation greatly and effectively.
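
    The MGI model itself is not reproduced here, but a generic sketch of Gaussian-mixture-based imputation conveys the idea: fit a mixture to the complete rows, then fill each row's missing entries with the mixture's conditional expectation given its observed entries. The component count and complete-case fitting below are assumptions for illustration.

```python
# Conditional-mean imputation under a Gaussian mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_impute(X: np.ndarray, n_components: int = 3) -> np.ndarray:
    X = X.copy()
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0).fit(X[~np.isnan(X).any(axis=1)])
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        m = np.isnan(X[i])                    # missing entries
        o = ~m                                # observed entries
        if not o.any():
            continue                          # fully missing rows skipped
        xo = X[i, o]
        cond_means, logw = [], []
        for k in range(n_components):
            mu, S = gmm.means_[k], gmm.covariances_[k]
            Soo, Smo = S[np.ix_(o, o)], S[np.ix_(m, o)]
            # E[x_m | x_o, component k]
            cond_means.append(mu[m] + Smo @ np.linalg.solve(Soo, xo - mu[o]))
            # Log responsibility of component k given the observed part.
            diff = xo - mu[o]
            _, logdet = np.linalg.slogdet(Soo)
            logw.append(np.log(gmm.weights_[k]) - 0.5 * (
                diff @ np.linalg.solve(Soo, diff) + logdet
                + o.sum() * np.log(2 * np.pi)))
        w = np.exp(np.array(logw) - max(logw))
        w /= w.sum()
        X[i, m] = np.sum(w[:, None] * np.array(cond_means), axis=0)
    return X
```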

  19. Application of dead-reckoning in the single-base station location and tracking method

    Science.gov (United States)

    Zhang, Ruyun; Fan, Dandan; Xu, Mingyan; Chen, Ming

    2007-11-01

    A location and tracking method utilizing dead-reckoning with a single base station in cellular networks is proposed in this paper. An Extended Kalman Filter (EKF) and a location method utilizing the AOA and the variance ratio of the AOA are introduced to smooth the measurement error. Simulation results illustrate the location and tracking performance of the proposed method.

  20. A new method for solving single and multi-objective fuzzy minimum ...

    Indian Academy of Sciences (India)

    Several authors have proposed different methods for solving fuzzy minimum cost flow (MCF) problems. In this paper, some single and multi-objective fuzzy MCF problems are chosen which cannot be solved by using any of the existing methods and a new method is proposed for solving such type of problems. The main ...

  1. Servizi finanziari imputati e interdipendenze settoriali: un'analisi settoriale del ruolo del credito nel sistema economico. (Imputed bank services and sectoral interdependences: a structural analysis of the role of credit in the economy)

    Directory of Open Access Journals (Sweden)

    C. BIANCHI

    2013-12-01

    Full Text Available National accounts systems based on the SEC methodology are usually thought to entail the practical impossibility of carrying out any meaningful allocation of imputed bank services among the single branches of economic activity. As a consequence, the total value of the net interest earned by the credit system as a whole is considered as a cost entry and a negative component of added value in an ad-hoc additional industry, to be aggregated with the main credit industry in typical input-output analysis. The present work shows how this prohibits the structural analysis of the role of credit in the system of interdependencies. A method is proposed in which imputed services of credit are distributed by branch on the basis of existing statistics, proving valuable in assessing the significance of certain national accounts quantities, such as operating results. The analysis highlights the dual nature of credit as having both highly intermediate and high value-added content: it is able to strongly influence the production costs of the other branches without being influenced by them. These properties give the banking industry a very high potential for inflation. JEL: E51, G21

  2. Imaging by the SSFSE single slice method at different viscosities of bile

    International Nuclear Information System (INIS)

    Kubo, Hiroya; Usui, Motoki; Fukunaga, Kenichi; Yamamoto, Naruto; Ikegami, Toshimi

    2001-01-01

    The single shot fast spin echo single thick slice method (single slice method) is a technique that visualizes the water component alone using a heavy T2. However, this method is considered to be markedly affected by changes in the viscosity of the material because a very long TE is used, and changes in the T2 value, which are related to viscosity, directly affect imaging. In this study, we evaluated the relationship between the effects of TE and the T2 value of bile in the single slice method and also examined the relationship between the signal intensity of bile on T1- and T2-weighted images and imaging by MR cholangiography (MRC). It was difficult to image bile with high viscosities at a usual effective TE level of 700-1,500 ms. With regard to the relationship between the signal intensity of bile and MRC imaging, all T2 values of the bile samples showing relatively high signal intensities on the T1-weighted images suggested high viscosities, and MRC imaging of these bile samples was poor. In conclusion, MRC imaging of bile with high viscosities was poor with the single slice method. Imaging by the single slice method alone of bile showing a relatively high signal intensity on T1-weighted images should be avoided, and combination with other MRC sequences should be used. (author)

  3. Nanolithography based contacting method for electrical measurements on single template synthesized nanowires

    DEFF Research Database (Denmark)

    Fusil, S.; Piraux, L.; Mátéfi-Tempfli, Stefan

    2005-01-01

    A reliable method enabling electrical measurements on single nanowires prepared by electrodeposition in an alumina template is described. This technique is based on electrically controlled nanoindentation of a thin insulating resist deposited on the top face of the template filled by the nanowires....... We show that this method is very flexible, allowing us to electrically address single nanowires of controlled length down to 100 nm and of desired composition. Using this approach, current densities as large as 10 A cm were successfully injected through a point contact on a single magnetic...

  4. The search for stable prognostic models in multiple imputed data sets

    Directory of Open Access Journals (Sweden)

    de Vet Henrica CW

    2010-09-01

    Full Text Available Abstract Background In prognostic studies model instability and missing data can be troubling factors. Proposed methods for handling these situations are bootstrapping (B) and multiple imputation (MI). The authors examined the influence of these methods on model composition. Methods Models were constructed using a cohort of 587 patients consulting between January 2001 and January 2003 with a shoulder problem in general practice in the Netherlands (the Dutch Shoulder Study). Outcome measures were persistent shoulder disability and persistent shoulder pain. Potential predictors included socio-demographic variables, characteristics of the pain problem, physical activity and psychosocial factors. Model composition and performance (calibration and discrimination) were assessed for models using a complete case analysis, MI, bootstrapping or both MI and bootstrapping. Results Results showed that model composition varied between models as a result of how missing data were handled and that bootstrapping provided additional information on the stability of the selected prognostic model. Conclusion In prognostic modeling missing data needs to be handled by MI, and bootstrap model selection is advised in order to provide information on model stability.
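
    A sketch of how bootstrapping yields the stability information referred to above: refit a selection procedure on resampled data and record how often each candidate predictor is retained. The L1-penalized logistic regression below is a stand-in for the stepwise selection typically used in prognostic modeling.

```python
# Bootstrap inclusion frequencies for candidate predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inclusion_frequencies(X: np.ndarray, y: np.ndarray,
                          n_boot: int = 200, C: float = 0.5) -> np.ndarray:
    rng = np.random.default_rng(0)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))    # bootstrap resample
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[idx], y[idx])
        counts += (model.coef_[0] != 0)               # retained predictors
    # Predictors retained in most resamples indicate a stable model.
    return counts / n_boot
```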

  5. Comparison of Model Reliabilities from Single-Step and Bivariate Blending Methods

    DEFF Research Database (Denmark)

    Taskinen, Matti; Mäntysaari, Esa; Lidauer, Martin

    2013-01-01

    Model based reliabilities in genetic evaluation are compared between three methods: animal model BLUP, single-step BLUP, and bivariate blending after genomic BLUP. The original bivariate blending is revised in this work to better account for animal models. The study data is extracted from the production trait evaluation of Nordic Red dairy cattle. Genotyped bulls with daughters are used as training animals, and genotyped bulls and producing cows as candidate animals. For simplicity, the size of the data is chosen so that the full inverses of the mixed model equation coefficient matrices can be calculated. Model reliabilities by the single-step and the bivariate blending methods were higher than by animal model due to genomic information. Compared to the single-step method, the bivariate blending method reliability estimates were, in general, lower. Computationally, the bivariate blending method was...

  6. Correcting bias due to missing stage data in the non-parametric estimation of stage-specific net survival for colorectal cancer using multiple imputation.

    Science.gov (United States)

    Falcaro, Milena; Carpenter, James R

    2017-06-01

    Population-based net survival by tumour stage at diagnosis is a key measure in cancer surveillance. Unfortunately, data on tumour stage are often missing for a non-negligible proportion of patients and the mechanism giving rise to the missingness is usually anything but completely at random. In this setting, restricting analysis to the subset of complete records gives typically biased results. Multiple imputation is a promising practical approach to the issues raised by the missing data, but its use in conjunction with the Pohar-Perme method for estimating net survival has not been formally evaluated. We performed a resampling study using colorectal cancer population-based registry data to evaluate the ability of multiple imputation, used along with the Pohar-Perme method, to deliver unbiased estimates of stage-specific net survival and recover missing stage information. We created 1000 independent data sets, each containing 5000 patients. Stage data were then made missing at random under two scenarios (30% and 50% missingness). Complete records analysis showed substantial bias and poor confidence interval coverage. Across both scenarios our multiple imputation strategy virtually eliminated the bias and greatly improved confidence interval coverage. In the presence of missing stage data complete records analysis often gives severely biased results. We showed that combining multiple imputation with the Pohar-Perme estimator provides a valid practical approach for the estimation of stage-specific colorectal cancer net survival. As usual, when the percentage of missing data is high the results should be interpreted cautiously and sensitivity analyses are recommended. Copyright © 2017 Elsevier Ltd. All rights reserved.
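
    After the Pohar-Perme estimator has been applied to each imputed data set, the per-imputation estimates are combined with Rubin's rules; a minimal generic implementation of that pooling step is sketched below.

```python
# Rubin's rules: pool point estimates and variances across m imputations.
import numpy as np

def rubin_pool(estimates, variances):
    q = np.asarray(estimates, dtype=float)   # per-imputation estimates
    u = np.asarray(variances, dtype=float)   # per-imputation squared SEs
    m = q.size
    qbar = q.mean()                          # pooled point estimate
    ubar = u.mean()                          # within-imputation variance
    b = q.var(ddof=1)                        # between-imputation variance
    t = ubar + (1 + 1 / m) * b               # total variance
    return qbar, np.sqrt(t)                  # estimate and pooled SE
```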

  7. Nonparametric Multiple Imputation for Questionnaires with Individual Skip Patterns and Constraints: The Case of Income Imputation in The National Educational Panel Study

    Science.gov (United States)

    Aßmann, Christian; Würbach, Ariane; Goßmann, Solange; Geissler, Ferdinand; Bela, Anika

    2017-01-01

    Large-scale surveys typically exhibit data structures characterized by rich mutual dependencies between surveyed variables and individual-specific skip patterns. Despite high efforts in fieldwork and questionnaire design, missing values inevitably occur. One approach for handling missing values is to provide multiply imputed data sets, thus…

  8. Imputation of the Date of HIV Seroconversion in a Cohort of Seroprevalent Subjects: Implications for Analysis of Late HIV Diagnosis

    Directory of Open Access Journals (Sweden)

    Paz Sobrino-Vegas

    2012-01-01

    Full Text Available Objectives. Since subjects may have been diagnosed before cohort entry, analysis of late HIV diagnosis (LD) is usually restricted to the newly diagnosed. We estimate the magnitude and risk factors of LD in a cohort of seroprevalent individuals by imputing seroconversion dates. Methods. Multicenter cohort of HIV-positive subjects who were treatment naive at entry, in Spain, 2004–2008. Multiple-imputation techniques were used. Subjects with times to HIV diagnosis longer than 4.19 years were considered LD. Results. Median time to HIV diagnosis was 2.8 years in the whole cohort of 3,667 subjects. Factors significantly associated with LD were male sex; Sub-Saharan African or Latin-American origin, compared with Spanish origin; and older age. In 2,928 newly diagnosed subjects, median time to diagnosis was 3.3 years, and LD was more common in injecting drug users. Conclusions. Estimates of the magnitude and risk factors of LD for the whole cohort differ from those obtained for new HIV diagnoses.

  9. Evaluation of Railway Networks with Single Track Operation Using the UIC 406 Capacity Method

    DEFF Research Database (Denmark)

    Landex, Alex

    2009-01-01

    Many capacity analyses using the UIC 406 capacity method for double track lines have been carried out and presented internationally, but few capacity analyses applying the capacity method to single track lines have been presented. Therefore, the differences between capacity analyses of double track...

  10. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction.

    Science.gov (United States)

    Chomczynski, P; Sacchi, N

    1987-04-01

    A new method of total RNA isolation by a single extraction with an acid guanidinium thiocyanate-phenol-chloroform mixture is described. The method provides a pure preparation of undegraded RNA in high yield and can be completed within 4 h. It is particularly useful for processing large numbers of samples and for isolation of RNA from minute quantities of cells or tissue samples.

  11. Method and apparatus for single-stepping coherence events in a multiprocessor system under software control

    Science.gov (United States)

    Blumrich, Matthias A.; Salapura, Valentina

    2010-11-02

    An apparatus and method are disclosed for single-stepping coherence events in a multiprocessor system under software control in order to monitor the behavior of a memory coherence mechanism. Single-stepping coherence events in a multiprocessor system is made possible by adding one or more step registers. By accessing these step registers, one or more coherence requests are processed by the multiprocessor system. The step registers determine if the snoop unit will operate by proceeding in a normal execution mode, or operate in a single-step mode.

  12. Oversampling method to extract excitatory and inhibitory conductances from single-trial membrane potential recordings.

    Science.gov (United States)

    Bédard, Claude; Béhuret, Sebastien; Deleuze, Charlotte; Bal, Thierry; Destexhe, Alain

    2012-09-15

    Variations of excitatory and inhibitory conductances determine the membrane potential (V(m)) activity of neurons, as well as their spike responses, and are thus of primary importance. Methods to estimate these conductances require clamping the cell at several different levels of V(m), thus making it impossible to estimate conductances from "single trial" V(m) recordings. We present here a new method that allows extracting estimates of the full time course of excitatory and inhibitory conductances from single-trial V(m) recordings. This method is based on oversampling of the V(m). We test the method numerically using models of increasing complexity. Finally, the method is evaluated using controlled conductance injection in cortical neurons in vitro using the dynamic-clamp technique. This conductance extraction method should be very useful for future in vivo applications. Copyright © 2011 Elsevier B.V. All rights reserved.

  13. A Novel MPPT Control Method of Thermoelectric Power Generation with Single Sensor

    OpenAIRE

    Yamada, Hiroaki; Kimura, Koji; Hanamoto, Tsuyoshi; Ishiyama, Toshihiko; Sakaguchi, Tadashi; Takahashi, Tsuyoshi

    2013-01-01

    This paper proposes a novel Maximum Power Point Tracking (MPPT) control method of thermoelectric power generation for a constant load. The paper reveals the characteristics and the internal resistance of the thermoelectric power module (TM). By analyzing the thermoelectric power generation system with a boost chopper using the state space averaging method, the output voltage and current of the TM are estimated with only a single current sensor. The proposed method can seek without calculating the output powe...

  14. THE STANDARD SINGLE COST METHOD AND THE EFFICIENCY OF INDUSTRIAL COMPANIES’ MANAGEMENT

    Directory of Open Access Journals (Sweden)

    Claudiu C. CONSTANTINESCU

    2008-12-01

    Full Text Available This article briefly describes the premises for the application of the standard direct cost calculation method in industry, the standard single cost calculation method, the stages of standard cost calculation per product and the calculation methods of standards per product. It also briefly outlines the possibilities for calculating and monitoring deviations of the costs of raw materials and other materials from the pre-established standard costs.

  15. Imputation-based meta-analysis of severe malaria in three African populations.

    Directory of Open Access Journals (Sweden)

    Gavin Band

    2013-05-01

    Full Text Available Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.
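
    The fixed-effects baseline referred to above is standard inverse-variance meta-analysis; a minimal sketch follows, taking per-study effect estimates (e.g., log odds ratios) and their standard errors as inputs.

```python
# Inverse-variance fixed-effects meta-analysis across study sites.
import numpy as np

def fixed_effects_meta(betas, ses):
    betas, ses = np.asarray(betas, float), np.asarray(ses, float)
    w = 1.0 / ses**2                         # inverse-variance weights
    beta = np.sum(w * betas) / np.sum(w)     # pooled effect estimate
    se = np.sqrt(1.0 / np.sum(w))            # pooled standard error
    return beta, se, beta / se               # estimate, SE, z-statistic
```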

  16. Multiple imputation to evaluate the impact of an assay change in national surveys

    Science.gov (United States)

    Sternberg, Maya

    2017-01-01

    National health surveys, such as the National Health and Nutrition Examination Survey, are used to monitor trends of nutritional biomarkers. These surveys try to maintain the same biomarker assay over time, but there are a variety of reasons why the assay may change. In these cases, it is important to evaluate the potential impact of a change so that any observed fluctuations in concentrations over time are not confounded by changes in the assay. To this end, a subset of stored specimens previously analyzed with the old assay is retested using the new assay. These paired data are used to estimate an adjustment equation, which is then used to ‘adjust’ all the old assay results and convert them into ‘equivalent’ units of the new assay. In this paper, we present a new way of approaching this problem using modern statistical methods designed for missing data. Using simulations, we compare the proposed multiple imputation approach with the adjustment equation approach currently in use. We also compare these approaches using real National Health and Nutrition Examination Survey data for 25-hydroxyvitamin D. PMID:28419523
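
    A sketch of the adjustment-equation approach that the proposed multiple-imputation method is compared against: a calibration regression is fitted on the retested subset and then applied to all old-assay values. The linear form below is an assumption; crossover studies may use other functional forms.

```python
# Calibrate old-assay values into equivalent new-assay units.
import numpy as np

def fit_adjustment(old_subset, new_subset):
    # Simple linear calibration fitted on the re-tested specimens.
    slope, intercept = np.polyfit(old_subset, new_subset, deg=1)
    return slope, intercept

def adjust(old_values, slope, intercept):
    # Convert all historical old-assay results to new-assay units.
    return slope * np.asarray(old_values, dtype=float) + intercept
```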

  17. RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    KAUST Repository

    Kim, Ji-Sung

    2018-04-26

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^-9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.

  18. A New Power Calculation Method for Single-Phase Grid-Connected Systems

    DEFF Research Database (Denmark)

    Yang, Yongheng; Blaabjerg, Frede

    2013-01-01

    A new method to calculate average active power and reactive power for single-phase systems is proposed in this paper. It can be used in different applications where the output active power and reactive power need to be calculated accurately and fast. For example, a grid-connected photovoltaic...... system in low voltage ride through operation mode requires a power feedback for the power control loop. Commonly, a Discrete Fourier Transform (DFT) based power calculation method can be adopted in such systems. However, the DFT method introduces at least a one-cycle time delay. The new power calculation...... method, which is based on the adaptive filtering technique, can achieve a faster response. The performance of the proposed method is verified by experiments and demonstrated in a 1 kW single-phase grid-connected system operating under different conditions. Experimental results show the effectiveness...
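
    To see where the at-least-one-cycle delay of the conventional approach comes from, note that average active power is the mean of v(t)·i(t) over a full fundamental cycle, so the estimate cannot react faster than one cycle. A sampled-data sketch of that baseline (not the proposed adaptive-filter method) follows.

```python
# Baseline one-cycle averaging of instantaneous power p(t) = v(t) * i(t).
import numpy as np

def cycle_average_power(v: np.ndarray, i: np.ndarray,
                        samples_per_cycle: int) -> float:
    """Average active power over the most recent fundamental cycle."""
    assert len(v) >= samples_per_cycle and len(i) >= samples_per_cycle
    # The window spans one full cycle, so any change in the grid
    # condition takes at least one cycle to be fully reflected.
    return float(np.mean(v[-samples_per_cycle:] * i[-samples_per_cycle:]))
```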

  19. An In Vitro Single-Primer Site-Directed Mutagenesis Method for Use in Biotechnology.

    Science.gov (United States)

    Huang, Yanchao; Zhang, Likui

    2017-01-01

    Site-directed mutagenesis is a powerful method to introduce mutation(s) into DNA sequences. A number of methods have been developed over the years with a main goal being to create a high number of mutant genes. The single-mutagenic primer method for site-directed mutagenesis is the most direct method that yields mutant genes in about 25-50 % of transformants in a robust, low-cost reaction. The supercompetent XL10-Gold bacteria used in the Stratagene protocol carry a phage, which may be a problem for some applications; however, in our single-mutagenic primer method the supercompetent bacteria are not needed. A thermostable DNA polymerase with high fidelity and processivity, such as Phusion DNA polymerase, is required for our optimized procedure to avoid extra mutation(s) and enhance mutagenic efficiency.

  20. On multivariate imputation and forecasting of decadal wind speed missing data.

    Science.gov (United States)

    Wesonga, Ronald

    2015-01-01

    This paper demonstrates the application of multiple imputation by chained equations and time series forecasting to wind speed data. The study was motivated by the high prevalence of missing values in historic wind speed data. Findings based on the fully conditional specification under multiple imputation by chained equations provided reliable imputations of the missing wind speed data. Further, the forecasting model showed a smoothing parameter, alpha (0.014), close to zero, confirming that recent past observations are more suitable for use to forecast wind speeds. The maximum decadal wind speed for Entebbe International Airport was estimated to be 17.6 metres per second at a 0.05 level of significance, with a bound on the error of estimation of 10.8 metres per second. The large bound on the error of estimation confirms the dynamic tendencies of wind speed at the airport under study.
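
    For context, the reported alpha is the parameter of simple exponential smoothing; the sketch below shows how the level estimate is updated with it. This illustrates the smoothing recursion only and is not the authors' full forecasting model.

```python
# Simple exponential smoothing: level = alpha*x + (1 - alpha)*level.
import numpy as np

def exp_smooth_forecast(series, alpha: float = 0.014) -> float:
    series = np.asarray(series, dtype=float)
    level = series[0]
    for x in series[1:]:
        # With alpha near zero, each new observation nudges the
        # level only slightly.
        level = alpha * x + (1 - alpha) * level
    return float(level)   # one-step-ahead forecast
```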

  1. A power supply error correction method for single-ended digital audio class D amplifiers

    Science.gov (United States)

    Yu, Zeqi; Wang, Fengqin; Fan, Yangyu

    2016-12-01

    In single-ended digital audio class D amplifiers (CDAs), the errors caused by power supply noise in the power stages degrade the output performance seriously. In this article, a novel power supply error correction method is proposed. This method introduces the power supply noise of the power stage into the digital signal processing block and builds a power supply error corrector between the interpolation filter and the uniform-sampling pulse width modulation (UPWM) lineariser to pre-correct the power supply error in the single-ended digital audio CDA. The theoretical analysis and implementation of the method are also presented. To verify the effectiveness of the method, a two-channel single-ended digital audio CDA with different power supply error correction methods is designed, simulated, implemented and tested. The simulation and test results obtained show that the method can greatly reduce the error caused by the power supply noise with low hardware cost, and that the CDA with the proposed method can achieve a total harmonic distortion + noise (THD + N) of 0.058% for a -3 dBFS, 1 kHz input when a 55 V linear unregulated direct current (DC) power supply (with the -51 dBFS, 100 Hz power supply noise) is used in the power stages.

  2. Protein structural model selection by combining consensus and single scoring methods.

    Directory of Open Access Journals (Sweden)

    Zhiquan He

    Full Text Available Quality assessment (QA) for predicted protein structural models is an important and challenging research problem in protein structure prediction. Consensus Global Distance Test (CGDT) methods assess each decoy (predicted structural model) based on its structural similarity to all others in a decoy set and have been proved to work well when good decoys are in a majority cluster. Scoring functions evaluate each single decoy based on its structural properties. Both methods have their merits and limitations. In this paper, we present a novel method called PWCom, which uses two neural networks sequentially to combine CGDT and single-model scoring methods such as RW, DDFire and OPUS-Ca. Specifically, for every pair of decoys, the difference of the corresponding feature vectors is input to the first neural network, which predicts whether the decoy pair differ significantly in terms of their GDT scores to the native. If yes, the second neural network is used to decide which of the two is closer to the native structure. The quality score for each decoy in the pool is based on the number of winning times during the pairwise comparisons. Test results on three benchmark datasets from different model generation methods showed that PWCom significantly improves over consensus GDT and single scoring methods. The QA server (MUFOLD-Server) applying this method in the CASP 10 QA category was ranked second in terms of Pearson and Spearman correlation performance.

  3. Children and youth with disabilities: innovative methods for single qualitative interviews.

    Science.gov (United States)

    Teachman, Gail; Gibson, Barbara E

    2013-02-01

    There is a paucity of explicit literature outlining methods for single-interview studies with children, and almost none have focused on engaging children with disabilities. Drawing from a pilot study, we address these gaps by describing innovative techniques, strategies, and methods for engaging children and youth with disabilities in a single qualitative interview. In the study, we explored the beliefs, assumptions, and experiences of children and youth with cerebral palsy and their parents regarding the importance of walking. We describe three key aspects of our child-interview methodological approach: collaboration with parents, a toolkit of customizable interview techniques, and strategies to consider the power differential inherent in child-researcher interactions. Examples from our research illustrate what worked well and what was less successful. Researchers can optimize single interviews with children with disabilities by collaborating with family members and by preparing a toolkit of customizable interview techniques.

  4. Methods of preparing and using single chain anti-tumor antibodies

    Science.gov (United States)

    Cheung, Nai-Kong; Guo, Hong-Fen

    2010-02-23

    This invention provides a method for identifying cells expressing a target single chain antibody (scFv) directed against a target antigen from a collection of cells that includes cells that do not express the target scFv, comprising the step of combining the collection of cells with an anti-idiotype directed to an antibody specific for the target antigen and detecting interaction, if any, of the anti-idiotype with the cells, wherein the occurrence of an interaction identifies the cell as one which expresses the target scFv. This invention also provides a method for making a single chain antibody (scFv) directed against an antigen, wherein the selection of clones is made based upon interaction of those clones with an appropriate anti-idiotype, and heretofore inaccessible scFv so made. This invention provides the above methods or any combination thereof. Finally, this invention provides various uses of these methods.

  5. A NEW FRACTIONAL MODEL OF SINGLE DEGREE OF FREEDOM SYSTEM, BY USING GENERALIZED DIFFERENTIAL TRANSFORM METHOD

    Directory of Open Access Journals (Sweden)

    HASHEM SABERI NAJAFI

    2016-07-01

    Full Text Available The generalized differential transform method (GDTM) is a powerful method for solving fractional differential equations. In this paper, a new fractional model for systems with a single degree of freedom (SDOF) is presented using the GDTM. The advantage of this method compared with some other numerical methods is shown. The analysis of the new approximations, and of the damping and acceleration of the systems, is also described. Finally, by reducing the damping and analyzing the errors in one of the fractional cases, we show that in addition to having a suitable solution for the displacement close to the exact one, the system enjoys acceleration once crossing the equilibrium point.

  6. Imputation-Based Fine-Mapping Suggests That Most QTL in an Outbred Chicken Advanced Intercross Body Weight Line Are Due to Multiple, Linked Loci.

    Science.gov (United States)

    Brandt, Monika; Ahsan, Muhammad; Honaker, Christa F; Siegel, Paul B; Carlborg, Örjan

    2017-01-05

    The Virginia chicken lines have been divergently selected for juvenile body weight for more than 50 generations. Today, the high- and low-weight lines show a >12-fold difference for the selected trait, 56-d body weight. These lines provide unique opportunities to study the genetic architecture of long-term, single-trait selection. Previously, several quantitative trait loci (QTL) contributing to weight differences between the lines were mapped in an F2 cross between them, and these were later replicated and fine-mapped in a nine-generation advanced intercross of them. Here, we explore the possibility to further increase the fine-mapping resolution of these QTL via a pedigree-based imputation strategy that aims to better capture the genetic diversity in the divergently selected, but outbred, founder lines. The founders of the intercross were high-density genotyped, and then pedigree-based imputation was used to assign genotypes throughout the pedigree. Imputation increased the marker density 20-fold in the selected QTL, providing 6911 markers for the subsequent analysis. Both single-marker association and multi-marker backward-elimination analyses were used to explore regions associated with 56-d body weight. The approach revealed several statistically and population structure independent associations and increased the mapping resolution. Further, most QTL were also found to contain multiple independent associations to markers that were not fixed in the founder populations, implying a complex underlying architecture due to the combined effects of multiple, linked loci perhaps located on independent haplotypes that still segregate in the selected lines. Copyright © 2017 Brandt et al.

  7. Multiple Imputation under Violated Distributional Assumptions: A Systematic Evaluation of the Assumed Robustness of Predictive Mean Matching

    Science.gov (United States)

    Kleinke, Kristian

    2017-01-01

    Predictive mean matching (PMM) is a standard technique for the imputation of incomplete continuous data. PMM imputes an actual observed value, whose predicted value is among a set of k ≥ 1 values (the so-called donor pool), which are closest to the one predicted for the missing case. PMM is usually better able to preserve the original distribution…
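
    A minimal sketch of the PMM mechanism just described, with the donor-pool size k exposed as a parameter; the linear imputation model and single-draw behaviour are textbook assumptions rather than the specific variants evaluated in the article.

```python
# Predictive mean matching: impute each missing y with an observed value
# drawn from the k donors whose predictions are closest to the case's own.
import numpy as np
from sklearn.linear_model import LinearRegression

def pmm_impute(X: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0):
    rng = np.random.default_rng(seed)
    y = y.astype(float).copy()
    obs, mis = ~np.isnan(y), np.isnan(y)
    model = LinearRegression().fit(X[obs], y[obs])
    pred_obs = model.predict(X[obs])
    y_obs = y[obs]
    for i, p in zip(np.where(mis)[0], model.predict(X[mis])):
        donors = np.argsort(np.abs(pred_obs - p))[:k]   # donor pool
        y[i] = y_obs[rng.choice(donors)]   # an actually observed value
    return y
```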

  8. 21 CFR 1404.630 - May the Office of National Drug Control Policy impute conduct of one person to another?

    Science.gov (United States)

    2010-04-01

    Title 21 (Food and Drugs), Office of National Drug Control Policy, Suspension and Debarment Actions, § 1404.630: May the Office of National Drug Control Policy impute conduct of one person to another? (2010-04-01 edition)

  9. Hybrid Control Method for a Single Phase PFC using a Low Cost Microcontroller

    DEFF Research Database (Denmark)

    Jakobsen, Lars Tønnes; Nielsen, Nils; Wolf, Christian

    2005-01-01

    This paper presents a hybrid control method for single phase boost PFCs. The high bandwidth current loop is analog while the voltage loop is implemented in an 8-bit microcontroller. The design focuses on minimizing the number of calculations done in the microcontroller. A 1kW prototype has been...

  10. A MISO-ARX-Based Method for Single-Trial Evoked Potential Extraction

    Directory of Open Access Journals (Sweden)

    Nannan Yu

    2017-01-01

    Full Text Available In this paper, we propose a novel method for solving the single-trial evoked potential (EP) estimation problem. In this method, the single-trial EP is considered as a complex containing many components, which may originate from different functional brain sites; these components can be distinguished according to their respective latencies and amplitudes and are extracted simultaneously by multiple-input single-output autoregressive modeling with exogenous input (MISO-ARX). The extraction process is performed in three stages: first, we use a reference EP as a template and decompose it into a set of components, which serve as subtemplates for the remaining steps. Then, a dictionary is constructed with these subtemplates, and EPs are preliminarily extracted by sparse coding in order to roughly estimate the latency of each component. Finally, the single-trial measurement is parametrically modeled by MISO-ARX while characterizing spontaneous electroencephalographic activity as an autoregression model driven by white noise and with each component of the EP modeled by autoregressive-moving-average filtering of the subtemplates. Once optimized, all components of the EP can be extracted. Compared with ARX, our method has greater tracking capabilities for specific components of the EP complex, as each component is modeled individually in MISO-ARX. We provide exhaustive experimental results to show the effectiveness and feasibility of our method.

  11. Using the Image Analysis Method for Describing Soil Detachment by a Single Water Drop Impact

    Directory of Open Access Journals (Sweden)

    Magdalena Ryżak

    2012-08-01

    Full Text Available The aim of the present work was to develop a method based on image analysis for describing soil detachment caused by the impact of a single water drop. The method consisted of recording tracks made by splashed particles on blotting paper under an optical microscope. The analysis facilitated division of the recorded particle tracks on the paper into drops, “comets” and single particles. Additionally, the following relationships were determined: (i) the distances of splash; (ii) the surface areas of splash tracks in relation to distance; (iii) the surface areas of the solid phase transported over a given distance; and (iv) the ratio of the solid phase to the splash track area in relation to distance. Furthermore, the proposed method allowed estimation of the weight of soil transported by a single water drop splash in relation to the distance from the point of water drop impact. It was concluded that the image analysis of splashed particles facilitated analysing the results at the very low energies generated by single water drops.

  12. A Generic Topology Derivation Method for Single-phase Converters with Active Capacitive DC-links

    DEFF Research Database (Denmark)

    Wang, Haoran; Wang, Huai; Zhu, Guorong

    2016-01-01

    capacitive DC-link solutions, but important aspects of the topology assessment, such as the total energy storage, overall capacitive energy buffer ratio, cost, and reliability are still not available. This paper proposes a generic topology derivation method of single-phase power converters...

  13. Iterative method of analysis of single queue, multi-server with limited ...

    African Journals Online (AJOL)

    In this paper, analysis of a single queue, multi-server system with limited system capacity under the first come, first served discipline was carried out using an iterative method. The arrivals of customers and the service times of customers are assumed to follow Poisson and exponential distributions, respectively. This queuing model is an extension of ...
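
    Read as an M/M/c/K queue (Poisson arrivals, exponential service, c servers, system capacity K, FCFS), the steady-state probabilities can be evaluated iteratively from the birth-death recursion; the sketch below follows that standard interpretation rather than the paper's exact derivation.

```python
# Iterative evaluation of an M/M/c/K queue:
#   p[n] = p[n-1] * lam / (min(n, c) * mu),  n = 1..K, then normalize.
import numpy as np

def mmck_metrics(lam: float, mu: float, c: int, K: int):
    p = np.empty(K + 1)
    p[0] = 1.0
    for n in range(1, K + 1):
        p[n] = p[n - 1] * lam / (min(n, c) * mu)
    p /= p.sum()                            # normalize probabilities
    L = float(np.arange(K + 1) @ p)         # mean number in system
    p_block = float(p[K])                   # arrival finds the system full
    W = L / (lam * (1 - p_block))           # Little's law, effective rate
    return p, L, W, p_block
```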

  14. Optical characteristics of ZnO single crystal grown by the hydrothermal method

    Energy Technology Data Exchange (ETDEWEB)

    Chen, G. Z.; Yin, J. G., E-mail: gzhchen@siom.ac.cn, E-mail: yjg@siom.ac.cn; Zhang, L. H.; Zhang, P. X.; Wang, X. Y.; Liu, Y. C. [Chinese Academy of Sciences, Key Laboratory of High Power Laser Materials, Shanghai Institute of Optics and Fine Mechanics (China); Zhang, C. L. [Guilin Research Institute of Geology for Mineral Resources (China); Gu, S. L. [Nanjing University, Department of Physics (China); Hang, Y., E-mail: yhang@siom.ac.cn [Chinese Academy of Sciences, Key Laboratory of High Power Laser Materials, Shanghai Institute of Optics and Fine Mechanics (China)

    2015-12-15

    ZnO single crystals have been grown by the hydrothermal method. Raman scattering and Photoluminescence spectroscopy (PL) have been used to study samples of ZnO that were unannealed or annealed in different ambient gases. It is suggested that the green emission may originate from defects related to copper in our samples.

  15. A Method for Robust Strategic Railway Dispatch Applied to a Single Track Line

    DEFF Research Database (Denmark)

    Harrod, Steven

    2013-01-01

    A method is presented for global optimization of a dispatch plan assuming perfect information over a given time horizon. An example problem is solved for the North American case of a single dominant high-speed train sharing a network with a majority flow of slower trains. Initial dispatch priority...

  16. See me, feel me: methods to concurrently visualize and manipulate single DNA molecules and associated proteins

    NARCIS (Netherlands)

    van Mameren, J.; Peterman, E.J.G.; Wuite, G.J.L.

    2008-01-01

    Direct visualization of DNA and proteins allows researchers to investigate DNA-protein interactions with great detail. Much progress has been made in this area as a result of increasingly sensitive single-molecule fluorescence techniques. At the same time, methods that control the conformation of

  17. A New Synchronous Reference Frame-Based Method for Single-Phase Shunt Active Power Filters

    DEFF Research Database (Denmark)

    Monfared, Mohammad; Golestan, Saeed; Guerrero, Josep M.

    2013-01-01

    This paper deals with the design of a novel method in the synchronous reference frame (SRF) to extract the reference compensating current for single-phase shunt active power filters (APFs). Unlike previous works in the SRF, the proposed method has an innovative feature that it does not need the f...... the excellent performance of the suggested approach. Theoretical evaluations are confirmed by experimental results....

  18. Accuracy of single count methods of WL determination for open-pit uranium mines

    International Nuclear Information System (INIS)

    Solomon, S.B.; Kennedy, K. N.

    1983-01-01

    A study of single count methods of WL determination was made using a database representative of Australian open-pit uranium mine conditions. The aims of the study were to check the existence of the optimum time delay corresponding to the Rolle method, to determine the accuracy of the conversion factor for Australian conditions, and to examine databases of representative radon daughter concentrations for any systematic effects.

  19. Single Tracking Location Methods Suppress Speckle Noise in Shear Wave Velocity Estimation

    OpenAIRE

    Elegbe, Etana C.; McAleavey, Stephen A.

    2013-01-01

    In ultrasound-based elastography methods, the estimation of shear wave velocity typically involves the tracking of speckle motion due to an applied force. The errors in the estimates of tissue displacement, and thus shear wave velocity, are generally attributed to electronic noise and decorrelation due to physical processes. We present our preliminary findings on another source of error, namely, speckle-induced bias in phase estimation. We find that methods that involve tracking in a single l...

  20. Characterization of strained InGaAs single quantum well structures by ion beam methods

    International Nuclear Information System (INIS)

    Yu, K.M.; Chan, K.T.

    1990-01-01

    We have investigated strained InGaAs single quantum well structures using MeV ion beam methods. The structural properties of these structures, including composition and well size, have been studied. It has been found that the composition obtained by Rutherford backscattering spectrometry and particle-induced x-ray emission techniques agrees very well with that obtained by the ion channeling method

  1. Organometallic halide perovskite single crystals having low defect density and methods of preparation thereof

    KAUST Repository

    Bakr, Osman M.

    2016-02-18

    The present disclosure presents a method of making single-crystal organometallic halide perovskites with the formula AMX3, wherein A is an organic cation, M is selected from the group consisting of Pb, Sn, Cu, Ni, Co, Fe, Mn, Pd, Cd, Ge, and Eu, and X is a halide. The method comprises the use of two reservoirs containing different precursors and allows vapor diffusion from one reservoir to the other. A solar cell comprising said crystal is also disclosed.

  2. Explicit Singly Diagonally Implicit Runge-Kutta Methods and Adaptive Stepsize Control for Reservoir Simulation

    DEFF Research Database (Denmark)

    Völcker, Carsten; Jørgensen, John Bagterp; Thomsen, Per Grove

    2010-01-01

    The implicit Euler method, normally referred to as the fully implicit (FIM) method, and the implicit pressure explicit saturation (IMPES) method are the traditional choices for temporal discretization in reservoir simulation. The FIM method offers unconditional stability in the sense of discrete....... Current reservoir simulators apply timestepping algorithms that are based on safeguarded heuristics, and can neither guarantee convergence in the underlying equation solver, nor provide estimates of the relations between convergence, integration error and stepsizes. We establish predictive stepsize...... control applied to high-order methods for temporal discretization in reservoir simulation. The family of Runge-Kutta methods is presented, and in particular the explicit singly diagonally implicit Runge-Kutta (ESDIRK) method with an embedded error estimate is described. A predictive stepsize adjustment...
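
    The sketch below illustrates embedded-error stepsize control in the spirit described, using implicit Euler with an explicit-Euler comparison as a crude embedded error estimate; it is not the authors' ESDIRK scheme, and the test problem, tolerance and safety factor are assumptions. The controller is the standard asymptotic rule h_new = safety * h * (tol/err)^(1/(p+1)).

    ```python
    import numpy as np
    from scipy.optimize import fsolve

    def implicit_euler_step(f, t, y, h):
        """One implicit Euler step; the nonlinear stage equation is solved numerically."""
        stage = lambda z: z - y - h * np.asarray(f(t + h, z))
        return fsolve(stage, y + h * np.asarray(f(t, y)))

    def integrate(f, t0, y0, t_end, h0=0.1, tol=1e-5, safety=0.9):
        """Accept/reject steps with the asymptotic controller; method order p = 1 here."""
        t, y, h = t0, np.atleast_1d(np.asarray(y0, dtype=float)), h0
        while t < t_end:
            h = min(h, t_end - t)
            y_imp = implicit_euler_step(f, t, y, h)
            err = np.linalg.norm(y_imp - (y + h * np.asarray(f(t, y)))) + 1e-16
            if err <= tol:                                 # accept the step
                t, y = t + h, y_imp
            h *= min(5.0, safety * (tol / err) ** 0.5)     # h_new = s*h*(tol/err)^(1/(p+1))
        return t, y

    f = lambda t, y: -50.0 * (y - np.cos(t))               # mildly stiff test problem
    print(integrate(f, 0.0, [0.0], 2.0))
    ```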

  3. Gallium arsenide single crystal solar cell structure and method of making

    Science.gov (United States)

    Stirn, Richard J. (Inventor)

    1983-01-01

    A production method and structure for a thin-film GaAs crystal for a solar cell on a single-crystal silicon substrate (10) comprising the steps of growing a single-crystal interlayer (12) of material having a closer match in lattice and thermal expansion with single-crystal GaAs than the single-crystal silicon of the substrate, and epitaxially growing a single-crystal film (14) on the interlayer. The material of the interlayer may be germanium or graded germanium-silicon alloy, with low germanium content at the silicon substrate interface, and high germanium content at the upper surface. The surface of the interface layer (12) is annealed for recrystallization by a pulsed beam of energy (laser or electron) prior to growing the interlayer. The solar cell structure may be grown as a single-crystal n+/p shallow homojunction film or as a p/n or n/p junction film. A Ga(Al)As heteroface film may be grown over the GaAs film.

  4. A modified single-cell electroporation method for molecule delivery into a motile protist, Euglena gracilis.

    Science.gov (United States)

    Ohmachi, Masashi; Fujiwara, Yoshie; Muramatsu, Shuki; Yamada, Koji; Iwata, Osamu; Suzuki, Kengo; Wang, Dan Ohtan

    2016-11-01

    Single-cell transfection is a powerful technique for delivering chemicals, drugs, or probes into arbitrary, specific single cells. This technique is especially important when the analysis of molecular function and cellular behavior in individual microscopic organisms such as protists requires the precise identification of the target cell, as fluorescence labeling of bulk populations makes tracking of individual motile protists virtually impossible. Herein, we have modified current single-cell electroporation techniques for delivering fluorescent markers into single Euglena gracilis, a motile photosynthetic microalga. Single-cell electroporation introduced molecules into individual living E. gracilis cells after a negative pressure was applied through a syringe connected to the micropipette to the target cell. The new method achieves high transfection efficiency and viability after electroporation. With the new technique, we successfully introduced a variety of molecules such as GFP, Alexa Fluor 488, and exciton-controlled hybridization-sensitive fluorescent oligonucleotide (ECHO) RNA probes into individual motile E. gracilis cells. We demonstrate imaging of endogenous mRNA in living E. gracilis without interfering with their physiological functions, such as swimming or division, over an extended period of time. Thus the modified single-cell electroporation technique is suitable for delivering versatile functional molecules into individual motile protists. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Anisotropic surface hole-transport property of triphenylamine-derivative single crystal prepared by solution method

    Energy Technology Data Exchange (ETDEWEB)

    Umeda, Minoru, E-mail: mumeda@vos.nagaokaut.ac.jp [Nagaoka University of Technology, Kamitomioka, Nagaoka, Niigata 940-2188 (Japan); Katagiri, Mitsuhiko; Shironita, Sayoko [Nagaoka University of Technology, Kamitomioka, Nagaoka, Niigata 940-2188 (Japan); Nagayama, Norio [Nagaoka University of Technology, Kamitomioka, Nagaoka, Niigata 940-2188 (Japan); Ricoh Company, Ltd., Nishisawada, Numazu, Shizuoka 410-0007 (Japan)

    2016-12-01

    Highlights: • A hole-transport molecule was investigated based on its electrochemical redox characteristics. • The solubility and supersolubility curves of the molecule were measured in order to prepare a large crystal. • Polarization micrographs and XRD results revealed that a single crystal was obtained. • An anisotropic surface conduction, in which transport along the long axis exceeds that of the amorphous layer, was observed. • The anisotropic surface conduction is well explained by the molecular stacking structure. - Abstract: This paper reports anisotropic hole transport at the surface of a triphenylamine-derivative single crystal prepared by a solution method. Triphenylamine derivatives are commonly used as hole-transport materials for organic photoconductors in laser-beam printers, in which the materials are used in amorphous form. For developing organic photovoltaics from photoconductor technology, preparing a single crystal is a promising route to realizing the high mobility of an organic semiconductor. In this study, a single crystal of 4-(2,2-diphenylethenyl)-N,N-bis(4-methylphenyl)-benzenamine (TPA) was prepared and its anisotropic hole-transport properties were measured. First, the hole-transport property of TPA was investigated based on its chemical structure and electrochemical redox characteristics. Next, large-scale single-crystal formation at a high growth rate was achieved by employing a solution method based on the solubility and supersolubility curves. The grown TPA was confirmed to be a single crystal by polarization micrograph observation and crystallographic analysis. For the TPA single crystal, an anisotropic surface conduction was found, which is well explained by its molecular stacking structure. The measured current in the long-axis direction is one order of magnitude greater than that of amorphous TPA.

  6. Hybrid coupled cluster methods: combining active space coupled cluster methods with coupled cluster singles, doubles, and perturbative triples.

    Science.gov (United States)

    Kou, Zhuangfei; Shen, Jun; Xu, Enhua; Li, Shuhua

    2012-05-21

    Based on the coupled-cluster singles, doubles, and a hybrid treatment of triples (CCSD(T)-h) method developed by us [J. Shen, E. Xu, Z. Kou, and S. Li, J. Chem. Phys. 132, 114115 (2010); and ibid. 133, 234106 (2010); and ibid. 134, 044134 (2011)], we developed and implemented a new hybrid coupled cluster (CC) method, named CCSD(T)q-h, by combining CC singles and doubles, and active triples and quadruples (CCSDtq) with CCSD(T) to deal with the electronic structures of molecules with significant multireference character. These two hybrid CC methods can be solved with non-canonical and canonical MOs. With canonical MOs, the CCSD(T)-like equations in these two methods can be solved directly without iteration so that the storage of all triple excitation amplitudes can be avoided. A practical procedure to divide canonical MOs into active and inactive subsets is proposed. Numerical calculations demonstrated that CCSD(T)-h with canonical MOs can well reproduce the corresponding results obtained with non-canonical MOs. For three atom exchange reactions, we found that CCSD(T)-h can offer a significant improvement over the popular CCSD(T) method in describing the reaction barriers. For the bond-breaking processes in F(2) and H(2)O, our calculations demonstrated that CCSD(T)q-h is a good approximation to CCSDTQ over the entire bond dissociation processes.

  7. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497
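
    As a minimal sketch of the Procrustes idea (ordinary similarity-transform Procrustes rather than the projection variant implemented in LASER 2.0), the following recovers the scale, rotation and translation that best map one set of PCA coordinates onto a reference map; the toy data and dimensions are assumptions.

    ```python
    import numpy as np

    def procrustes(X, Y):
        """Find scale s, orthogonal Q and translation t minimising ||Y - (s*X@Q + t)||_F.
        X, Y: (n_samples, n_dims) coordinates of the same individuals in two PCA spaces."""
        Xc, Yc = X - X.mean(0), Y - Y.mean(0)
        U, S, Vt = np.linalg.svd(Xc.T @ Yc)
        Q = U @ Vt                           # optimal rotation/reflection
        s = S.sum() / (Xc ** 2).sum()        # optimal scaling
        t = Y.mean(0) - s * X.mean(0) @ Q    # optimal translation
        return s, Q, t

    # Toy check: recover a known similarity transform from noisy coordinates.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 2))
    Y = 2.0 * X @ np.array([[0.0, -1.0], [1.0, 0.0]]) + [5.0, -3.0]
    s, Q, t = procrustes(X, Y + 0.01 * rng.standard_normal(X.shape))
    print(np.round(s, 3), np.round(t, 2))    # ~2.0 and ~[5, -3]
    ```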

  8. A simple and rapid method for high-resolution visualization of single-ion tracks

    Directory of Open Access Journals (Sweden)

    Masaaki Omichi

    2014-11-01

    Full Text Available Prompt determination of the spatial points of single-ion tracks plays a key role in high-energy-particle cancer therapy and gene/plant mutation studies. In this study, a simple method for the high-resolution visualization of single-ion tracks without etching was developed through the use of polyacrylic acid (PAA)–N,N′-methylenebisacrylamide (MBAAm) blend films. One of the steps of the proposed method includes exposure of the irradiated films to water vapor for several minutes. Water vapor was found to promote the cross-linking reaction of PAA and MBAAm to form a bulky cross-linked structure; the ion-track scars were detectable at a nanometer scale by atomic force microscopy. This study demonstrated that each scar is easily distinguishable, and the amount of radicals generated along the ion tracks can be estimated by measuring the height of the scars, even for highly dense ion tracks. This method is suitable for the visualization of the penumbra region in a single-ion track with a high spatial resolution of 50 nm, which is sufficiently small to confirm that a single ion has hit a cell nucleus, whose size ranges between 5 and 20 μm.

  9. Single well surfactant test to evaluate surfactant floods using multi tracer method

    Science.gov (United States)

    Sheely, Clyde Q.

    1979-01-01

    Data useful for evaluating the effectiveness of or designing an enhanced recovery process, said process involving mobilizing and moving hydrocarbons through a hydrocarbon-bearing subterranean formation from an injection well to a production well by injecting a mobilizing fluid into the injection well, comprising (a) determining hydrocarbon saturation in a volume in the formation near a well bore penetrating the formation, (b) injecting sufficient mobilizing fluid to mobilize and move hydrocarbons from a volume in the formation near the well bore, and (c) determining the hydrocarbon saturation in a volume including at least a part of the volume of (b) by an improved single well surfactant method comprising injecting 2 or more slugs of water containing the primary tracer separated by water slugs containing no primary tracer. Alternatively, the plurality of ester tracers can be injected in a single slug, said tracers penetrating varying distances into the formation, wherein the esters have different partition coefficients and essentially equal reaction times. The single well tracer method employed is disclosed in U.S. Pat. No. 3,623,842. This method, designated the single well surfactant test (SWST), is useful for evaluating the effect of surfactant floods, polymer floods, carbon dioxide floods, micellar floods, caustic floods and the like in subterranean formations in much less time and at much reduced cost compared to conventional multiwell pilot tests.

  10. Iron-based composition for magnetocaloric effect (MCE) applications and method of making a single crystal

    Science.gov (United States)

    Evans, III, Boyd Mccutchen; Kisner, Roger A.; Ludtka, Gail Mackiewicz; Ludtka, Gerard Michael; Melin, Alexander M.; Nicholson, Donald M.; Parish, Chad M.; Rios, Orlando; Sefat, Athena S.; West, David L.; Wilgen, John B.

    2016-02-09

    A method of making a single crystal comprises heating a material comprising magnetic anisotropy to a temperature T sufficient to form a melt of the material. A magnetic field of at least about 1 Tesla is applied to the melt at the temperature T, where the magnetic free energy difference ΔG_m between different crystallographic axes is greater than the thermal energy kT. While applying the magnetic field, the melt is cooled at a rate of about 30 °C/min or higher, and the melt solidifies to form a single crystal of the material.
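
    A back-of-envelope check of the ΔG_m > kT criterion, assuming the common scaling ΔG_m ≈ Δχ·B²·V/(2μ0) for the magnetic anisotropy free energy of a crystallite of volume V; the susceptibility anisotropy and melt temperature below are illustrative assumptions, not values from the patent.

    ```python
    # Order-of-magnitude check of dG_m > kT, assuming dG_m ~ dchi * B**2 * V / (2*mu0).
    k_B  = 1.380649e-23       # Boltzmann constant, J/K
    mu0  = 1.25663706e-6      # vacuum permeability, T*m/A
    dchi = 1e-4               # assumed susceptibility anisotropy (SI, illustrative)
    B    = 1.0                # applied field, T (the stated minimum)
    T    = 1500.0             # assumed melt temperature, K

    V_min = 2 * mu0 * k_B * T / (dchi * B ** 2)   # volume where dG_m equals kT
    print(f"critical crystallite size ~ {V_min ** (1 / 3) * 1e9:.0f} nm")
    ```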

  11. The single-sink fixed-charge transportation problem: Applications and solution methods

    DEFF Research Database (Denmark)

    Goertz, Simon; Klose, Andreas

    2007-01-01

    The single-sink fixed-charge transportation problem (SSFCTP) consists in finding a minimum cost flow from a number of supplier nodes to a single demand node. Shipping costs comprise costs proportional to the amount shipped as well as a fixed-charge. Although the SSFCTP is an important special case...... of the well-known fixed-charge transportation problem, just a few methods for solving this problem have been proposed in the literature. After summarising some applications of this problem arising in manufacturing and transportation, we give an overview on approximation algorithms and worst-case results....... Finally, we briefly compare some exact solution algorithms for this problem....

  12. Method for single crystal growth of photovoltaic perovskite material and devices

    Energy Technology Data Exchange (ETDEWEB)

    Huang, Jinsong; Dong, Qingfeng

    2017-11-07

    Systems and methods for perovskite single crystal growth include using a low temperature solution process that employs a temperature gradient in a perovskite solution in a container, also including at least one small perovskite single crystal, and a substrate in the solution upon which substrate a perovskite crystal nucleates and grows, in part due to the temperature gradient in the solution and in part due to a temperature gradient in the substrate. For example, a top portion of the substrate external to the solution may be cooled.

  13. Multi-Level Wavelet Shannon Entropy-Based Method for Single-Sensor Fault Location

    Directory of Open Access Journals (Sweden)

    Qiaoning Yang

    2015-10-01

    Full Text Available In actual application, sensors are prone to failure because of harsh environments, battery drain, and sensor aging. Sensor fault location is an important step for follow-up sensor fault detection. In this paper, two new multi-level wavelet Shannon entropies (multi-level wavelet time Shannon entropy and multi-level wavelet time-energy Shannon entropy) are defined. They take full advantage of the sensor fault frequency distribution and energy distribution across multiple subbands in the wavelet domain. Based on the multi-level wavelet Shannon entropy, a method is proposed for single-sensor fault location. The method firstly uses a criterion of maximum energy-to-Shannon-entropy ratio to select the appropriate wavelet base for signal analysis. Then multi-level wavelet time Shannon entropy and multi-level wavelet time-energy Shannon entropy are used to locate the fault. The method is validated using practical chemical gas concentration data from a gas sensor array. Compared with wavelet time Shannon entropy and wavelet energy Shannon entropy, the experimental results demonstrate that the proposed method can achieve accurate location of a single sensor fault and has good anti-noise ability. The proposed method is feasible and effective for single-sensor fault location.
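
    A sketch of one plausible reading of a multi-level wavelet energy Shannon entropy, assuming the PyWavelets package is available; the paper's exact time and time-energy entropy definitions may differ, and the wavelet base, level and toy fault signal are assumptions.

    ```python
    import numpy as np
    import pywt

    def wavelet_energy_shannon_entropy(signal, wavelet="db4", level=4):
        """Decompose the signal into wavelet subbands, normalise the subband
        energies to a probability distribution, and return its Shannon entropy."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        energies = np.array([np.sum(c ** 2) for c in coeffs])
        p = energies / energies.sum()
        return -np.sum(p * np.log2(p + 1e-12))

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 1024)
    healthy = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.size)
    faulty = healthy + (t > 0.5) * 0.8 * rng.standard_normal(t.size)  # noisy fault
    print("healthy:", round(wavelet_energy_shannon_entropy(healthy), 3))
    print("faulty :", round(wavelet_energy_shannon_entropy(faulty), 3))
    ```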

  14. The artificial compression method for computation of shocks and contact discontinuities. I - Single conservation laws

    Science.gov (United States)

    Harten, A.

    1977-01-01

    The paper discusses the use of the artificial compression method for the computation of discontinuous solutions of a single conservation law by finite difference methods. The single conservation law has either a shock or a contact discontinuity. Any monotone finite difference scheme applied to the original equation smears the discontinuity, while the same scheme applied to the equation modified by an artificial compression flux produces steady progressing profiles. If L is any finite difference scheme in conservation form and C is an artificial compressor, the split flux artificial compression method CL is a corrective scheme: L smears the discontinuity while propagating it; C compresses the smeared transition toward a sharp discontinuity. Numerical implementation of artificial compression is described.
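
    The premise that a monotone scheme smears the discontinuity is easy to reproduce numerically; the sketch below applies first-order upwind (a monotone scheme, the "L" in the corrective scheme CL) to linear advection of a step and measures the smeared transition width. The compression step C itself is not implemented here, and the grid and CFL number are assumptions.

    ```python
    import numpy as np

    # First-order upwind for u_t + a*u_x = 0 (a > 0): monotone, but diffusive.
    a, nx, cfl, steps = 1.0, 200, 0.8, 120
    dx = 1.0 / nx
    dt = cfl * dx / a
    u = np.where(np.arange(nx) * dx < 0.25, 1.0, 0.0)     # step initial data
    for _ in range(steps):
        u[1:] = u[1:] - a * dt / dx * (u[1:] - u[:-1])    # upwind difference
    # Width of the numerically smeared transition (the exact solution stays sharp):
    trans = np.sum((u > 0.01) & (u < 0.99)) * dx
    print(f"smeared transition width after {steps} steps: {trans:.3f}")
    ```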

  15. A method of dopant electron energy spectrum parameterization for calculation of single-electron nanodevices

    Science.gov (United States)

    Shorokhov, V. V.

    2017-05-01

    Solitary dopants in semiconductors and dielectrics that possess stable electron structures and interesting physical properties may be used as building blocks of quantum computers and sensor systems that operate based on new physical principles. This study proposes a phenomenological method of parameterizing the single-particle energy spectrum of dopant valence electrons in crystalline semiconductors and dielectrics that takes electron-electron interactions into account, treating these interactions within the framework of the outer-electron-shell model. The proposed method is applied to construct a procedure for determining the effective capacity of the dopant outer shell and a method for calculating the tunneling current in a single-electron device with one or several active dopant charge centers.

  16. A method for measuring three-dimensional mandibular kinematics in vivo using single-plane fluoroscopy

    Science.gov (United States)

    Chen, C-C; Lin, C-C; Chen, Y-J; Hong, S-W; Lu, T-W

    2013-01-01

    Objectives Accurate measurement of the three-dimensional (3D) motion of the mandible in vivo is essential for relevant clinical applications. Existing techniques are either of limited accuracy or require the use of transoral devices that interfere with jaw movements. This study aimed to develop further an existing method for measuring 3D, in vivo mandibular kinematics using single-plane fluoroscopy; to determine the accuracy of the method; and to demonstrate its clinical applicability via measurements on a healthy subject during opening/closing and chewing movements. Methods The proposed method was based on the registration of single-plane fluoroscopy images and 3D low-radiation cone beam CT data. It was validated using roentgen single-plane photogrammetric analysis at static positions and during opening/closing and chewing movements. Results The method was found to have measurement errors of 0.1 ± 0.9 mm for all translations and 0.2° ± 0.6° for all rotations in static conditions, and of 1.0 ± 1.4 mm for all translations and 0.2° ± 0.7° for all rotations in dynamic conditions. Conclusions The proposed method is considered an accurate method for quantifying the 3D mandibular motion in vivo. Without relying on transoral devices, the method has advantages over existing methods, especially in the assessment of patients with missing or unstable teeth, making it useful for the research and clinical assessment of the temporomandibular joint and chewing function. PMID:22842637

  17. Single-ended transition state finding with the growing string method.

    Science.gov (United States)

    Zimmerman, Paul M

    2015-04-05

    Reaction path finding and transition state (TS) searching are important tasks in computational chemistry. Methods that seek to optimize an evenly distributed set of structures to represent a chemical reaction path are known as double-ended string methods. Such methods can be highly reliable because the endpoints of the string are fixed, which effectively lowers the dimensionality of the reaction path search. String methods, however, require that the reactant and product structures are known beforehand, which limits their ability for systematic exploration of reactive steps. In this article, a single-ended growing string method (GSM) is introduced which allows for reaction path searches starting from a single structure. The method works by sequentially adding nodes along coordinates that drive bonds, angles, and/or torsions to a desired reactive outcome. After the string is grown and an approximate reaction path through the TS is found, string optimization commences and the exact TS is located along with the reaction path. Fast convergence of the string is achieved through use of internal coordinates and eigenvector optimization schemes combined with Hessian estimates. Comparison to the double-ended GSM shows that single-ended method can be even more computationally efficient than the already rapid double-ended method. Examples, including transition metal reactivity and a systematic, automated search for unknown reactivity, demonstrate the efficacy of the new method. This automated reaction search is able to find 165 reaction paths from 333 searches for the reaction of NH3 BH3 and (LiH)4 , all without guidance from user intuition. © 2015 Wiley Periodicals, Inc.

  18. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model.

    Science.gov (United States)

    Seaman, Shaun R; Hughes, Rachael A

    2016-09-05

    Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust than joint model multiple imputation using the restricted general location model to misspecification of that model when there is substantial missingness in the outcome variable. © The Author(s) 2016.
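
    For readers who want to experiment, the sketch below runs an FCS-style chained-equations imputation with scikit-learn's IterativeImputer and pools a regression coefficient across imputed data sets by simple averaging (the point-estimate half of Rubin's rules; the between/within variance formula is omitted). The data-generating model, association strengths and missingness rate are assumptions.

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 500
    x = rng.standard_normal(n)
    z = 0.8 * x + 0.6 * rng.standard_normal(n)      # strongly associated covariate
    y = 1.0 + 2.0 * x + rng.standard_normal(n)
    X = np.column_stack([x, z, y])
    X[rng.random(n) < 0.3, 0] = np.nan              # ~30% missing in x

    estimates = []
    for m in range(10):                             # M = 10 imputed data sets
        imp = IterativeImputer(sample_posterior=True, random_state=m)
        Xc = imp.fit_transform(X)
        beta = LinearRegression().fit(Xc[:, :2], Xc[:, 2]).coef_[0]
        estimates.append(beta)
    print("pooled slope for x:", round(float(np.mean(estimates)), 3))
    ```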

  19. A Novel MPPT Control Method of Thermoelectric Power Generation with Single Sensor

    Directory of Open Access Journals (Sweden)

    Tadashi Sakaguchi

    2013-04-01

    Full Text Available This paper proposes a novel Maximum Power Point Tracking (MPPT) control method for thermoelectric power generation with a constant load. The paper characterizes the thermoelectric power module (TM) and reveals its internal resistance. By analyzing the thermoelectric power generation system with a boost chopper using the state-space averaging method, the output voltage and current of the TM are estimated with only a single current sensor, so the proposed method can seek the maximum power point without calculating the output power of the TM. The basic principle of the proposed MPPT control method is discussed, and then confirmed by digital computer simulation using PSIM. Simulation results demonstrate that the output voltage can track the maximum power point voltage with the proposed MPPT control method. The generated power of the TM is 0.36 W when the temperature difference is 35 °C, which accords well with the V-P characteristics.
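
    The sketch below shows a generic perturb-and-observe MPPT loop on a Thevenin-type module model V = E − rI, whose maximum power point sits at V = E/2; the paper's actual contribution, estimating the TM voltage and current from a single current sensor via state-space averaging, is not reproduced, and E, r and the perturbation size are assumptions.

    ```python
    # Perturb-and-observe MPPT on a Thevenin model of a thermoelectric module:
    # V = E - r*I, so the power P = V*I delivered by the module peaks at V = E/2.
    E, r = 4.0, 2.0            # assumed EMF (V) and internal resistance (ohm)

    def power_at(v_ref):
        i = (E - v_ref) / r    # current drawn when the converter regulates V to v_ref
        return v_ref * i

    v, dv, p_prev = 1.0, 0.05, 0.0
    for _ in range(200):
        p = power_at(v)
        if p < p_prev:         # power dropped: reverse the perturbation direction
            dv = -dv
        p_prev = p
        v += dv
    print(f"operating voltage ~ {v:.2f} V (theoretical MPP at {E / 2:.2f} V)")
    ```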

  20. Estimation of Tree Lists from Airborne Laser Scanning Using Tree Model Clustering and k-MSN Imputation

    Directory of Open Access Journals (Sweden)

    Jörgen Wallerman

    2013-04-01

    Full Text Available Individual tree crowns may be delineated from airborne laser scanning (ALS) data by segmentation of surface models or by 3D analysis. Segmentation of surface models benefits from using a priori knowledge about the proportions of tree crowns, which has not yet been utilized for 3D analysis to any great extent. In this study, an existing surface segmentation method was used as a basis for a new tree model 3D clustering method applied to ALS returns in 104 circular field plots with 12 m radius in pine-dominated boreal forest (64°14'N, 19°50'E). For each cluster below the tallest canopy layer, a parabolic surface was fitted to model a tree crown. The tree model clustering identified more trees than segmentation of the surface model, especially smaller trees below the tallest canopy layer. Stem attributes were estimated with k-Most Similar Neighbours (k-MSN) imputation of the clusters based on field-measured trees. The accuracy at plot level from the k-MSN imputation (stem density root mean square error or RMSE 32.7%; stem volume RMSE 28.3%) was similar to the corresponding results from the surface model (stem density RMSE 33.6%; stem volume RMSE 26.1%) with leave-one-out cross-validation for one field plot at a time. Three-dimensional analysis of ALS data should also be evaluated in multi-layered forests since it identified a larger number of small trees below the tallest canopy layer.
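
    As a sketch of neighbour-based imputation of stem attributes (plain k-NN rather than k-MSN, which additionally weights distances using canonical correlations between feature and target spaces), the following predicts stem volume from assumed ALS-style cluster features with scikit-learn and reports a cross-validated relative RMSE; all data are synthetic.

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(42)
    n = 300
    # Assumed ALS cluster features: crown height (m) and crown area (m^2).
    height = rng.uniform(5, 30, n)
    area = rng.uniform(2, 40, n) + 0.5 * height
    stem_volume = 0.02 * height ** 1.8 + 0.01 * area + rng.normal(0, 0.3, n)

    X = np.column_stack([height, area])
    knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
    pred = cross_val_predict(knn, X, stem_volume, cv=10)   # held-out predictions
    rmse = np.sqrt(np.mean((pred - stem_volume) ** 2))
    print(f"RMSE = {rmse:.3f}, relative RMSE = {100 * rmse / stem_volume.mean():.1f}%")
    ```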

  1. The method and equipment for the investigation of ions orienting transmission through thin single crystals

    CERN Document Server

    Soroka, V Y; Maznij, Y O

    2003-01-01

    A new approach is proposed to solve the task of measuring the angular distribution of ion fluxes with strongly differing intensities. The channeling effect makes this problem a regular feature of experimental studies of the oriented transmission of ions through thin single crystals. The approach is based on additional scattering of the ions by an amorphous (polycrystalline) target after they pass through the single crystal. The additional target manipulator is joined with the principal target chamber, which is equipped with a three-axis goniometer. The manipulator allows the additional target to be moved in the vicinity of the accelerator beam within limits of ±3° in all directions and allows the angular distribution of scattered ions to be measured with an accuracy of 1 min. The method and equipment were tested at the single-ended electrostatic accelerator (EG-5) using a proton beam. At present the measurements have been resumed at the tandem accelerator (EG-10) of the Institute for Nuclear Research of the Academy of Sciences of U...

  2. Evaluation of single and double centrifugation tube methods for concentrating equine platelets.

    Science.gov (United States)

    Argüelles, D; Carmona, J U; Pastor, J; Iborra, A; Viñals, L; Martínez, P; Bach, E; Prades, M

    2006-10-01

    The aim of this study was to evaluate single and double centrifugation tube methods for concentrating equine platelets. Whole blood samples were collected from clinically normal horses and processed by use of single and double centrifugation tube methods to obtain four platelet concentrates (PCs): PC-A, PC-B, PC-C, and PC-D, which were analyzed using a flow cytometry hematology system for hemogram and additional platelet parameters (mean platelet volume, platelet distribution width, mean platelet component concentration, mean platelet component distribution width). Concentrations of transforming growth factor beta 1 (TGF-beta(1)) were determined in all the samples. Platelet concentrations for PC-A, PC-B, PC-C, and PC-D were 45%, 44%, 71%, and 21% higher, respectively, compared to the same values for citrated whole blood samples. TGF-beta(1) concentrations for PC-A, PC-B, PC-C, and PC-D were 38%, 44%, 44%, and 37% higher, respectively, compared to citrated whole blood sample values. In conclusion, the single and double centrifugation tube methods are reliable methods for concentrating equine platelets and for obtaining potentially therapeutic TGF-beta(1) levels.

  3. A single-step method for rapid extraction of total lipids from green microalgae.

    Directory of Open Access Journals (Sweden)

    Martin Axelsson

    Full Text Available Microalgae produce a wide range of lipid compounds of potential commercial interest. Total lipid extraction performed by conventional extraction methods, relying on the chloroform-methanol solvent system, is too laborious and time consuming for screening large numbers of samples. In this study, three previous extraction methods devised by Folch et al. (1957), Bligh and Dyer (1959) and Selstam and Öquist (1985) were compared and a faster single-step procedure was developed for extraction of total lipids from green microalgae. In the single-step procedure, 8 ml of a 2:1 chloroform-methanol (v/v) mixture was added to fresh or frozen microalgal paste or pulverized dry algal biomass contained in a glass centrifuge tube. The biomass was manually suspended by vigorously shaking the tube for a few seconds and 2 ml of a 0.73% NaCl water solution was added. Phase separation was facilitated by 2 min of centrifugation at 350 g and the lower phase was recovered for analysis. An uncharacterized microalgal polyculture and the green microalgae Scenedesmus dimorphus, Selenastrum minutum, and Chlorella protothecoides were subjected to the different extraction methods and various techniques of biomass homogenization. The less labour intensive single-step procedure presented here allowed simultaneous recovery of total lipid extracts from multiple samples of green microalgae with quantitative yields and fatty acid profiles comparable to those of the previous methods. While the single-step procedure is highly correlated in lipid extractability (r² = 0.985) to the previous method of Folch et al. (1957), it allowed at least five times higher sample throughput.

  4. A multi breed reference improves genotype imputation accuracy in Nordic Red cattle

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Ma, Peipei; Lund, Mogens Sandø

    2012-01-01

    the subsequent effect of the imputed HD data on the reliability of genomic prediction. HD genotype data was available for 247 Danish, 210 Swedish and 249 Finnish Red bulls, and for 546 Holstein bulls. A subset 50 of bulls from each of the Nordic Red populations was selected for validation. After quality control...

  5. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    NARCIS (Netherlands)

    Artigas, Maria Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikainen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bosse, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Glaeser, Sven; Gonzalez, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliovaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kahonen, Nina; Ingelsson, Erik; Johansson, Asa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melen, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Vinuela, Ana; Voelzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimaki, Terho; Vitart, Veronique; Kahonen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genotypes.

  6. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research

    Science.gov (United States)

    Manly, Catherine A.; Wells, Ryan S.

    2015-01-01

    Higher education researchers using survey data often face decisions about handling missing data. Multiple imputation (MI) is considered by many statisticians to be the most appropriate technique for addressing missing data in many circumstances. In particular, it has been shown to be preferable to listwise deletion, which has historically been a…

  7. Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation

    Science.gov (United States)

    Pampaka, Maria; Hutcheson, Graeme; Williams, Julian

    2016-01-01

    Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their…

  8. Multiple imputation as one tool to provide longitudinal databases for modelling human height and weight development.

    Science.gov (United States)

    Aßmann, C

    2016-06-01

    Besides large efforts regarding field work, provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related issues. To foster use of longitudinal data sets within the scientific community, provision of valid databases has to address data-protection regulations. It is, therefore, of major importance to hinder identifiability of individuals from publicly available databases. To reach this goal, one possible strategy is to provide a synthetic database to the public allowing for pretesting strategies for data analysis. The synthetic databases can be established using multiple imputation tools. Given the approval of the strategy, verification is based on the original data. Multiple imputation by chained equations is illustrated to facilitate provision of synthetic databases as it allows for capturing a wide range of statistical interdependencies. Also missing values, typically occurring within longitudinal databases for reasons of item non-response, can be addressed via multiple imputation when providing databases. The provision of synthetic databases using multiple imputation techniques is one possible strategy to ensure data protection, increase visibility of longitudinal databases and enhance the analytical potential.

  9. The Effect of Auxiliary Variables and Multiple Imputation on Parameter Estimation in Confirmatory Factor Analysis

    Science.gov (United States)

    Yoo, Jin Eun

    2009-01-01

    This Monte Carlo study investigates the beneficiary effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…

  10. Multiple imputation of missing values was not necessary before performing a longitudinal mixed-model analysis

    NARCIS (Netherlands)

    Twisk, J.; de Boer, M.; de Vente, W.; Heymans, M.

    2013-01-01

    Background and Objectives: As a result of the development of sophisticated techniques, such as multiple imputation, the interest in handling missing data in longitudinal studies has increased enormously in past years. Within the field of longitudinal data analysis, there is a current debate on

  11. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist for instance numerical smaller cattle breeds, but also pig and chicken

  12. Limitations in Using Multiple Imputation to Harmonize Individual Participant Data for Meta-Analysis.

    Science.gov (United States)

    Siddique, Juned; de Chavez, Peter J; Howe, George; Cruden, Gracelyn; Brown, C Hendricks

    2018-02-01

    Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

  13. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression

    Directory of Open Access Journals (Sweden)

    Busch Michael P

    2007-12-01

    Full Text Available Background: Chronic hepatitis C virus infection is prevalent and often causes hepatic fibrosis, which can progress to cirrhosis and cause liver cancer or liver failure. Study of fibrosis progression often relies on imputing the time of infection, often as the reported age of first injection drug use. We sought to examine the accuracy of such imputation and implications for modeling factors that influence progression rates. Methods: We analyzed cross-sectional data on hepatitis C antibody status and reported risk factor histories from two large studies, the Women's Interagency HIV Study and the Urban Health Study, using modern survival analysis methods for current status data to model past infection risk year by year. We compared fitted distributions of past infection risk to reported age of first injection drug use. Results: Although injection drug use appeared to be a very strong risk factor, models for both studies showed that many subjects had considerable probability of having been infected substantially before or after their reported age of first injection drug use. Persons reporting younger age of first injection drug use were more likely to have been infected after, and persons reporting older age of first injection drug use were more likely to have been infected before. Conclusion: In cross-sectional studies of fibrosis progression where date of HCV infection is estimated from risk factor histories, modern methods such as multiple imputation should be used to account for the substantial uncertainty about when infection occurred. The models presented here can provide the inputs needed by such methods. Using reported age of first injection drug use as the time of infection in studies of fibrosis progression is likely to produce a spuriously strong association of younger age of infection with slower rate of progression.

  14. A single-probe heat pulse method for estimating sap velocity in trees.

    Science.gov (United States)

    López-Bernal, Álvaro; Testi, Luca; Villalobos, Francisco J

    2017-10-01

    Available sap flow methods are still far from being simple, cheap and reliable enough to be used beyond very specific research purposes. This study presents and tests a new single-probe heat pulse (SPHP) method for monitoring sap velocity in trees using a single-probe sensor, rather than the multi-probe arrangements used up to now. Based on the fundamental conduction-convection principles of heat transport in sapwood, convective velocity (V h ) is estimated from the temperature increase in the heater after the application of a heat pulse (ΔT). The method was validated against measurements performed with the compensation heat pulse (CHP) technique in field trees of six different species. To do so, a dedicated three-probe sensor capable of simultaneously applying both methods was produced and used. Experimental measurements in the six species showed an excellent agreement between SPHP and CHP outputs for moderate to high flow rates, confirming the applicability of the method. In relation to other sap flow methods, SPHP presents several significant advantages: it requires low power inputs, it uses technically simpler and potentially cheaper instrumentation, the physical damage to the tree is minimal and artefacts caused by incorrect probe spacing and alignment are removed. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
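
    A hedged sketch of the core idea, estimating velocity from the heater's own post-pulse temperature rise, using the idealised instantaneous line-source solution in a moving medium rather than the calibration actually developed in the paper; the diffusivity, pulse strength and noise level below are assumptions.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    KAPPA = 2.5e-7   # assumed thermal diffusivity of sapwood, m^2/s

    def heater_temp_rise(t, Q, v):
        """Idealised line-source temperature rise at the heater itself (x = 0):
        dT(t) = Q / (4*pi*kappa*t) * exp(-v**2 * t / (4*kappa))."""
        return Q / (4 * np.pi * KAPPA * t) * np.exp(-(v ** 2) * t / (4 * KAPPA))

    # Synthetic measurement: true heat-pulse velocity 10 cm/h, plus sensor noise.
    rng = np.random.default_rng(3)
    t = np.linspace(5, 120, 60)                    # seconds after the pulse
    v_true = 0.10 / 3600                           # m/s
    data = heater_temp_rise(t, 2e-4, v_true) + rng.normal(0, 0.002, t.size)

    (Q_fit, v_fit), _ = curve_fit(heater_temp_rise, t, data, p0=[1e-4, 1e-5])
    print(f"estimated heat-pulse velocity ~ {abs(v_fit) * 3600 * 100:.1f} cm/h")
    ```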

  15. A visualization method for probing grain boundaries of single layer graphene via molecular beam epitaxy

    Science.gov (United States)

    Zhan, Linjie; Wan, Wen; Zhu, Zhenwei; Zhao, Zhijuan; Zhang, Zhenhan; Shih, Tien-Mo; Cai, Weiwei

    2017-07-01

    Graphene, a member of the family of layered two-dimensional (2D) materials, possesses high carrier mobility, mechanical flexibility, and optical transparency, and enjoys a wide range of promising applications in electronics. Using the chemical vapor deposition method, most investigators have grown single layer graphene (SLG), which is inevitably polycrystalline. Here we demonstrate a simple method for the direct visualization of arbitrarily large SLG domains by synthesizing one-hundred-nm-scale MoS2 single crystals via a high-vacuum molecular beam epitaxy process. The present study based on epitaxial growth provides a guide for probing the grain boundaries of various 2D materials and offers greater potential for next-generation electronic devices.

  16. Method of preparing and applying single stranded DNA probes to double stranded target DNAs in situ

    Science.gov (United States)

    Gray, J.W.; Pinkel, D.

    1991-07-02

    A method is provided for producing single stranded non-self-complementary nucleic acid probes, and for treating target DNA for use therewith. The probe is constructed by treating DNA with a restriction enzyme and an exonuclease to form template/primers for a DNA polymerase. The digested strand is resynthesized in the presence of labeled nucleoside triphosphate precursor. Labeled single stranded fragments are separated from the resynthesized fragments to form the probe. Target DNA is treated with the same restriction enzyme used to construct the probe, and is treated with an exonuclease before application of the probe. The method significantly increases the efficiency and specificity of hybridization mixtures by increasing the effective probe concentration, by eliminating self-hybridization within both probe and target DNAs, and by reducing the amount of target DNA available for mismatched hybridizations.

  17. Multicriteria decision-making method using the correlation coefficient under single-valued neutrosophic environment

    Science.gov (United States)

    Ye, Jun

    2013-05-01

    The paper presents the correlation and correlation coefficient of single-valued neutrosophic sets (SVNSs) based on the extension of the correlation of intuitionistic fuzzy sets and demonstrates that the cosine similarity measure is a special case of the correlation coefficient in SVNS. Then a decision-making method is proposed by the use of the weighted correlation coefficient or the weighted cosine similarity measure of SVNSs, in which the evaluation information for alternatives with respect to criteria is carried out by truth-membership degree, indeterminacy-membership degree, and falsity-membership degree under single-valued neutrosophic environment. We utilize the weighted correlation coefficient or the weighted cosine similarity measure between each alternative and the ideal alternative to rank the alternatives and to determine the best one(s). Finally, an illustrative example demonstrates the application of the proposed decision-making method.
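
    A sketch of the ranking computation described, for SVNSs given as (truth, indeterminacy, falsity) triplets per criterion; the normalisation by the larger of the two self-correlations is one common form of the coefficient and is an assumption here, as are the example evaluations and weights.

    ```python
    import numpy as np

    def weighted_corr_coeff(A, B, w):
        """Weighted correlation coefficient of two SVNSs given as arrays of
        (t, i, f) rows, one row per criterion; w holds the criterion weights."""
        inner = np.sum(w * np.sum(A * B, axis=1))
        self_a = np.sum(w * np.sum(A * A, axis=1))
        self_b = np.sum(w * np.sum(B * B, axis=1))
        return inner / max(self_a, self_b)

    # Three alternatives rated on three criteria as (truth, indeterminacy, falsity).
    alts = [
        np.array([[0.4, 0.2, 0.3], [0.4, 0.2, 0.3], [0.2, 0.2, 0.5]]),
        np.array([[0.6, 0.1, 0.2], [0.6, 0.1, 0.2], [0.5, 0.2, 0.2]]),
        np.array([[0.3, 0.2, 0.3], [0.5, 0.2, 0.3], [0.5, 0.3, 0.2]]),
    ]
    ideal = np.array([[1.0, 0.0, 0.0]] * 3)       # ideal alternative
    w = np.array([0.35, 0.25, 0.40])
    scores = [weighted_corr_coeff(a, ideal, w) for a in alts]
    print("ranking (best first):", np.argsort(scores)[::-1] + 1)
    ```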

  18. Dynamic Raman Spectroelectrochemistry of Single Walled Carbon Nanotubes modified electrodes using a Langmuir-Schaefer method

    OpenAIRE

    Ibáñez, David; Romero, Edna Cecilia; Colina, Álvaro; Heras, Aránzazu

    2014-01-01

    Raman spectroelectrochemistry is a fundamental technique to characterize single walled carbon nanotube (SWCNT) films. In this work, we have performed the study of SWCNT films transferred to a glassy carbon electrode using a Langmuir-Schaefer method. Langmuir balance has allowed us to control the characteristics of the film that can be easily transferred to the electrode support. Time-resolved Raman spectroelectrochemistry experiments at scan rates between 20 and 400 mV s−1 were done in two di...

  19. Growth of Ce-Doped LSO Single Crystals by Stockbarger-Bridgman Modified Crystallization Method

    International Nuclear Information System (INIS)

    Namtalishvili, M.; Sanadze, T.; Basharuli, N.; Magalashvili, P.; Mikaberidze, A.; Razmadze, Z.; Gabeskiria, M.

    2006-01-01

    The modified Stockbarger-Bridgman method was suggested for the growth of optically perfect LSO:Ce single crystals. Our investigations have shown that the most perfect crystals are grown by horizontally directed crystallization. In this case the elements of directional crystallization are combined with zone melting, and crystallization is carried out under the conditions of a sufficiently developed melt mirror. As a result, the chemical purity of the grown crystals increases. (author)

  20. Standard test method for isotopic analysis of uranium hexafluoride by double standard single-collector gas mass spectrometer method

    CERN Document Server

    American Society for Testing and Materials. Philadelphia

    2010-01-01

    1.1 This is a quantitative test method applicable to determining the mass percent of uranium isotopes in uranium hexafluoride (UF6) samples with 235U concentrations between 0.1 and 5.0 mass %. 1.2 This test method may be applicable for the entire range of 235U concentrations for which adequate standards are available. 1.3 This test method is for analysis by a gas magnetic sector mass spectrometer with a single collector using interpolation to determine the isotopic concentration of an unknown sample between two characterized UF6 standards. 1.4 This test method is to replace the existing test method currently published in Test Methods C761 and is used in the nuclear fuel cycle for UF6 isotopic analyses. 1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard. 1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appro...

  1. Critical evaluation of the pulsed laser method for single event effects testing and fundamental studies

    International Nuclear Information System (INIS)

    Melinger, J.S.; Buchner, S.; McMorrow, D.; Stapor, W.J.; Weatherford, T.R.; Campbell, A.B.; Eisen, H.

    1994-01-01

    In this paper the authors present an evaluation of the pulsed laser as a technique for single events effects (SEE) testing. They explore in detail the important optical effects, such as laser beam propagation, surface reflection, and linear and nonlinear absorption, which determine the nature of laser-generated charge tracks in semiconductor materials. While there are differences in the structure of laser- and ion-generated charge tracks, they show that in many cases the pulsed laser remains an invaluable tool for SEE testing. Indeed, for several SEE applications, they show that the pulsed laser method represents a more practical approach than conventional accelerator-based methods

  2. Single tracking location methods suppress speckle noise in shear wave velocity estimation.

    Science.gov (United States)

    Elegbe, Etana C; McAleavey, Stephen A

    2013-04-01

    In ultrasound-based elastography methods, the estimation of shear wave velocity typically involves the tracking of speckle motion due to an applied force. The errors in the estimates of tissue displacement, and thus shear wave velocity, are generally attributed to electronic noise and decorrelation due to physical processes. We present our preliminary findings on another source of error, namely, speckle-induced bias in phase estimation. We find that methods that involve tracking in a single location, as opposed to multiple locations, are less sensitive to this source of error since the measurement is differential in nature and cancels out speckle-induced phase errors.

  3. The method of arbitrarily large moments to calculate single scale processes in quantum field theory

    Directory of Open Access Journals (Sweden)

    Johannes Blümlein

    2017-08-01

    Full Text Available We devise a new method to calculate a large number of Mellin moments of single scale quantities using the systems of differential and/or difference equations obtained by integration-by-parts identities between the corresponding Feynman integrals of loop corrections to physical quantities. These scalar quantities have a much simpler mathematical structure than the complete quantity. A sufficiently large set of moments may even allow the analytic reconstruction of the whole quantity considered, holding in case of first order factorizing systems. In any case, one may derive highly precise numerical representations in general using this method, which is otherwise completely analytic.
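
    For orientation, the N-th Mellin moment referred to here is, up to the exponent convention used in a given paper,

    \[
      M[f](N) \;=\; \int_0^1 x^{\,N-1}\, f(x)\, dx , \qquad N = 1, 2, 3, \ldots,
    \]

    and the method derives and solves recurrences in N satisfied by these moments, rather than first computing the x-space quantity f(x) in closed form.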

  4. The method of arbitrarily large moments to calculate single scale processes in quantum field theory

    Energy Technology Data Exchange (ETDEWEB)

    Bluemlein, Johannes [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany); Schneider, Carsten [Johannes Kepler Univ., Linz (Austria). Research Inst. for Symbolic Computation (RISC)

    2017-01-15

    We devise a new method to calculate a large number of Mellin moments of single scale quantities, using the systems of differential and/or difference equations obtained by integration-by-parts identities between the corresponding Feynman integrals of loop corrections to physical quantities. These scalar quantities have a much simpler mathematical structure than the complete quantity. A sufficiently large set of moments may even allow the analytic reconstruction of the whole quantity considered; this holds in the case of first-order factorizing systems. In any case, the method, which is otherwise completely analytic, can be used to derive highly precise numerical representations in general.

  5. [A new method of calibration and positioning in quantitative analysis of multicomponents by single marker].

    Science.gov (United States)

    He, Bing; Yang, Shi-Yan; Zhang, Yan

    2012-12-01

    This paper aims to establish a new method of calibration and positioning in the quantitative analysis of multicomponents by a single marker (QAMS), using Shuanghuanglian oral liquid as the research object. Relative correction factors between the reference, chlorogenic acid, and the other 11 active components (neochlorogenic acid, cryptochlorogenic acid, caffeic acid, forsythoside A, scutellarin, isochlorogenic acid B, isochlorogenic acid A, isochlorogenic acid C, baicalin, phillyrin and wogonoside) in Shuanghuanglian oral liquid were established by 3 correction methods (multipoint correction, slope correction and quantitative factor correction). At the same time, chromatographic peaks were positioned by the linear regression method. Only one standard was used to determine the content of 12 components in Shuanghuanglian oral liquid, instead of the many reference substances otherwise needed in quality control. The results showed that, within the linear ranges, no significant differences were found between the quantitative results for the 12 active constituents in 3 batches of Shuanghuanglian oral liquid determined by the 3 correction methods and by the external standard method (ESM) or standard curve method (SCM). This method is simpler and quicker than literature methods; the results were accurate and reliable, with good reproducibility. Positioning chromatographic peaks by linear regression was also more accurate than the relative retention times reported in the literature. Slope correction and quantitative factor correction are feasible and accurate for controlling the quality of traditional Chinese medicine.

  6. Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

    Directory of Open Access Journals (Sweden)

    Stanley Xu

    2014-05-01

    In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset using an incident diabetes population that has no missing values in A1c, fasting and random plasma glucose (FPG and RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We then imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG (1) estimated a continuous A1c within ±1.88% of the true A1c 68.3% of the time, and (2) estimated a categorical A1c within ± one category of the true A1c about 50% of the time. Including RPG in the imputation slightly improved the precision but did not improve the accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not categorical A1c. Moreover, imputation of up to 33% of missing A1c values did not change the accuracy and precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When using A1c values as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.
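
    A regression-based single imputation of A1c from FPG, in the spirit of the strategy evaluated above, might look like the sketch below. The column names, the linear model, and the synthetic data are assumptions for illustration, not the authors' code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical EHR extract: 'a1c' has missing values, 'fpg' is complete.
rng = np.random.default_rng(0)
fpg = rng.normal(140, 30, 500)
a1c = 0.03 * fpg + 2.5 + rng.normal(0, 0.5, 500)
df = pd.DataFrame({"fpg": fpg, "a1c": a1c})
df.loc[rng.choice(500, 100, replace=False), "a1c"] = np.nan  # MCAR missingness

# Fit A1c ~ FPG on complete cases, then impute the missing entries.
obs = df.dropna()
model = LinearRegression().fit(obs[["fpg"]], obs["a1c"])
missing = df["a1c"].isna()
df.loc[missing, "a1c"] = model.predict(df.loc[missing, ["fpg"]])
```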

  7. Series solution for continuous population models for single and interacting species by the homotopy analysis method

    Directory of Open Access Journals (Sweden)

    Magdy A. El-Tawil

    2012-07-01

    The homotopy analysis method (HAM) is used to find approximate analytical solutions of continuous population models for single and interacting species. The homotopy analysis method contains the auxiliary parameter $\hbar$, which provides a simple way to adjust and control the convergence region of the series solution. The solutions are compared with the numerical results obtained using NDSolve, an ordinary differential equation solver found in the Mathematica package, and good agreement is found. The solutions are also compared with the available analytic results obtained by other methods, and a more accurate and convergent series solution is found. The convergence region is also computed, which shows the validity of the HAM solution. This method is reliable and manageable.
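
    In the standard HAM notation, the series is generated by the zeroth-order deformation equation, with embedding parameter $q \in [0,1]$, auxiliary linear operator $\mathcal{L}$, nonlinear operator $\mathcal{N}$, and initial guess $u_0$ (textbook form, shown here for context):

```latex
(1 - q)\,\mathcal{L}\big[\phi(t;q) - u_0(t)\big] = q\,\hbar\,\mathcal{N}\big[\phi(t;q)\big]
```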

  8. The "curved lead pathway" method to enable a single lead to reach any two intracranial targets.

    Science.gov (United States)

    Ding, Chen-Yu; Yu, Liang-Hong; Lin, Yuan-Xiang; Chen, Fan; Lin, Zhang-Ya; Kang, De-Zhi

    2017-01-11

    Deep brain stimulation is an effective way to treat movement disorders and a powerful research tool for exploring brain functions. This report proposes a "curved lead pathway" method for lead implantation, such that a single lead can reach any two intracranial targets in sequence. A new type of stereotaxic system for implanting a curved lead into the brain of humans/primates was designed, the auxiliary device needed to use this method in rats/mice was fabricated and verified in the rat, and an Excel algorithm for automatically calculating the necessary parameters was implemented. This "curved lead pathway" method of lead implantation may complement the current method, make lead implantation for multiple targets more convenient, and expand the experimental techniques of brain function research.

  9. Comparison of two methods of inquiry for torture with East African refugees: single query versus checklist.

    Science.gov (United States)

    Westermeyer, Joseph; Hollifield, Michael; Spring, Marline; Johnson, David; Jaranson, James

    2011-01-01

    The first aim was to compare two methods of inquiry regarding torture: the traditional means of inquiry (a single query) versus a checklist of torture experiences previously identified for these African refugees. Second, we hoped to identify factors that might influence refugees not to report torture on a single query when checklist data indicated torture events had occurred, or to report torture when checklist data indicated that torture had not occurred. The study consisted of queries to 1,134 community-dwelling East African refugees (from Somalia and Ethiopia) regarding the presence-versus-absence of torture in Africa (single query), a checklist of torture experiences in Africa that we had previously identified as occurring in these groups, demography, non-torture traumatic experiences in Africa, and current posttraumatic symptoms. Results showed that 14% of the study participants reported a torture experience on the checklist, but not on the single query. Nine percent responded positively to the single query on torture, but then failed to check any torture experience. Those reporting trauma on an open-ended query, but not on the checklist, had been highly traumatized in other ways (warfare, civil chaos, robbery, assault, rape, trauma during flight out of the country). Those who reported torture on the checklist but not on the single query reported fewer instances of torture, suggesting that perhaps a "threshold" of torture experience influenced the single-query report. In addition, certain types of torture appeared more apt to be associated with a single-query endorsement of torture. On regression analysis, a single-query self-report of torture was associated with traumatic experiences consistent with torture, older age, female gender, and non-torture trauma in Africa. Inconsistent reporting of torture occurred when two methods of inquiry (one open-ended and one a checklist) were employed in this sample. We believe that specific contexts of torture and non-torture trauma, together with individual demographic

  10. Single- versus multiple-sample method to measure glomerular filtration rate.

    Science.gov (United States)

    Delanaye, Pierre; Flamant, Martin; Dubourg, Laurence; Vidal-Petiot, Emmanuelle; Lemoine, Sandrine; Cavalier, Etienne; Schaeffner, Elke; Ebert, Natalie; Pottel, Hans

    2018-01-08

    There are many different ways to measure glomerular filtration rate (GFR) using various exogenous filtration markers, each with its own strengths and limitations. However, not only the marker, but also the methodology may vary in many ways, including the use of urinary or plasma clearance and, in the case of plasma clearance, the number of time points used to calculate the area under the concentration-time curve, ranging from only one (Jacobsson method) to eight (or more) blood samples. We collected the results obtained from 5106 plasma clearances (iohexol or 51Cr-ethylenediaminetetraacetic acid (EDTA)) using three to four time points, allowing GFR calculation using the slope-intercept method and the Bröchner-Mortensen correction. For each time point, the Jacobsson formula was applied to obtain the single-sample GFR. We used Bland-Altman plots to determine the accuracy of the Jacobsson method at each time point. The single-sample method showed within-10% concordance with the multiple-sample method of 66.4%, 83.6%, 91.4% and 96.0% at the time points 120, 180, 240 and ≥300 min, respectively. Concordance was poorer at lower GFR levels, and this trend parallels increasing age. Results were similar in males and females. Some discordance was found in obese subjects. Single-sample GFR is highly concordant with a multiple-sample strategy, except in the low GFR range (<30 mL/min). © The Author 2018. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
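
    The Bröchner-Mortensen correction mentioned above converts the uncorrected slope-intercept clearance, here written C_SI (in mL/min), into GFR; its familiar adult form, quoted from the general literature rather than from this record, is:

```latex
\mathrm{GFR} = 0.990778\, C_{\mathrm{SI}} - 0.001218\, C_{\mathrm{SI}}^{2}
```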

  11. New Method Developed To Purify Single Wall Carbon Nanotubes for Aerospace Applications

    Science.gov (United States)

    Lebron, Marisabel; Meador, Michael A.

    2003-01-01

    Single wall carbon nanotubes have attracted considerable attention because of their remarkable mechanical properties and electrical and thermal conductivities. Use of these materials as primary or secondary reinforcements in polymers or ceramics could lead to new materials with significantly enhanced mechanical strength and electrical and thermal conductivity. Use of carbon-nanotube-reinforced materials in aerospace components will enable substantial reductions in component weight and improvements in durability and safety. Potential applications for single wall carbon nanotubes include lightweight components for vehicle structures and propulsion systems, fuel cell components (bipolar plates and electrodes) and battery electrodes, and ultra-lightweight materials for use in solar sails. A major barrier to the successful use of carbon nanotubes in these components is the need for methods to economically produce pure carbon nanotubes in large enough quantities to not only evaluate their suitability for certain applications but also produce actual components. Most carbon nanotube synthesis methods, including the HiPCO (high pressure carbon monoxide) method developed by Smalley and others, employ metal catalysts that remain trapped in the final product. These catalyst impurities can affect nanotube properties and accelerate their decomposition. The development of techniques to remove most, if not all, of these impurities is essential to their successful use in practical applications. A new method has been developed at the NASA Glenn Research Center to purify gram-scale quantities of single wall carbon nanotubes. This method, a modification of a gas phase purification technique previously reported by Smalley and others, uses a combination of high-temperature oxidations and repeated extractions with nitric and hydrochloric acid. This improved procedure significantly reduces the amount of impurities (catalyst and nonnanotube forms of carbon) within the nanotubes, increasing

  12. Experimental Evaluation of a Method for Turbocharging Four-Stroke, Single Cylinder, Internal Combustion Engines

    Science.gov (United States)

    Buchman, Michael; Winter, Amos

    2015-11-01

    Turbocharging an engine increases specific power, improves fuel economy, reduces emissions, and lowers cost compared to a naturally aspirated engine of the same power output. These advantages make turbocharging commonplace for multi-cylinder engines. Single cylinder engines are not commonly turbocharged due to the phase lag between the exhaust stroke, which powers the turbocharger, and the intake stroke, when air is pumped into the engine. Our proposed method of turbocharging single cylinder engines is to add an ``air capacitor'' to the intake manifold: an additional volume that acts as a buffer to store compressed air between the exhaust and intake strokes and smooth out the pressure pulses from the turbocharger. This talk presents experimental results from a single cylinder, turbocharged diesel engine fitted with various sized air capacitors. Power output from the engine was measured using a dynamometer made from a generator, with the electrical power dissipated with resistive heating elements. We found that intake air density increases with capacitor size as theoretically predicted, ranging from 40 to 60 percent depending on heat transfer. Our experiment produced 29 percent more power compared to natural aspiration. These results validate that an air capacitor and turbocharger may be a simple, cost-effective means of increasing the power density of single cylinder engines.
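
    The quoted 40-60 percent density gain can be sanity-checked with the ideal gas law; the boost pressure and temperatures below are illustrative assumptions, not values from the study:

```python
# Intake density ratio vs. natural aspiration: rho ~ P / (R*T).
# Bounds for a given boost pressure: isothermal (perfect intercooling)
# vs. no heat rejection after adiabatic compression.
P_atm, T_atm = 101.325e3, 298.0   # Pa, K (assumed ambient)
P_boost = 160e3                   # Pa absolute (assumed boost level)
gamma = 1.4

isothermal = P_boost / P_atm                           # ~1.58
T_adiabatic = T_atm * (P_boost / P_atm) ** ((gamma - 1) / gamma)
adiabatic = (P_boost / P_atm) * (T_atm / T_adiabatic)  # ~1.39

print(f"density gain: {adiabatic - 1:.0%} to {isothermal - 1:.0%}")
```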

  13. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations

    DEFF Research Database (Denmark)

    Dassonneville, R; Brøndum, Rasmus Froberg; Druet, T

    2011-01-01

    for prediction of DGV and in France using a genomic marker-assisted selection approach for prediction of GEBV. Imputation in both studies was done using a combination of the DAGPHASE 1.1 and Beagle 2.1.3 software. Traits considered were protein yield, fertility, somatic cell count, and udder depth. Imputation...... of missing markers and prediction of breeding values were performed using 2 different reference populations in each country: either a national reference population or a combined EuroGenomics reference population. Validation for accuracy of imputation and genomic prediction was done based on national test...

  14. Determining Complex Structures using Docking Method with Single Particle Scattering Data

    Directory of Open Access Journals (Sweden)

    Haiguang Liu

    2017-04-01

    Protein complexes are critical for many molecular functions. Due to the intrinsic flexibility and dynamics of complexes, their structures are more difficult to determine using conventional experimental methods, in contrast to individual subunits. One of the major challenges is the crystallization of protein complexes. Using X-ray free electron lasers (XFELs), it is possible to collect scattering signals from non-crystalline protein complexes, but data interpretation is more difficult because of unknown orientations. Here, we propose a hybrid approach to determine protein complex structures by combining XFEL single particle scattering data with computational docking methods. Using simulated data, we demonstrate that a small set of single particle scattering data collected at random orientations can be used to distinguish the native complex structure from decoys generated using docking algorithms. The results also indicate that a small set of single particle scattering data is superior to a spherically averaged intensity profile in distinguishing complex structures. Given that XFEL experimental data are difficult to acquire and of low abundance, this hybrid approach should find wide applications in data interpretation.

  15. Determining Complex Structures using Docking Method with Single Particle Scattering Data.

    Science.gov (United States)

    Wang, Hongxiao; Liu, Haiguang

    2017-01-01

    Protein complexes are critical for many molecular functions. Due to the intrinsic flexibility and dynamics of complexes, their structures are more difficult to determine using conventional experimental methods, in contrast to individual subunits. One of the major challenges is the crystallization of protein complexes. Using X-ray free electron lasers (XFELs), it is possible to collect scattering signals from non-crystalline protein complexes, but data interpretation is more difficult because of unknown orientations. Here, we propose a hybrid approach to determine protein complex structures by combining XFEL single particle scattering data with computational docking methods. Using simulated data, we demonstrate that a small set of single particle scattering data collected at random orientations can be used to distinguish the native complex structure from decoys generated using docking algorithms. The results also indicate that a small set of single particle scattering data is superior to a spherically averaged intensity profile in distinguishing complex structures. Given that XFEL experimental data are difficult to acquire and of low abundance, this hybrid approach should find wide applications in data interpretation.

  16. New real-time heartbeat detection method using the angle of a single-lead electrocardiogram.

    Science.gov (United States)

    Song, Mi-Hye; Cho, Sung-Pil; Kim, Wonky; Lee, Kyoung-Joung

    2015-04-01

    This study presents a new real-time heartbeat detection algorithm using the geometric angle between two consecutive samples of single-lead electrocardiogram (ECG) signals. The angle was adopted as a new index representing the slope of the ECG signal. The method consists of three steps: elimination of high-frequency noise, calculation of the angle of the ECG signal, and detection of R-waves using a simple adaptive thresholding technique. The MIT-BIH arrhythmia database, QT database, European ST-T database, T-wave alternans database and synthesized ECG signals were used to evaluate the performance of the proposed algorithm and to compare it with the results of other methods suggested in the literature. The proposed method shows a high detection rate: 99.95% sensitivity, 99.95% positive predictivity, and a 0.10% failed detection rate on the four databases. The results show that the proposed method can yield performance better than or comparable to other methods in the literature, despite its relatively simple processing. The proposed algorithm needs only a single-lead ECG and involves a simple and quick calculation. Moreover, it does not require post-processing to enhance the detection. Thus, it can be effectively applied to various real-time healthcare and medical devices. Copyright © 2015 Elsevier Ltd. All rights reserved.
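
    A minimal sketch of the three-step idea: smooth, convert each sample-to-sample step into an angle, and threshold adaptively. The filter choice, angle definition, and threshold rule here are illustrative assumptions, not the authors' published algorithm:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_r_peaks(ecg, fs):
    """Toy R-wave detector: low-pass filter, per-sample slope angle,
    adaptive threshold with a refractory period (illustrative only)."""
    b, a = butter(4, 40 / (fs / 2), btype="low")   # remove high-frequency noise
    x = filtfilt(b, a, ecg)
    dt = 1.0 / fs
    angle = np.abs(np.arctan2(np.diff(x), dt))      # slope angle per step
    peaks, thr = [], 0.5 * angle.max()              # crude initial threshold
    refractory = int(0.2 * fs)                      # 200 ms lockout
    for i, val in enumerate(angle):
        if val > thr and (not peaks or i - peaks[-1] > refractory):
            peaks.append(i)
            thr = 0.3 * val + 0.7 * thr             # adapt threshold slowly
    return np.array(peaks)
```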

  17. The Seepage Simulation of Single Hole and Composite Gas Drainage Based on LB Method

    Science.gov (United States)

    Chen, Yanhao; Zhong, Qiu; Gong, Zhenzhao

    2018-01-01

    Gas drainage is the most effective method to prevent and mitigate coal mine gas power disasters, so it is important to study the law governing gas seepage in fissured coal. The lattice Boltzmann (LB) method is a simplified computational model built on the micro-scale and is particularly suited to the study of seepage problems. Based on a fracture-seepage mathematical model of single-hole coal gas drainage, and using the LB method to numerically simulate gas flow during drainage, this paper maps the gas pressure contours, flow paths and flow velocity vectors of the drainage area under several working conditions: single-hole drainage, and combined drainage with symmetric and asymmetric slots of different widths. It analyses the influence of each working condition on the gas seepage field, discusses the effectiveness of a central hole with slots on both sides as a drainage method, and carries out a preliminary exploration of combined gas drainage.
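
    The record gives no implementation details; as a generic illustration of the LB approach it refers to, here is a bare-bones D2Q9 lattice Boltzmann collide-and-stream step (standard textbook scheme, with all drainage geometry and boundary handling omitted):

```python
import numpy as np

# Standard D2Q9 lattice: 9 discrete velocities and their weights.
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    cu = np.einsum("qd,xyd->xyq", c, u)
    usq = np.einsum("xyd,xyd->xy", u, u)[..., None]
    return rho[..., None] * w * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.8):
    """One BGK collision + streaming step on a periodic domain."""
    rho = f.sum(axis=-1)
    u = np.einsum("xyq,qd->xyd", f, c) / rho[..., None]
    f += (equilibrium(rho, u) - f) / tau          # collide (BGK relaxation)
    for q in range(9):                            # stream along each velocity
        f[..., q] = np.roll(f[..., q], c[q], axis=(0, 1))
    return f

f = equilibrium(np.ones((64, 64)), np.zeros((64, 64, 2)))  # initial state
for _ in range(100):
    f = lbm_step(f)
```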

  18. Methods for the preparation of large quantities of complex single-stranded oligonucleotide libraries.

    Science.gov (United States)

    Murgha, Yusuf E; Rouillard, Jean-Marie; Gulari, Erdogan

    2014-01-01

    Custom-defined oligonucleotide collections have a broad range of applications in the fields of synthetic biology, targeted sequencing, and cytogenetics. They are also used to encode information for technologies like RNA interference, protein engineering and DNA-encoded libraries. High-throughput parallel DNA synthesis technologies developed for the manufacture of DNA microarrays can produce libraries of large numbers of different oligonucleotides, but in very limited amounts. Here, we compare three approaches to prepare large quantities of single-stranded oligonucleotide libraries derived from microarray synthesized collections. The first approach, alkaline melting of double-stranded PCR amplified libraries with a biotinylated strand captured on streptavidin coated magnetic beads, results in little or no non-biotinylated ssDNA. The second method, wherein the phosphorylated strand of PCR amplified libraries is nucleolytically hydrolyzed, is recommended when small amounts of libraries are needed. The third method, combining in vitro transcription of PCR amplified libraries with reverse transcription of the RNA product into single-stranded cDNA, is our recommended method to produce large amounts of oligonucleotide libraries. Finally, we propose a method to remove any primer binding sequences introduced during library amplification.

  19. Method for preparation and readout of polyatomic molecules in single quantum states

    Science.gov (United States)

    Patterson, David

    2018-03-01

    Polyatomic molecular ions contain many desirable attributes of a useful quantum system, including rich internal degrees of freedom and highly controllable coupling to the environment. To date, the vast majority of state-specific experimental work on molecular ions has concentrated on diatomic species. The ability to prepare and read out polyatomic molecules in single quantum states would enable diverse experimental avenues not available with diatomics, including new applications in precision measurement, sensitive chemical and chiral analysis at the single-molecule level, and precise studies of Hz-level molecular tunneling dynamics. While cooling the motional state of a polyatomic ion via sympathetic cooling with a laser-cooled atomic ion is straightforward, coupling this motional state to the internal state of the molecule has proven challenging. Here we propose a method for readout and projective measurement of the internal state of a trapped polyatomic ion. The method exploits the rich manifold of technically accessible rotational states in the molecule to realize robust state preparation and readout with far less stringent engineering than quantum logic methods recently demonstrated on diatomic molecules. The method can be applied to any reasonably small (≲10 atoms) polyatomic ion with an anisotropic polarizability.

  20. DNA-electrophoresis of single cells - a method to screen for irradiated foodstuffs

    International Nuclear Information System (INIS)

    Leffke, A.; Helle, N.; Boegl, K.W.; Schreiber, G.A.

    1993-01-01

    Microelectrophoresis of single cells can be used to detect γ-irradiation over a wide dose range and for a variety of products. It is a simple and rapid test for DNA damage and can be used for screening. The method was tested on cell suspensions of bone marrow and muscle cells from frozen chicken legs, chicken heart, turkey liver, beef and pork irradiated with doses up to 3 kGy. Cell suspensions were prepared by incubation of tissues in EDTA-SDS buffer at pH 8. Single cell electrophoresis was performed in 0.75% agarose gel. DNA was visualised by silver staining. In unirradiated samples no or only a small amount of DNA penetrated the cell membranes. Cells of irradiated samples appeared like a ''comet'' due to migration of DNA fragments out of the cell. (orig.)

  1. SINGLE TREE DETECTION FROM AIRBORNE LASER SCANNING DATA USING A MARKED POINT PROCESS BASED METHOD

    Directory of Open Access Journals (Sweden)

    J. Zhang

    2013-05-01

    Tree detection and reconstruction is of great interest in large-scale city modelling. In this paper, we present a marked point process model to detect single trees from airborne laser scanning (ALS) data. We consider single trees in an ALS-recovered canopy height model (CHM) as a realization of a point process of circles. Unlike the traditional marked point process, we sample the model in a constrained configuration space by making use of image processing techniques. A Gibbs energy is defined on the model, containing a data term, which judges the fitness of the model with respect to the data, and a prior term, which incorporates prior knowledge of object layouts. We search for the optimal configuration through a steepest gradient descent algorithm. The presented hybrid framework was tested on three forest plots, and experiments show the effectiveness of the proposed method.
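
    A Gibbs energy of the kind described typically has the generic shape below, where x = {c_1, ..., c_n} is a configuration of marked circles, U_d scores each circle against the CHM, U_p penalizes implausible layouts (e.g., overlap), and γ weights the prior. The record does not give the exact functional forms, so this display is only the generic template:

```latex
U(\mathbf{x}) = \sum_{i=1}^{n} U_d(c_i) \;+\; \gamma \sum_{c_i \sim c_j} U_p(c_i, c_j)
```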

  2. Improved Design Methods for Robust Single- and Three-Phase ac-dc-ac Power Converters

    DEFF Research Database (Denmark)

    Qin, Zian

    After a century of fast development, society is facing energy issues again, e.g. the exhaustion of fossil fuels, emission-caused air pollution, radiation leakage from nuclear generation, and so on. How to produce and use electricity in a more sustainable, efficient, and cost-effective way thus...... becomes an emerging challenge. Accordingly, installation of sustainable power generators like wind turbines and solar panels has experienced a large increase during the last decades. Meanwhile, power electronics converters, as interfaces in the electrical system, are delivering approximately 80 % of the electricity......, the emerging challenges, and the structure of the thesis. The main content of the thesis starts with single-phase converters: Chapter 2 and Chapter 3 propose new modulation methods for single-phase B6 and H6 converters, respectively, in order to retain the same dc link voltage with two full-bridges connected

  3. A method for determination of muscle fiber diameter using single fiber potential (SFP) analysis.

    Science.gov (United States)

    Zalewska, Ewa; Nandedkar, Sanjeev D; Hausmanowa-Petrusewicz, Irena

    2012-12-01

    We have used computer simulation to study the relationship between muscle fiber diameter and two parameters: the peak-to-peak amplitude and the duration of the negative peak of the muscle fiber action potential. We found that the negative peak duration is useful in the determination of fiber diameter via the diameter dependence of conduction velocity. We have shown a direct link between the underlying physiology and the measurements characterizing the single fiber potential. Using data from simulations, a graphical tool and an analytical method to estimate the muscle fiber diameter from the recorded action potential have been developed. The ability to quantify fiber diameter can add significantly to the single fiber electromyography examination. It may help the study of muscle fiber diameter variability and thus complement muscle biopsy studies.

  4. Twinning processes in Cu-Al-Ni martensite single crystals investigated by neutron single crystal diffraction method

    Czech Academy of Sciences Publication Activity Database

    Molnar, P.; Šittner, P.; Novák, V.; Lukáš, Petr

    2008-01-01

    Roč. 481, Sp.Iss.SI (2008), s. 513-517 ISSN 0921-5093 R&D Projects: GA AV ČR IAA100480704 Institutional research plan: CEZ:AV0Z10480505 Keywords : Cu-Al-Ni * single crystals * neutron diffraction Subject RIV: BG - Nuclear, Atomic and Molecular Physics, Colliders Impact factor: 1.806, year: 2008

  5. Single particle electron microscopy reconstruction of the exosome complex using the random conical tilt method.

    Science.gov (United States)

    Liu, Xueqi; Wang, Hong-Wei

    2011-03-28

    Single particle electron microscopy (EM) reconstruction has recently become a popular tool to obtain the three-dimensional (3D) structure of large macromolecular complexes. Compared to X-ray crystallography, it has some unique advantages. First, single particle EM reconstruction does not require crystallization of the protein sample, which is the bottleneck in X-ray crystallography, especially for large macromolecular complexes. Secondly, it does not need large amounts of protein samples. Compared with the milligrams of protein necessary for crystallization, single particle EM reconstruction only needs several micro-liters of protein solution at nano-molar concentrations, using the negative staining EM method. However, apart from a few macromolecular assemblies with high symmetry, single particle EM is limited to relatively low resolution (lower than 1 nm) for many specimens, especially those without symmetry. This technique is also limited by the size of the molecules under study, i.e. 100 kDa for negatively stained specimens and 300 kDa for frozen-hydrated specimens in general. For a new sample of unknown structure, we generally use a heavy metal solution to embed the molecules by negative staining. The specimen is then examined in a transmission electron microscope to take two-dimensional (2D) micrographs of the molecules. Ideally, the protein molecules have a homogeneous 3D structure but exhibit different orientations in the micrographs. These micrographs are digitized and processed in computers as "single particles". Using two-dimensional alignment and classification techniques, homogeneous molecules in the same views are clustered into classes. Their averages enhance the signal of the molecule's 2D shapes. After we assign the particles the proper relative orientations (Euler angles), we are able to reconstruct the 2D particle images into a 3D virtual volume. In single particle 3D reconstruction, an essential step is to correctly assign the proper orientation

  6. Combustion Model and Control Parameter Optimization Methods for Single Cylinder Diesel Engine

    Directory of Open Access Journals (Sweden)

    Bambang Wahono

    2014-01-01

    This research presents a method to construct a combustion model and a method to optimize some control parameters of a diesel engine in order to develop a model-based control system. The purpose of constructing the model is to appropriately manage some control parameters to obtain the target values of fuel consumption and emissions as the engine outputs. A stepwise method considering multicollinearity was applied to construct the combustion model with a polynomial model. Using the experimental data of a single cylinder diesel engine, models of power, BSFC, NOx, and soot for multiple-injection diesel engines were built. The proposed method successfully developed models that describe the control parameters in relation to the engine outputs. Although many control devices can be mounted on a diesel engine, an optimization technique is required to find optimal engine operating conditions efficiently, beside the existing development of individual emission control methods. Particle swarm optimization (PSO) was used to calculate control parameters that optimize fuel consumption and emissions based on the model. The proposed method is able to calculate control parameters efficiently to optimize the evaluation items based on the model. Finally, the model combined with PSO was compiled on a microcontroller.
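
    As an illustration of the PSO step, here is the generic textbook update rule in a minimal optimizer; the objective below is a stand-in, not the paper's combustion model:

```python
import numpy as np

def pso(objective, dim, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer with the standard velocity update."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n, dim))          # particle positions
    v = np.zeros((n, dim))                    # particle velocities
    pbest, pval = x.copy(), np.apply_along_axis(objective, 1, x)
    g = pbest[pval.argmin()].copy()           # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        v = w*v + c1*r1*(pbest - x) + c2*r2*(g - x)
        x = x + v
        val = np.apply_along_axis(objective, 1, x)
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()].copy()
    return g, pval.min()

# Stand-in objective: e.g., a surrogate "fuel consumption" to minimize.
best_x, best_f = pso(lambda p: np.sum((p - 0.3)**2), dim=4)
```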

  7. Comparing Single-Point and Multi-point Calibration Methods in Modulated DSC

    Energy Technology Data Exchange (ETDEWEB)

    Van Buskirk, Caleb Griffith [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-06-14

    Heat capacity measurements for High Density Polyethylene (HDPE) and Ultra-high Molecular Weight Polyethylene (UHMWPE) were performed using Modulated Differential Scanning Calorimetry (mDSC) over a wide temperature range, -70 to 115 °C, with a TA Instruments Q2000 mDSC. The default calibration method for this instrument involves measuring the heat capacity of a sapphire standard at a single temperature near the middle of the temperature range of interest. However, this method often fails for temperature ranges that exceed a 50 °C interval, likely because of drift or non-linearity in the instrument's heat capacity readings over time or over the temperature range. Therefore, in this study a method was developed to calibrate the instrument using multiple temperatures and the same sapphire standard.
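
    A multi-point heat-capacity calibration of this kind typically computes a temperature-dependent calibration factor from the sapphire standard and interpolates it. The sketch below (with made-up numbers) shows the idea, not the Q2000's internal procedure:

```python
import numpy as np

# Calibration factor K(T) = Cp_literature(T) / Cp_measured(T) for sapphire,
# evaluated at several temperatures instead of a single mid-range point.
T_cal = np.array([-70.0, -25.0, 20.0, 65.0, 115.0])   # deg C (assumed points)
cp_lit = np.array([0.55, 0.65, 0.76, 0.85, 0.93])     # J/(g K), illustrative
cp_meas = np.array([0.53, 0.64, 0.76, 0.87, 0.97])    # instrument readings
K = cp_lit / cp_meas

def calibrate(T, cp_raw):
    """Correct a raw sample Cp reading using the interpolated factor."""
    return np.interp(T, T_cal, K) * cp_raw

print(calibrate(40.0, 1.80))  # corrected Cp at 40 deg C
```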

  8. Rapid synthesis of single-phase bismuth ferrite by microwave-assisted hydrothermal method

    International Nuclear Information System (INIS)

    Cao, Wenqian; Chen, Zhi; Gao, Tong; Zhou, Dantong; Leng, Xiaonan; Niu, Feng; Zhu, Yuxiang; Qin, Laishun; Wang, Jiangying; Huang, Yuexiang

    2016-01-01

    This paper describes the fast synthesis of bismuth ferrite by a simple microwave-assisted hydrothermal method. The phase transformation and the preferred growth facets during the synthetic process were investigated by X-ray diffraction. Bismuth ferrite can be prepared quickly by the microwave hydrothermal method by simply controlling the reaction time, which is further confirmed by Fourier transform infrared spectroscopy and magnetic measurements. - Graphical abstract: Single-phase BiFeO3 could be realized with a shortest reaction time of 65 min. The reaction time strongly influences the phase transformation and the preferred growth facets. - Highlights: • Rapid synthesis (65 min) of BiFeO3 by the microwave-assisted hydrothermal method. • Reaction time influences the purity and preferred growth facets. • FTIR and magnetic measurements further confirm the pure phase.

  9. Characterization of single-crystal sapphire substrates by X-ray methods and atomic force microscopy

    International Nuclear Information System (INIS)

    Prokhorov, I. A.; Zakharov, B. G.; Asadchikov, V. E.; Butashin, A. V.; Roshchin, B. S.; Tolstikhina, A. L.; Zanaveskin, M. L.; Grishchenko, Yu. V.; Muslimov, A. E.; Yakimchuk, I. V.; Volkov, Yu. O.; Kanevskii, V. M.; Tikhonov, E. O.

    2011-01-01

    The possibility of characterizing a number of practically important parameters of sapphire substrates by X-ray methods is substantiated. These parameters include wafer bending, traces of an incompletely removed damaged layer formed as a result of mechanical treatment (scratches and marks), surface roughness, damaged layer thickness, and specific features of the substrate's real structure. The features of the real structure of single-crystal sapphire substrates were investigated by the nondestructive methods of double-crystal X-ray diffraction and plane-wave X-ray topography. The surface relief of the substrates was investigated by atomic force microscopy and X-ray scattering. The use of complementary analytical methods yields the most complete information about the structural inhomogeneities and the state of the crystal surface, which is extremely important for optimizing the technology of substrate preparation for epitaxy.

  10. Detection of Brucella melitensis and Brucella abortus strains using a single-stage PCR method

    Directory of Open Access Journals (Sweden)

    Alamian, S.

    2015-04-01

    Brucella melitensis and Brucella abortus are among the most important causes of brucellosis, an infectious disease which is transmitted either directly or indirectly, including by consuming unpasteurized dairy products. Both strains are considered endemic in Iran. Common diagnostic methods, such as bacteriologic culture, are difficult and time consuming for these bacteria. The aim of this study was to propose a single-stage PCR method using a single pair of primers to detect both B. melitensis and B. abortus. The primers were named UF1 and UR1, and the results showed that the final sizes of the PCR products were 84 bp and 99 bp for B. melitensis and B. abortus, respectively. Therefore, the method could be useful for the rapid, simultaneous detection of B. melitensis and B. abortus.

  11. An improved method for the molecular identification of single dinoflagellate cysts

    Directory of Open Access Journals (Sweden)

    Yangchun Gao

    2017-04-01

    Background. Dinoflagellate cysts (i.e., dinocysts) are biologically and ecologically important, as they can help dinoflagellate species survive harsh environments, facilitate their dispersal, and serve as seeds for harmful algal blooms. In addition, dinocysts derived from some species can produce more toxins than the vegetative forms, affecting other species through food webs and even human health. Consequently, accurate identification of dinocysts represents the first crucial step in many ecological studies. As dinocysts have limited or even no available taxonomic keys, molecular methods have become the first priority for dinocyst identification. However, molecular identification of dinocysts, particularly when using single cells, poses technical challenges. The most serious is the low success rate of PCR, especially for heterotrophic species. Methods. In this study, we aim to improve the success rate of single dinocyst identification for the chosen dinocyst species (Gonyaulax spinifera, Polykrikos kofoidii, Lingulodinium polyedrum, Pyrophacus steinii, Protoperidinium leonis and Protoperidinium oblongum) distributed in the South China Sea. We worked on two major technical issues: cleaning possible PCR inhibitors attached to the cyst surface and designing new dinoflagellate-specific PCR primers to improve the success of PCR amplification. Results. For the cleaning of single dinocysts separated from marine sediments, we used ultrasonic wave-based cleaning and optimized the cleaning parameters. Our results showed that the optimized ultrasonic wave-based cleaning method largely improved the success rate and accuracy of both molecular and morphological identifications. For the molecular identification with the newly designed dinoflagellate-specific primers (18S634F-18S634R), the success ratio was as high as 86.7% for single dinocysts across multiple taxa when using the optimized ultrasonic wave-based cleaning method, and much higher than that

  12. Missing data treatments matter: an analysis of multiple imputation for anterior cervical discectomy and fusion procedures.

    Science.gov (United States)

    Ondeck, Nathaniel T; Fu, Michael C; Skrip, Laura A; McLynn, Ryan P; Cui, Jonathan J; Basques, Bryce A; Albert, Todd J; Grauer, Jonathan N

    2018-04-09

    The presence of missing data is a limitation of large datasets, including the National Surgical Quality Improvement Program (NSQIP). In addressing this issue, most studies utilize complete case analysis, which excludes cases with missing data, thus potentially introducing selection bias. Multiple imputation, a statistically rigorous approach that approximates missing data and preserves sample size, may be an improvement over complete case analysis. The purpose was to evaluate the impact of using multiple imputation in comparison to complete case analysis for assessing the associations between preoperative laboratory values and adverse outcomes following anterior cervical discectomy and fusion (ACDF) procedures. Study design: retrospective review of prospectively collected data. Patient sample: patients undergoing one-level ACDF were identified in NSQIP 2012-2015. Perioperative adverse outcome variables assessed included the occurrence of any adverse event, severe adverse events, and hospital readmission. Missing preoperative albumin and hematocrit values were handled using complete case analysis and multiple imputation. These preoperative laboratory levels were then tested for associations with 30-day postoperative outcomes using logistic regression. A total of 11,999 patients were included. Of this cohort, 63.5% of patients were missing preoperative albumin and 9.9% were missing preoperative hematocrit. When utilizing complete case analysis, only 4,311 patients were studied. The removed patients were significantly younger, healthier, of a common BMI, and male. Logistic regression analysis failed to identify either preoperative hypoalbuminemia or preoperative anemia as significantly associated with adverse outcomes. When employing multiple imputation, all 11,999 patients were included. Preoperative hypoalbuminemia was significantly associated with the occurrence of any adverse event and severe adverse events. Preoperative anemia was significantly associated with the occurrence of any adverse
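
    For readers unfamiliar with the technique, a multiple-imputation analysis of the kind described can be sketched with scikit-learn and statsmodels; everything below (column names, model, synthetic data) is illustrative rather than the authors' NSQIP pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
import statsmodels.api as sm

# Synthetic stand-in cohort (the real study used NSQIP data).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(55, 10, n),
    "albumin": rng.normal(4.0, 0.5, n),
    "hematocrit": rng.normal(41.0, 4.0, n),
})
df["adverse_event"] = (rng.random(n) < 0.08 + 0.10 * (df["albumin"] < 3.5)).astype(int)
df.loc[rng.choice(n, 600, replace=False), "albumin"] = np.nan      # ~60% missing
df.loc[rng.choice(n, 100, replace=False), "hematocrit"] = np.nan   # ~10% missing

cols = ["albumin", "hematocrit", "age"]
m = 10                                   # number of imputed datasets
fits = []
for i in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    filled = df.copy()
    filled[cols] = imp.fit_transform(df[cols])
    X = sm.add_constant(filled[cols])
    fits.append(sm.Logit(filled["adverse_event"], X).fit(disp=0))

# Pool with Rubin's rules: average estimates; combine within- and
# between-imputation variance for the standard errors.
params = np.array([f.params.values for f in fits])
pooled = params.mean(axis=0)
within = np.array([f.bse.values ** 2 for f in fits]).mean(axis=0)
between = params.var(axis=0, ddof=1)
pooled_se = np.sqrt(within + (1 + 1 / m) * between)
```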

  13. Imputing forest carbon stock estimates from inventory plots to a nationally continuous coverage

    Directory of Open Access Journals (Sweden)

    Wilson Barry Tyler

    2013-01-01

    The U.S. has been providing national-scale estimates of forest carbon (C) stocks and stock change to meet United Nations Framework Convention on Climate Change (UNFCCC) reporting requirements for years. Although these currently are provided as national estimates by pool and year to meet greenhouse gas monitoring requirements, there is growing need to disaggregate these estimates to finer scales to enable strategic forest management and monitoring activities focused on various ecosystem services such as C storage enhancement. Through application of a nearest-neighbor imputation approach, spatially explicit estimates of forest C density were developed for the conterminous U.S. using the U.S.'s annual forest inventory. Results suggest that an existing forest inventory plot imputation approach can be readily modified to provide raster maps of C density across a range of pools (e.g., live tree to soil organic carbon) and spatial scales (e.g., sub-county to biome). Comparisons among imputed maps indicate strong regional differences across C pools. The C density of pools closely related to detrital input (e.g., dead wood) is often highest in forests suffering from recent mortality events such as those in the northern Rocky Mountains (e.g., beetle infestations). In contrast, live tree carbon density is often highest on the highest quality forest sites such as those found in the Pacific Northwest. Validation results suggest strong agreement between the estimates produced from the forest inventory plots and those from the imputed maps, particularly when the C pool is closely associated with the imputation model (e.g., aboveground live biomass and live tree basal area), with weaker agreement for detrital pools (e.g., standing dead trees). Forest inventory imputed plot maps provide an efficient and flexible approach to monitoring diverse C pools at national (e.g., UNFCCC) and regional scales (e.g., Reducing Emissions from Deforestation and Forest
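
    Nearest-neighbor imputation of plot attributes to raster cells can be illustrated in a few lines; the feature names and the use of scikit-learn's KNeighborsRegressor are stand-ins for the inventory's actual auxiliary variables and kNN variant:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical inventory plots: auxiliary variables observed everywhere
# (e.g., from satellite imagery / climate layers) plus measured C density.
n_plots = 2000
X_plots = rng.random((n_plots, 3))          # e.g., NDVI, elevation, precip
c_density = 80*X_plots[:, 0] + 20*X_plots[:, 1] + rng.normal(0, 5, n_plots)

# k=1 reproduces classic single-neighbor imputation; k>1 averages neighbors.
knn = KNeighborsRegressor(n_neighbors=1).fit(X_plots, c_density)

# Impute every raster cell from its most similar plot in feature space.
X_cells = rng.random((500_000, 3))          # auxiliary values for map cells
c_map = knn.predict(X_cells)                # imputed C density surface
```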

  14. The BDS Triple Frequency Pseudo-range Correlated Stochastic Model of Single Station Modeling Method

    Directory of Open Access Journals (Sweden)

    HUANG Lingyong

    2017-05-01

    In order to provide a reliable pseudo-range stochastic model, a method is studied to estimate the BDS triple-frequency pseudo-range related stochastic model based on three BDS triple-frequency pseudo-range minus carrier (GIF) combinations, using the data of a single station. In this algorithm, a low-order polynomial fit of each GIF combination is first used to eliminate all errors and constants other than pseudo-range noise. Then, multiple linear regression analysis is used to model the stochastic function of the three linearly independent GIF combinations. Finally, the related stochastic model of the original BDS triple-frequency pseudo-range observations is obtained by linear transformation. Verification with BDS triple-frequency data shows that this algorithm can obtain a single-station related stochastic model of BDS triple-frequency pseudo-range observations, which is advantageous for providing an accurate stochastic model for navigation, positioning and integrity monitoring.
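
    A bare-bones version of the first two steps (polynomial detrending of a GIF-type series, then regressing its variance on a function of elevation) might look like this; the data and the regression form are assumptions for illustration:

```python
import numpy as np

# Hypothetical GIF (pseudo-range minus carrier) series for one satellite.
rng = np.random.default_rng(1)
t = np.arange(3000, dtype=float)                  # epochs
elev = 15 + 60 * np.sin(np.pi * t / t[-1])        # elevation angle (deg)
noise = rng.normal(0, 0.3 / np.sin(np.radians(elev)))
gif = 2.0 + 1e-4 * t + noise                      # bias + drift + noise

# Step 1: low-order polynomial fit removes bias/drift, leaving noise.
resid = gif - np.polyval(np.polyfit(t, gif, 2), t)

# Step 2: regress residual variance on 1/sin^2(elevation), a classic
# elevation-dependent noise model, via least squares over elevation bins.
bins = np.digitize(elev, np.arange(15, 76, 5))
var_b = np.array([resid[bins == b].var() for b in np.unique(bins)])
x_b = np.array([np.mean(1 / np.sin(np.radians(elev[bins == b])) ** 2)
                for b in np.unique(bins)])
a, b = np.polyfit(x_b, var_b, 1)                  # var = a/sin^2(e) + b
```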

  15. SWATH-ID: An instrument method which combines identification and quantification in a single analysis.

    Science.gov (United States)

    Kang, Yang; Burton, Lyle; Lau, Adam; Tate, Stephen

    2017-04-07

    Data-independent acquisition (DIA) approaches, such as SWATH®-MS, are showing great potential to reliably quantify significant numbers of peptides and proteins in an unbiased manner. These developments have enhanced interest in developing a single DIA method which integrates qualitative and quantitative analysis, eliminating the need for a pre-built library of peptide spectra created through data-dependent acquisition (DDA) methods or obtained from public repositories. Here we introduce a new DIA approach, referred to as "SWATH-ID", which was developed to allow peptide identification as well as quantitation. The SWATH-ID method is composed of small Q1 windows, achieving better selectivity and thus significantly improving high-confidence peptide extraction from data files. Furthermore, the SWATH-ID approach transmits precursor ions without fragmentation, as well as their fragments, within the same SWATH acquisition period. This provides a single scan which includes all precursor ions within the isolation window as well as a record of all of their fragment ions, substantially negating the need for a survey scan. In this way all precursors present in a small Q1 window are associated with their fragment ions, improving the identification specificity and providing a more comprehensive and in-depth view of protein and peptide species in complex samples. This article is protected by copyright. All rights reserved.

  16. Single determinant N-representability and the kernel energy method applied to water clusters.

    Science.gov (United States)

    Polkosnik, Walter; Massa, Lou

    2017-10-24

    The kernel energy method (KEM) is a quantum chemical calculation method that has been shown to provide accurate energies for large molecules. KEM performs calculations on subsets of a molecule (called kernels), so the computational difficulty of KEM calculations scales more softly than full-molecule methods. Although KEM provides accurate energies, those energies are not required to satisfy the variational theorem. In this article, KEM is extended to provide a full-molecule single-determinant N-representable one-body density matrix. A kernel expansion for the one-body density matrix, analogous to the kernel expansion for energy, is defined. This matrix is converted to a normalized projector by an algorithm due to Clinton. The resulting single-determinant N-representable density matrix maps to a quantum mechanically valid wavefunction which satisfies the variational theorem. The process is demonstrated on clusters of three to twenty water molecules. The resulting energies are more accurate than the straightforward KEM energy results, and all violations of the variational theorem are resolved. The N-representability studied in this article is applicable to the study of quantum crystallography. © 2017 Wiley Periodicals, Inc.
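
    The projector-restoration step can be illustrated with the closely related McWeeny purification iteration, shown here for intuition only; the paper's actual Clinton algorithm additionally enforces normalization constraints via Lagrange multipliers:

```python
import numpy as np

def purify(P, n_elec, iters=50):
    """Iterate P -> 3P^2 - 2P^3 (McWeeny), rescaling the trace to the
    electron count, until P is (near-)idempotent: P @ P == P."""
    for _ in range(iters):
        P = 3 * P @ P - 2 * P @ P @ P      # push eigenvalues toward {0, 1}
        P *= n_elec / np.trace(P)          # keep Tr(P) = number of electrons
        if np.linalg.norm(P @ P - P) < 1e-10:
            break
    return P

# Toy symmetric start close to a projector of rank 2.
A = np.diag([0.9, 0.8, 0.2, 0.1])
P = purify(A, n_elec=2)
print(np.round(np.linalg.eigvalsh(P), 6))  # -> eigenvalues near 0 and 1
```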

  17. Power quality improvement of single-phase photovoltaic systems through a robust synchronization method

    DEFF Research Database (Denmark)

    Hadjidemetriou, Lenos; Kyriakides, Elias; Yang, Yongheng

    2014-01-01

    An increasing number of single-phase photovoltaic (PV) systems on the distribution network requires more advanced synchronization methods in order to meet the grid codes with respect to power quality and fault ride-through capability. The response of the synchronization technique selected...... to the harmonic voltage distortion without affecting the dynamic response of the synchronization. Therefore, the accurate response of the proposed MHDC-PLL enhances the power quality of the PV inverter systems; additionally, the proper fault ride-through operation of PV systems can be enabled by the fast

  18. Sources of variability for the single-comparator method in a heavy-water reactor

    International Nuclear Information System (INIS)

    Damsgaard, E.; Heydorn, K.

    1978-11-01

    The well-thermalized flux in the heavy-water-moderated DR 3 reactor at Risoe prompted us to investigate to what extent a single comparator could be used for multi-element determination instead of multiple comparators. The reliability of the single-comparator method is limited by the thermal-to-epithermal flux ratio, and experiments were designed to determine the variations in this ratio throughout a reactor operating period (4 weeks including a shut-down period of 4-5 days). The bi-isotopic method using zirconium as monitor was chosen because 94Zr and 96Zr exhibit a large difference in their I0/σth values, and would permit determination of the flux ratio with a precision sufficient to detect variations. One of the irradiation facilities comprises a rotating magazine with 3 channels, each of which can hold five aluminium cans. In this rig, five cans, each holding a polyvial with 1 ml of aqueous zirconium solution, were irradiated simultaneously in one channel. Irradiations were carried out in the first and the third week of 4 periods. In another facility, consisting of a pneumatic tube system, two samples were irradiated simultaneously on top of each other in a polyethylene rabbit. Experiments were carried out once a week for 4 periods. All samples were counted on a Ge(Li) detector for 95Zr, 97mNb and 97Nb. The thermal-to-epithermal flux ratio was calculated from the induced activity, the nuclear data for the two zirconium isotopes, and the detector efficiency. By analysis of variance, the total variation of the flux ratio was separated into a random variation between reactor periods and systematic differences between the positions, as well as between the weeks in the operating period. If the variations are in statistical control, the error resulting from use of the single-comparator method in multi-element determination can be estimated for any combination of irradiation position and day in the operating period. With the measured flux ratio variations in DR
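
    The flux-ratio determination rests on the fact that, for each monitor isotope, the per-atom reaction rate is proportional to f·σth + I0, where f is the thermal-to-epithermal flux ratio; with two isotopes of very different I0/σth, the ratio of measured specific reaction rates can be solved for f. A hedged numerical sketch (the nuclear data below are approximate placeholders, not the paper's adopted constants):

```python
# Solve A1/A2 = (f*s1 + i1) / (f*s2 + i2) for the thermal-to-epithermal
# flux ratio f, where A are per-atom, efficiency-corrected reaction rates,
# s the thermal cross-sections and i the resonance integrals.
def flux_ratio(R, s1, i1, s2, i2):
    return (R * i2 - i1) / (s1 - R * s2)

# Approximate monitor data for 94Zr (1) and 96Zr (2), in barns; values
# here are illustrative placeholders only.
print(flux_ratio(R=0.43, s1=0.05, i1=0.27, s2=0.023, i2=5.3))  # f ~ 50
```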

  19. 29 CFR 98.630 - May the Department of Labor impute conduct of one person to another?

    Science.gov (United States)

    2010-07-01

    ... SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 98.630 May the Department of Labor impute conduct of one person to another? For purposes of actions taken under this rule...

  20. 2 CFR 180.630 - May a Federal agency impute the conduct of one person to another?

    Science.gov (United States)

    2010-01-01

    ... AND SUSPENSION (NONPROCUREMENT) General Principles Relating to Suspension and Debarment Actions § 180.630 May a Federal agency impute the conduct of one person to another? For purposes of actions taken...